Re: [Bioc-sig-seq] AlignedRead and complex subsetting

Patrick Aboyoun Wed, 02 Sep 2009 10:02:56 -0700

Ivan,

If you are using rtracklayer::import, you will produce a RangedDataobject and you can extract the RangesList component using the ranges()function.


library(rtracklayer)
myRegions <- import('myregions.bed')
myRangesList <- ranges(myRegions)



Patrick


Ivan Gregoretti wrote:

Hello Steve, Martin and Everybody,

First, thank you both for your help.

As suggested by Steve, I look at

https://stat.ethz.ch/pipermail/bioc-sig-sequencing/2009-August/000509.html

by it didn't quite applied to my case. I need to iterate over
chromosomes and I need to keep strand information.

I'll try this new algorithm Martin just wrote. By the way, Martin,
what kind of object should rangesList be?

Say that I load my regions like this

myRegions <- import('myregions.bed')

Can I coerce that IRanges instance myRegions to the rangestList class? How?

Thank you!

Ivan


Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878



On Tue, Sep 1, 2009 at 7:41 PM, Martin Morgan<[email protected]> wrote:

Hi Ivan --

Steve Lianoglou wrote:

Hi Ivan,

On Aug 31, 2009, at 4:54 PM, Ivan Gregoretti wrote:

Hello Everybody,

How do you subset an AlignedRead instance to keep (or reject) tags
that lay within a set of genomic regions?


Example

Lets say that I have an AlignedRead instance called aln.

Now let's say that I have a set of positions in BED style:

(chromosome, start end)
ch1 1000000 1000050
chrX 20000000 20100000
...(many more)...

We can imagine that I have the BED set loaded as a data frame.

Is it possible to pick from aln only the tags within (or outside) the
features defined in the table described above?

I think that you should convert your BED file to an IRanges object, and
use overlap with your ranges + your readAligned object to get what your

I wrote this, as a trial implementation of %in%. Is that useful? (also
!(... %in% ...) though that would, e.g., include NAs.


setMethod("%in%", c("AlignedRead", "RangesList"),
         function(x, table)
{
   ## consider only sensible alignemnts
   chr <- chromosome(x)
   pos <- position(x)
   wd <- width(x)
   notNA <- !(is.na(chr) | is.na(pos) | is.na(wd))
   ## find overlap
   chr <- chr[notNA]
   rl <- RangesList(mapply(IRanges, start=split(pos[notNA], chr),
                           width=split(wd[notNA], chr)))
   olap <- rl %in% table
   ## map to original indicies
   len <- seq_len(length(x))
   idx <- unlist(split(len[notNA], chr), use.names=FALSE)
   len %in% idx[unlist(olap)]
})

One would then

 aln[aln %in% rangesList]

Martin

after. See Martin's post about something like this in this thread:

https://stat.ethz.ch/pipermail/bioc-sig-sequencing/2009-August/000509.html

To get the reads *outside* of your ranges, maybe you can call the
``gaps`` on your bed/ranges and then do the same thing ...  or perhaps
``setdiff(ranges, aln)`` might work, too? (where aln is your IRanges
converted alignedRead object (if necessary)).

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  |  Memorial Sloan-Kettering Cancer Center
  |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Re: [Bioc-sig-seq] AlignedRead and complex subsetting

Reply via email to