Hello Steve, Martin and Everybody, First, thank you both for your help.
As suggested by Steve, I look at https://stat.ethz.ch/pipermail/bioc-sig-sequencing/2009-August/000509.html by it didn't quite applied to my case. I need to iterate over chromosomes and I need to keep strand information. I'll try this new algorithm Martin just wrote. By the way, Martin, what kind of object should rangesList be? Say that I load my regions like this myRegions <- import('myregions.bed') Can I coerce that IRanges instance myRegions to the rangestList class? How? Thank you! Ivan Ivan Gregoretti, PhD National Institute of Diabetes and Digestive and Kidney Diseases National Institutes of Health 5 Memorial Dr, Building 5, Room 205. Bethesda, MD 20892. USA. Phone: 1-301-496-1592 Fax: 1-301-496-9878 On Tue, Sep 1, 2009 at 7:41 PM, Martin Morgan<[email protected]> wrote: > Hi Ivan -- > > Steve Lianoglou wrote: >> Hi Ivan, >> >> On Aug 31, 2009, at 4:54 PM, Ivan Gregoretti wrote: >> >>> Hello Everybody, >>> >>> How do you subset an AlignedRead instance to keep (or reject) tags >>> that lay within a set of genomic regions? >>> >>> >>> Example >>> >>> Lets say that I have an AlignedRead instance called aln. >>> >>> Now let's say that I have a set of positions in BED style: >>> >>> (chromosome, start end) >>> ch1 1000000 1000050 >>> chrX 20000000 20100000 >>> ...(many more)... >>> >>> We can imagine that I have the BED set loaded as a data frame. >>> >>> Is it possible to pick from aln only the tags within (or outside) the >>> features defined in the table described above? >> >> I think that you should convert your BED file to an IRanges object, and >> use overlap with your ranges + your readAligned object to get what your > > I wrote this, as a trial implementation of %in%. Is that useful? (also > !(... %in% ...) though that would, e.g., include NAs. > > > setMethod("%in%", c("AlignedRead", "RangesList"), > function(x, table) > { > ## consider only sensible alignemnts > chr <- chromosome(x) > pos <- position(x) > wd <- width(x) > notNA <- !(is.na(chr) | is.na(pos) | is.na(wd)) > ## find overlap > chr <- chr[notNA] > rl <- RangesList(mapply(IRanges, start=split(pos[notNA], chr), > width=split(wd[notNA], chr))) > olap <- rl %in% table > ## map to original indicies > len <- seq_len(length(x)) > idx <- unlist(split(len[notNA], chr), use.names=FALSE) > len %in% idx[unlist(olap)] > }) > > One would then > > aln[aln %in% rangesList] > > Martin > >> after. See Martin's post about something like this in this thread: >> >> https://stat.ethz.ch/pipermail/bioc-sig-sequencing/2009-August/000509.html >> >> To get the reads *outside* of your ranges, maybe you can call the >> ``gaps`` on your bed/ranges and then do the same thing ... or perhaps >> ``setdiff(ranges, aln)`` might work, too? (where aln is your IRanges >> converted alignedRead object (if necessary)). >> >> -steve >> >> -- >> Steve Lianoglou >> Graduate Student: Computational Systems Biology >> | Memorial Sloan-Kettering Cancer Center >> | Weill Medical College of Cornell University >> Contact Info: http://cbio.mskcc.org/~lianos/contact >> >> _______________________________________________ >> Bioc-sig-sequencing mailing list >> [email protected] >> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing > > _______________________________________________ Bioc-sig-sequencing mailing list [email protected] https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
