Hi Ivan, On Thu, Jan 28, 2010 at 10:46 AM, Ivan Gregoretti <[email protected]> wrote: > Hi everybody, > > I have two sets of ChIP-seq peaks: A and B. > How do I select the peaks in A that are not overlapped by any peak in B? > > > Now in more detail: > > I do > > library(rtracklayer) > A <- import('condition_A.bed', 'bed') > B <- import('condition_B.bed', 'bed') > > they are both RangedData instances. > > I need to compute A - B. > > How do you guys do it?
You could try to pull out the range data from `import`-ed bed files and look at the documentation for ?setdiff and ?findOverlaps in the IRanges package to get at the answer. This is semi-untested, but it might look something like this: A.range <- range(A) B.range <- range(B) o <- findOverlaps(A.range, B.range) Assuming your "space" (chromosome) names are like 'chr1', 'chr2', etc, I think something like: A.range$chr1[-unique(queryHits(o$chr1))] will tell you the intervals in A that don't overlap with intervals in B. Maybe not the best way, but that could do it. Using IRanges::setdiff might work too, but will return you unique ranges in one object that aren't in the other. The subtlety here is that if there is partial overlap between two ranges, it will just remove the partial overlap of a range, and not the entirety of the range that shares the overlap (ugh .. does that make sense?) Anyway, just look at the ?findOverlaps and ?setdiff documentation in the IRanges package, and you can likely find an even more intuitive way to do it. I expect the @fhcrc folk will probably also chime in with a better way soon, too .. -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact _______________________________________________ Bioc-sig-sequencing mailing list [email protected] https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
