On 09/03/2010 03:07 PM, Chris Seidel wrote:
> Did anything ever get resolved in terms of assigning chromosome lengths
> to a GRanges object when it contains alignments that run off the
> chromosome ends? The message below was the last of the original thread
> that I could find.
>
> I'm currently having the problem of reading solexa export files into a
> GRanges object, and then sometimes having an error while setting the
> chromosome lengths if the object has a few reads that are past the
> boundary. The only solution I see is to somehow toss out the offending
> reads - which means I have to write a complicated function to loop
> through all reads and check them against the chromosome length - so I
> was just wondering since Ivan brought this problem up back in April, if
> a solution was ever reached. (or if anyone knows of an efficient way to
> address the problem).
>

There is also this thread
https://stat.ethz.ch/pipermail/bioconductor/2010-August/034876.html

It might be as easy as gr0 = as(aln, "GRanges"); gr = gr0[gr0 %in% seqs],
where seqs is a RangesList constructed from the chromosome lengths. A
common source for this problem is mapping to mitochondria or other
circular genomes (hence finding overhanging alignments on chrM might be
enough); this is being actively worked on, but is a deeper issue than it
appears at first blush.

Martin

> -Chris
>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf
>> Of Patrick Aboyoun
>> Sent: Tuesday, April 27, 2010 12:39 PM
>> To: Sean Davis
>> Cc: [email protected]
>> Subject: Re: [Bioc-sig-seq] GRanges, failure assigning
>> chromosome lengths
>>
>> Sean and Ivan,
>> Thanks for the insight. I'll look at devising a compromise within the
>> existing framework. I need to explore the various methods for GRanges
>> object to better understand the impact of a compromise. We started with
>> the simplest interpretation of limit bounds because it simplifies the
>> code. For example, we need to establish the rules for coverage or
>> findOverlaps when the DNA is circular or the alignment runs off the end
>> of a linear chromosome.
>>
>> Patrick
>>
>> On 4/27/10 8:05 AM, Sean Davis wrote:
>>> On Tue, Apr 27, 2010 at 10:51 AM, Ivan Gregoretti <[email protected]>
>>> wrote:
>>>
>>>> Good morning Sean and everybody,
>>>>
>>>>> Actually, the edge case is general as alignments, even on linear
>>>>> chromosomes, may extend beyond the end of the chromosome, I believe.
>>>>> In the best case, these alignments are clipped (in CIGAR terms), but
>>>>> I don't know that all aligners are doing that appropriately.
>>>>>
>>>>> Sean
>>>>
>>>> So, you rather go for an overriding switch rather than infrastructure
>>>> overhaul?
>>>>
>>>> I ask this because GRanges is an exceptionally convenient format for
>>>> ChIP-seqers and Patrick is trying to make a decision to make it work
>>>> for real world data.
>>>
>>> I guess that I mean to say that the two issues of aligning off the end
>>> of the chromosome and handling circular genomes are related but
>>> separate issues. An override seems quite reasonable for dealing with
>>> the former. Until aligners or common formats (BAM/SAM) deal with the
>>> latter, it will be difficult to deal appropriately with circular
>>> genomes, so an override is probably a fine compromise.
>>>
>>> Sean
>>>
>>>> And yes indeed: aligners do align a little bit past the boundaries
>>>> even for linear chromosomes. Thanks for pointing that out!
>>>>
>>>> Ivan
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> [email protected]
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

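For what it's worth, the "toss out the offending reads" idea discussed in this thread does not need an explicit loop over reads. The following is only a minimal sketch, not the approach any of the posters settled on: it swaps the `%in%` / RangesList form from Martin's reply for a plain vectorized comparison against a named vector of chromosome lengths. The lengths in `chrom_len` and the three toy reads are made up for illustration, and the sketch assumes the GenomicRanges package is installed.

```r
library(GenomicRanges)

# Hypothetical chromosome lengths (assumption: real values would come from
# the reference genome the reads were aligned to).
chrom_len <- c(chr1 = 1000L, chrM = 500L)

# Toy alignments; the second and third run past the end of their chromosome.
gr0 <- GRanges(
  seqnames = c("chr1", "chr1", "chrM"),
  ranges   = IRanges(start = c(10L, 990L, 480L),
                     end   = c(45L, 1025L, 515L)),
  strand   = c("+", "-", "+")
)

# Look up the length of the chromosome each read maps to, then keep only
# reads that lie entirely within [1, chromosome length].
len_per_read <- chrom_len[as.character(seqnames(gr0))]
in_bounds    <- unname(start(gr0) >= 1L & end(gr0) <= len_per_read)
gr           <- gr0[in_bounds]

# With the overhanging reads removed, every remaining range fits inside
# the lengths being assigned.
seqlengths(gr) <- chrom_len[seqlevels(gr)]
```

Dropping reads is of course lossy; for chrM and other circular sequences the overhanging reads may be perfectly legitimate, which is the deeper issue raised above.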