On Thu, Aug 5, 2010 at 10:45 AM, Patrick Aboyoun <[email protected]> wrote:
> Michael,
> I just made a minor check-in to rtracklayer where I replaced use of
> Biobase::listLen with IRanges::elementLengths in an effort to minimize the
> impact of Biobase on the sequence package stack.
>

Ok. It looks like elementLengths has been optimized since the last time I
looked.

> Before I start the boulder rolling, how should I reconcile the UCSCData
> class with the GRanges class? Once I have that sorted I can make changes to
> import.bed and import.wig as well.
>

Well, eventually we'll want to stick the track line information on to
GRanges. Could be done via a subclass like with UCSCData. metadata() is
another option. I do actually use the subclass for dispatch purposes, pretty
printing, etc. For right now though, the extra information could just be
dropped if the user requests a GRanges.

> I originally named the argument asRangedData in the BSgenome methods to
> reinforce that RangedData output is not intended to be the default and
> conceptually the user is making an extra effort to produce a RangedData
> object.
>
>
> Patrick
>
> On 8/5/10 4:32 AM, Michael Lawrence wrote:
>
> Makes sense. But why not make it asGRanges, which is shorter? Please go
> ahead and check in your work so far.
>
> Thanks a lot,
> Michael
>
> On Thu, Aug 5, 2010 at 12:51 AM, Patrick Aboyoun <[email protected]> wrote:
>
>> Michael,
>> Breaking this down into two issues:
>>
>> Filtering
>> Martin has been working on improving filtering in the ShortRead package to
>> move from a read-all-then-filter approach to a block-processing-based
>> filtering methodology. Lessons learned there can be brought to rtracklayer
>> for large bed files and the like.
>>
>> import() output class
>> Keeping the same API and just switching the import methods from producing
>> RangedData (or UCSCData) output to GRanges output will break backward
>> compatibility because the RangedData API is not wholly applicable to GRanges
>> objects.
>> I would not recommend this course since a number of packages in
>> BioC and scripts in the wild expect the import methods to produce a
>> RangedData (or UCSCData) object. An additional argument is not that onerous
>> and can be phased out over the course of two or three releases (1 - 1.5
>> years). Another alternative is to add a new import function (read.GRanges?)
>> to rtracklayer that shares the same infrastructure as the existing import
>> methods.
>>
>> I have a local copy of rtracklayer where I added a new asRangedData flag
>> to the GenomicData function and import.gff* methods. I'll sit on this for
>> now since these changes didn't take a lot of work. This is one of the
>> situations where managing the life cycle of the function specs is
>> trickier than making the desired code changes.
>>
>>
>> Cheers,
>> Patrick
>>
>> On 8/4/10 8:24 PM, Michael Lawrence wrote:
>>
>> This might work, but it seems like an expensive optimization in that it
>> changes a lot of the API. If someone cannot make a single copy of the data,
>> it's unlikely that they're even going to be able to get to GenomicData() or
>> manipulate it later. Perhaps the coercion function needs some simple tweaks?
>> The filter support would definitely help. I'd rather keep things simple and
>> return a single type, and GRanges sounds most appropriate.
>>
>> But I'm open to suggestions and further argument.
>>
>> Michael
>>
>> On Wed, Aug 4, 2010 at 2:05 PM, Patrick Aboyoun <[email protected]> wrote:
>>
>>> Michael,
>>> How integrated would you like to see the GRanges class in rtracklayer?
>>> The rtracklayer::GenomicData constructor is the master instantiator. I would
>>> like to add an asRangedData = TRUE (default) argument to the GenomicData
>>> function and push it all the way up through the import functions, so that
>>> when the user sets asRangedData = FALSE, the GenomicData function would
>>> create a GRanges object.
>>> This is what we did with the
>>> {matchPWM,vmatchPattern,vmatchPDict},BSgenome-methods in the BSgenome
>>> package and it is as good a solution as any. This is a straightforward
>>> change and wouldn't take too long to complete.
>>>
>>>
>>> Patrick
>>>
>>> On 8/4/10 12:56 PM, Michael Lawrence wrote:
>>>
>>>> GRanges support is definitely on the TODO list. Filters are a good idea
>>>> and also on the TODO list, possibly with a chunk size parameter to enable
>>>> chunk processing.
>>>>
>>>> I'd love to have the GRanges stuff at least done by the next release.
>>>> Patches welcome, of course :)
>>>>
>>>> Michael
>>>>
>>>> On Wed, Aug 4, 2010 at 8:08 AM, Ivan Gregoretti <[email protected]> wrote:
>>>>
>>>>> Hello Michael and everyone,
>>>>>
>>>>> Would you please consider adding to import() the capacity to generate
>>>>> a GRanges object rather than the default RangedData object?
>>>>>
>>>>> Also,
>>>>>
>>>>> Wouldn't it be great to be able to import() with filters just like
>>>>> with readAligned()?
>>>>>
>>>>>
>>>>> Justification
>>>>>
>>>>> GRanges is a biology-aware container. When importing large BEDs into
>>>>> R, the current workflow involves creating RangedData first and then
>>>>> converting to GRanges.
>>>>>
>>>>> If the BEDs are really big, holding both objects in memory at any
>>>>> point in time is a hardware challenge.
>>>>>
>>>>> The capacity to filter the input would help in this case and in
>>>>> general it would provide an increase in efficiency.
>>>>>
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Ivan
>>>>>
>>>>>
>>>>> Ivan Gregoretti, PhD
>>>>> National Institute of Diabetes and Digestive and Kidney Diseases
>>>>> National Institutes of Health
>>>>> 5 Memorial Dr, Building 5, Room 205.
>>>>> Bethesda, MD 20892. USA.
>>>>> Phone: 1-301-496-1016 and 1-301-496-1592
>>>>> Fax: 1-301-496-9878
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-sig-sequencing mailing list
>>>>> [email protected]
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
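
The chunk-based filtering idea raised in this thread (apply a filter while reading, with a chunk size parameter, instead of reading everything and filtering afterwards) can be illustrated outside of R. What follows is only a minimal sketch in plain Python under stated assumptions. It is not the rtracklayer or ShortRead API, and `BedRecord`, `read_bed_filtered`, `keep`, and `chunk_size` are hypothetical names chosen for the example. Only the first three BED columns are parsed.

```python
import io
from itertools import islice
from typing import Callable, Iterable, List, NamedTuple


class BedRecord(NamedTuple):
    """Hypothetical minimal record: the first three BED columns only."""
    chrom: str
    start: int  # 0-based start, as in BED
    end: int    # exclusive end, as in BED


def read_bed_filtered(
    lines: Iterable[str],
    keep: Callable[[BedRecord], bool],
    chunk_size: int = 10000,
) -> List[BedRecord]:
    """Read BED lines chunk by chunk, keeping only records that pass `keep`.

    At most `chunk_size` raw lines are held at once, so rejected records
    are discarded before the next chunk is read.
    """
    kept: List[BedRecord] = []
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break  # input exhausted
        for line in chunk:
            # Skip blank lines, comments, and track/browser header lines.
            if not line.strip() or line.startswith(("track", "browser", "#")):
                continue
            fields = line.split()
            record = BedRecord(fields[0], int(fields[1]), int(fields[2]))
            if keep(record):
                kept.append(record)
    return kept


# Usage: keep only chr1 features from a small in-memory example.
bed_text = (
    "track name=example\n"
    "chr1\t0\t100\n"
    "chr2\t50\t150\n"
    "chr1\t200\t300\n"
)
chr1_only = read_bed_filtered(
    io.StringIO(bed_text),
    keep=lambda record: record.chrom == "chr1",
    chunk_size=2,
)
```

With this shape, peak memory depends on the chunk size and on how many records pass the filter, not on the size of the input file, which is the benefit the original request was after.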
