Makes sense. But why not make it asGRanges, which is shorter? Please go ahead and check in your work so far.
Thanks a lot,
Michael

On Thu, Aug 5, 2010 at 12:51 AM, Patrick Aboyoun wrote:

> Michael,
> Breaking this down into two issues:
>
> Filtering
> Martin has been working on improving filtering in the ShortRead package to
> move from a read-all-then-filter approach to a block-processing-based
> filtering methodology. Lessons learned there can be brought to rtracklayer
> for large BED files and the like.
>
> import() output class
> Keeping the same API and just switching the import methods from producing
> RangedData (or UCSCData) output to GRanges output will break backward
> compatibility because the RangedData API is not wholly applicable to GRanges
> objects. I would not recommend this course since a number of packages in
> BioC and scripts in the wild expect the import methods to produce a
> RangedData (or UCSCData) object. An additional argument is not that onerous
> and can be phased out over the course of two or three releases (1 - 1.5
> years). Another alternative is to add a new import function (read.GRanges?)
> to rtracklayer that shares the same infrastructure as the existing import
> methods.
>
> I have a local copy of rtracklayer where I added a new asRangedData flag to
> the GenomicData function and import.gff* methods. I'll sit on this for now
> since these changes didn't take a lot of work. This is one of the situations
> where managing the life cycle of the function specs is trickier than
> making the desired code changes.
>
> Cheers,
> Patrick
>
> On 8/4/10 8:24 PM, Michael Lawrence wrote:
>
> This might work, but it seems like an expensive optimization in that it
> changes a lot of the API. If someone cannot make a single copy of the data,
> it's unlikely that they're even going to be able to get to GenomicData() or
> manipulate it later. Perhaps the coercion function needs some simple tweaks?
> The filter support would definitely help.
> I'd rather keep things simple and return a single type, and GRanges sounds
> most appropriate.
>
> But I'm open to suggestions and further argument.
>
> Michael
>
> On Wed, Aug 4, 2010 at 2:05 PM, Patrick Aboyoun wrote:
>
>> Michael,
>> How integrated would you like to see the GRanges class in rtracklayer? The
>> rtracklayer::GenomicData constructor is the master instantiator. I would
>> like to add an asRangedData = TRUE (default) argument to the GenomicData
>> function and push it all the way up through the import functions, so that
>> when the user sets asRangedData = FALSE, the GenomicData function creates a
>> GRanges object. This is what we did with the
>> {matchPWM,vmatchPattern,vmatchPDict},BSgenome-methods in the BSgenome
>> package and it is as good a solution as any. This is a straightforward
>> change and wouldn't take too long to complete.
>>
>> Patrick
>>
>> On 8/4/10 12:56 PM, Michael Lawrence wrote:
>>
>>> GRanges support is definitely on the TODO list. Filters are a good idea
>>> and also on the TODO list, possibly with a chunk size parameter to enable
>>> chunk processing.
>>>
>>> I'd love to have the GRanges stuff at least done by the next release.
>>> Patches welcome, of course :)
>>>
>>> Michael
>>>
>>> On Wed, Aug 4, 2010 at 8:08 AM, Ivan Gregoretti wrote:
>>>
>>>> Hello Michael and everyone,
>>>>
>>>> Would you please consider adding to import() the capacity to generate
>>>> a GRanges object rather than the default RangedData object?
>>>>
>>>> Also,
>>>>
>>>> Wouldn't it be great to be able to import() with filters just like
>>>> with readAligned()?
>>>>
>>>> Justification
>>>>
>>>> GRanges is a biology-aware container. When importing large BEDs into
>>>> R, the current workflow involves creating RangedData first and then
>>>> converting to GRanges.
>>>>
>>>> If the BEDs are really big, holding both objects in memory at any
>>>> point in time is a hardware challenge.
>>>>
>>>> The capacity to filter the input would help in this case and in
>>>> general it would provide an increase in efficiency.
>>>>
>>>> Thank you,
>>>>
>>>> Ivan
>>>>
>>>> Ivan Gregoretti, PhD
>>>> National Institute of Diabetes and Digestive and Kidney Diseases
>>>> National Institutes of Health
>>>> 5 Memorial Dr, Building 5, Room 205.
>>>> Bethesda, MD 20892. USA.
>>>> Phone: 1-301-496-1016 and 1-301-496-1592
>>>> Fax: 1-301-496-9878
>>>>
>>>> _______________________________________________
>>>> Bioc-sig-sequencing mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
