Re: [Bioc-sig-seq] rtracklayer and import()ing into GRanges

Patrick Aboyoun Thu, 05 Aug 2010 10:46:04 -0700

Michael,
I just made a minor check-in to rtracklayer where I replaced use of
Biobase::listLen with IRanges::elementLengths in an effort to reduce
the sequence package stack's dependence on Biobase.


Before I start the boulder rolling, how should I reconcile the UCSCData
class with the GRanges class? Once I have that sorted, I can make
changes to import.bed and import.wig as well.

I originally named the argument asRangedData in the BSgenome methods to
reinforce that RangedData output is not intended to be the default;
conceptually, the user is making an extra effort to produce a RangedData
object.
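
A minimal sketch of the flag-controlled constructor idea follows. It is written in Python purely for illustration. The names genomic_data, import_bed_lines, as_ranged_data, RangedDataLike and GRangesLike are hypothetical stand-ins, not rtracklayer's R API. A single master constructor receives the parsed columns, and the flag decides which container is built, so no intermediate object of the other class is created. The default keeps the legacy RangedData-style output, and the import function simply forwards the flag.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RangedDataLike:
    """Hypothetical stand-in for the legacy RangedData container:
    ranges grouped by chromosome ("space")."""
    spaces: dict  # chromosome -> list of (start, end, strand)


@dataclass
class GRangesLike:
    """Hypothetical stand-in for GRanges: flat, parallel vectors."""
    seqnames: List[str]
    starts: List[int]
    ends: List[int]
    strands: List[str]


def genomic_data(seqnames, starts, ends, strands, as_ranged_data=True):
    """Hypothetical master constructor. as_ranged_data=True keeps the
    legacy output class; callers opt in to the new class with False."""
    if not (len(seqnames) == len(starts) == len(ends) == len(strands)):
        raise ValueError("all fields must have the same length")
    if not as_ranged_data:
        # Build the new container directly; no legacy object is made first.
        return GRangesLike(list(seqnames), list(starts), list(ends), list(strands))
    spaces = {}
    for chrom, s, e, st in zip(seqnames, starts, ends, strands):
        spaces.setdefault(chrom, []).append((s, e, st))
    return RangedDataLike(spaces)


def import_bed_lines(lines, as_ranged_data=True):
    """Hypothetical import function: parse BED lines, forward the flag."""
    seqnames, starts, ends, strands = [], [], [], []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        seqnames.append(fields[0])
        # BED starts are 0-based half-open; store 1-based closed coordinates.
        starts.append(int(fields[1]) + 1)
        ends.append(int(fields[2]))
        # The optional 6th BED column holds the strand; "*" means unknown.
        strands.append(fields[5] if len(fields) > 5 else "*")
    return genomic_data(seqnames, starts, ends, strands,
                        as_ranged_data=as_ranged_data)
```

With an asGRanges-style name, as suggested in the thread, the default would simply be inverted; the single construction path stays the same.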


Patrick


On 8/5/10 4:32 AM, Michael Lawrence wrote:
> Makes sense. But why not make it asGRanges, which is shorter? Please 
> go ahead and check in your work so far.
>
> Thanks a lot,
> Michael
>
> On Thu, Aug 5, 2010 at 12:51 AM, Patrick Aboyoun <[email protected]> wrote:
>
>     Michael,
>     Breaking this down to two issues:
>
>     Filtering
>     Martin has been working on improving filtering in the ShortRead
>     package, moving from a read-all-then-filter approach to a
>     block-processing-based filtering methodology. Lessons learned there
>     can be brought to rtracklayer for large BED files and the like.
>
>     import() output class
>     Keeping the same API and just switching the import methods from
>     producing RangedData (or UCSCData) output to GRanges output would
>     break backward compatibility, because the RangedData API is not
>     wholly applicable to GRanges objects. I would not recommend this
>     course, since a number of packages in BioC and scripts in the wild
>     expect the import methods to produce a RangedData (or UCSCData)
>     object. An additional argument is not that onerous and can be
>     phased out over the course of two or three releases (1 to 1.5
>     years). Another option is to add a new import function
>     (read.GRanges?) to rtracklayer that shares the same infrastructure
>     as the existing import methods.
>
>     I have a local copy of rtracklayer where I added a new
>     asRangedData flag to the GenomicData function and the import.gff*
>     methods. I'll sit on this for now, since these changes didn't take
>     a lot of work. This is one of those situations where managing the
>     life cycle of the function specs is trickier than making the
>     desired code changes.
>
>
>     Cheers,
>     Patrick
>
>
>
>     On 8/4/10 8:24 PM, Michael Lawrence wrote:
>>     This might work, but it seems like an expensive optimization in
>>     that it changes a lot of the API. If someone cannot afford to make
>>     a single copy of the data, it's unlikely that they're even going
>>     to be able to get to GenomicData() or manipulate it later. Perhaps
>>     the coercion function just needs some simple tweaks? Filter
>>     support would definitely help. I'd rather keep things simple and
>>     return a single type, and GRanges sounds most appropriate.
>>
>>     But I'm open to suggestions and further argument.
>>
>>     Michael
>>
>>     On Wed, Aug 4, 2010 at 2:05 PM, Patrick Aboyoun
>>     <[email protected]> wrote:
>>
>>         Michael,
>>         How deeply would you like to see the GRanges class integrated
>>         into rtracklayer? The rtracklayer::GenomicData constructor is
>>         the master instantiator. I would like to add an asRangedData =
>>         TRUE (default) argument to the GenomicData function and push
>>         it all the way up through the import functions, so that when
>>         the user sets asRangedData = FALSE, the GenomicData function
>>         creates a GRanges object. This is what we did with the
>>         {matchPWM,vmatchPattern,vmatchPDict},BSgenome-methods in the
>>         BSgenome package, and it is as good a solution as any. This is
>>         a straightforward change and wouldn't take long to complete.
>>
>>
>>         Patrick
>>
>>
>>
>>         On 8/4/10 12:56 PM, Michael Lawrence wrote:
>>
>>             GRanges support is definitely on the TODO list. Filters
>>             are a good idea and
>>             also on the TODO list, possibly with a chunk size
>>             parameter to enable chunk
>>             processing.
>>
>>             I'd love to have the GRanges stuff at least done by the
>>             next release.
>>             Patches welcome, of course :)
>>
>>             Michael
>>
>>             On Wed, Aug 4, 2010 at 8:08 AM, Ivan
>>             Gregoretti <[email protected]> wrote:
>>
>>
>>                 Hello Michael and everyone,
>>
>>                 Would you please consider adding to import() the
>>                 ability to generate a GRanges object rather than the
>>                 default RangedData object?
>>
>>                 Also, wouldn't it be great to be able to import()
>>                 with filters, just as with readAligned()?
>>
>>
>>
>>                 Justification
>>
>>                 GRanges is a biology-aware container. When importing
>>                 large BED files into R, the current workflow involves
>>                 creating a RangedData object first and then
>>                 converting it to GRanges.
>>
>>                 If the BED files are really big, holding both objects
>>                 in memory at the same time is a hardware challenge.
>>
>>                 The ability to filter the input would help in this
>>                 case, and in general it would improve efficiency.
>>
>>
>>                 Thank you,
>>
>>                 Ivan
>>
>>
>>
>>
>>                 Ivan Gregoretti, PhD
>>                 National Institute of Diabetes and Digestive and
>>                 Kidney Diseases
>>                 National Institutes of Health
>>                 5 Memorial Dr, Building 5, Room 205.
>>                 Bethesda, MD 20892. USA.
>>                 Phone: 1-301-496-1016 and 1-301-496-1592
>>                 Fax: 1-301-496-9878
>>
>>                 _______________________________________________
>>                 Bioc-sig-sequencing mailing list
>>                 [email protected]
>>                 <mailto:[email protected]>
>>                 https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>>
>>
>>
>>
>>
>>
>
>
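
The block-processing filter discussed in this thread (Ivan's request for import() filters, Michael's chunk size parameter, and Martin's ShortRead work) can be illustrated with a short sketch. It is written in Python rather than R. The names import_filtered, keep and chunk_size are hypothetical, and this is not the ShortRead or rtracklayer implementation. Lines are read one chunk at a time, and only records that pass the filter are retained, so peak memory is roughly one chunk plus the kept records rather than the whole file plus a converted copy.

```python
from itertools import islice
from typing import Callable, Iterable, List, Tuple

# (chrom, start, end) in BED's 0-based half-open coordinates
BedRecord = Tuple[str, int, int]


def parse_bed_line(line: str) -> BedRecord:
    """Parse the first three BED columns of one line."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[1]), int(fields[2])


def import_filtered(lines: Iterable[str],
                    keep: Callable[[BedRecord], bool],
                    chunk_size: int = 100_000) -> List[BedRecord]:
    """Hypothetical chunked importer: read chunk_size lines at a time,
    filter each chunk, and retain only the records that pass."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    it = iter(lines)
    kept: List[BedRecord] = []
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        # Skip blank lines plus track/browser/comment header lines.
        records = (parse_bed_line(ln) for ln in chunk
                   if ln.strip() and not ln.startswith(("track", "browser", "#")))
        kept.extend(r for r in records if keep(r))
    return kept
```

For example, `with open(path) as fh: hits = import_filtered(fh, keep=lambda r: r[0] == "chr1")` would keep only chr1 intervals; `path` is a placeholder. The chunk size trades memory for per-chunk overhead, which is the knob a chunk size parameter on import() would expose.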




