Re: [Bioc-sig-seq] rtracklayer and import()ing into GRanges

Michael Lawrence Thu, 05 Aug 2010 13:11:57 -0700

On Thu, Aug 5, 2010 at 10:45 AM, Patrick Aboyoun wrote:


>  Michael,
> I just made a minor check-in to rtracklayer where I replaced use of
> Biobase::listLen with IRanges::elementLengths in an effort to minimize the
> impact of Biobase on the sequence package stack.
>
>
Ok. It looks like elementLengths has been optimized since the last time I
looked.



> Before I start the boulder rolling, how should I reconcile the UCSCData
> class with the GRanges class? Once I have that sorted I can make changes to
> import.bed and import.wig as well.
>
>
Well, eventually we'll want to stick the track line information on to
GRanges. Could be done via a subclass like with UCSCData. metadata() is
another option. I do actually use the subclass for dispatch purposes, pretty
printing, etc. For right now though, the extra information could just be
dropped if the user requests a GRanges.
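
A minimal sketch of the subclass option, in base R with toy S4 classes. ToyGRanges, ToyUCSCData, describe(), toyImport() and the asGRanges argument are all hypothetical names for illustration; none of this is the real GenomicRanges or rtracklayer code. It only shows a subclass carrying the track line, dispatch on that subclass, and the extra slot being dropped on coercion to the parent class.

```r
library(methods)

# Toy stand-ins, NOT the real GenomicRanges/rtracklayer classes.
setClass("ToyGRanges",
         representation(seqnames = "character",
                        start = "integer",
                        end = "integer"))

# Subclass that carries the track line, in the spirit of UCSCData.
setClass("ToyUCSCData",
         contains = "ToyGRanges",
         representation(trackLine = "character"))

# The subclass exists partly so that methods can dispatch on it.
setGeneric("describe", function(x) standardGeneric("describe"))
setMethod("describe", "ToyGRanges",
          function(x) sprintf("%d ranges", length(x@start)))
setMethod("describe", "ToyUCSCData",
          function(x) sprintf("%d ranges, track: %s",
                              length(x@start), x@trackLine))

# Hypothetical importer; 'asGRanges' is an illustrative argument name.
toyImport <- function(asGRanges = FALSE) {
  x <- new("ToyUCSCData",
           seqnames = c("chr1", "chr2"),
           start = c(10L, 50L),
           end = c(20L, 80L),
           trackLine = "track name=example")
  # Coercing to the parent class drops the extra trackLine slot.
  if (asGRanges) as(x, "ToyGRanges") else x
}

full  <- toyImport()                  # keeps the track line
plain <- toyImport(asGRanges = TRUE)  # track line dropped
describe(full)
describe(plain)
```

The metadata() route would instead keep a single class and store the track line as an annotation, at the cost of losing method dispatch on the subclass.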


> I originally named the argument asRangedData in the BSgenome methods to
> reinforce that RangedData output is not intended to be the default and
> conceptually the user is making an extra effort to produce a RangedData
> object.
>
>
> Patrick
>
>
>
> On 8/5/10 4:32 AM, Michael Lawrence wrote:
>
> Makes sense. But why not make it asGRanges, which is shorter? Please go
> ahead and check in your work so far.
>
> Thanks a lot,
> Michael
>
> On Thu, Aug 5, 2010 at 12:51 AM, Patrick Aboyoun wrote:
>
>>  Michael,
>> Breaking this down to two issues:
>>
>> Filtering
>> Martin has been working on improving filtering in the ShortRead package,
>> moving from a read-everything-then-filter approach to filtering based on
>> block processing. Lessons learned there can be brought to rtracklayer for
>> large BED files and the like.
>>
>> import() output class
>> Keeping the same API and just switching the import methods from producing
>> RangedData (or UCSCData) output to GRanges output will break backward
>> compatibility because the RangedData API is not wholly applicable to GRanges
>> objects. I would not recommend this course since a number of packages in
>> BioC and scripts in the wild expect the import methods to produce a
>> RangedData (or UCSCData) object. An additional argument is not that onerous
>> and can be phased out over the course of two or three releases (1 - 1.5
>> years). Another alternative is to add a new import function (read.GRanges?)
>> to rtracklayer that shares the same infrastructure as the existing import
>> methods.
>>
>> I have a local copy of rtracklayer where I added a new asRangedData flag
>> to the GenomicData function and import.gff* methods. I'll sit on this for
>> now since these changes didn't take a lot of work. This is one of those
>> situations where managing the life cycle of the function specs is
>> trickier than making the desired code changes.
>>
>>
>> Cheers,
>> Patrick
>>
>>
>>
>> On 8/4/10 8:24 PM, Michael Lawrence wrote:
>>
>> This might work, but it seems like an expensive optimization in that it
>> changes a lot of the API. If someone cannot make a single copy of the data,
>> it's unlikely that they're even going to be able to get to GenomicData() or
>> manipulate it later. Perhaps the coercion function needs some simple tweaks?
>> The filter support would definitely help. I'd rather keep things simple and
>> return a single type, and GRanges sounds most appropriate.
>>
>> But I'm open to suggestions and further argument.
>>
>> Michael
>>
>> On Wed, Aug 4, 2010 at 2:05 PM, Patrick Aboyoun wrote:
>>
>>> Michael,
>>> How integrated would you like to see the GRanges class in rtracklayer?
>>> The rtracklayer::GenomicData constructor is the master instantiator. I would
>>> like to add an asRangedData = TRUE (default) argument to the GenomicData
>>> function and push it all the way up through the import functions, so that
>>> when the user sets asRangedData = FALSE, the GenomicData function creates a
>>> GRanges object. This is what we did with the
>>> {matchPWM,vmatchPattern,vmatchPDict},BSgenome-methods in the BSgenome
>>> package, and it is as good a solution as any. This is a straightforward
>>> change and wouldn't take too long to complete.
>>>
>>>
>>> Patrick
>>>
>>>
>>>
>>> On 8/4/10 12:56 PM, Michael Lawrence wrote:
>>>
>>>>  GRanges support is definitely on the TODO list. Filters are a good idea
>>>> and also on the TODO list, possibly with a chunk size parameter to enable
>>>> chunk processing.
>>>>
>>>> I'd love to have the GRanges stuff at least done by the next release.
>>>> Patches welcome, of course :)
>>>>
>>>> Michael
>>>>
>>>> On Wed, Aug 4, 2010 at 8:08 AM, Ivan Gregoretti wrote:
>>>>
>>>>
>>>>
>>>>> Hello Michael and everyone,
>>>>>
>>>>> Would you please consider adding to import() the capacity to generate
>>>>> a GRanges object rather than the default RangedData object?
>>>>>
>>>>> Also,
>>>>>
>>>>> Wouldn't it be great to be able to import() with filters just like
>>>>> with readAligned()?
>>>>>
>>>>>
>>>>>
>>>>> Justification
>>>>>
>>>>> GRanges is a biology-aware container. When importing large BEDs into
>>>>> R, the current workflow involves creating RangedData first and then
>>>>> converting to GRanges.
>>>>>
>>>>> If the BEDs are really big, holding both objects in memory at any
>>>>> point in time is a hardware challenge.
>>>>>
>>>>> The capacity to filter the input would help in this case and in
>>>>> general it would provide an increase in efficiency.
>>>>>
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Ivan
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Ivan Gregoretti, PhD
>>>>> National Institute of Diabetes and Digestive and Kidney Diseases
>>>>> National Institutes of Health
>>>>> 5 Memorial Dr, Building 5, Room 205.
>>>>> Bethesda, MD 20892. USA.
>>>>> Phone: 1-301-496-1016 and 1-301-496-1592
>>>>> Fax: 1-301-496-9878
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-sig-sequencing mailing list
>>>>> [email protected]
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>>>
>>>>>
>>>>>
>>>
>>>
>>
>>
>
>
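
For the filtering request quoted above (filters plus a chunk size parameter), a minimal base-R sketch of block-wise filtering might look like the following. readBedChunked(), its keep predicate and its chunkSize argument are hypothetical names, not rtracklayer or ShortRead API, and the sketch assumes a plain three-column, tab-separated BED file.

```r
# Hypothetical helper: read a BED-like file in blocks and keep only the
# rows that pass 'keep', so the unfiltered file is never held in memory
# all at once.
readBedChunked <- function(path, keep, chunkSize = 1000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  kept <- list()
  repeat {
    lines <- readLines(con, n = chunkSize)
    if (length(lines) == 0L) break          # end of file
    # Drop blank lines, track/browser lines and comments.
    lines <- lines[nzchar(lines) & !grepl("^(track|browser|#)", lines)]
    if (length(lines) == 0L) next
    block <- read.table(text = lines, sep = "\t",
                        stringsAsFactors = FALSE,
                        col.names = c("chrom", "start", "end"))
    # Apply the filter to this block only.
    kept[[length(kept) + 1L]] <- block[keep(block), , drop = FALSE]
  }
  if (length(kept) == 0L) {
    return(data.frame(chrom = character(0),
                      start = integer(0),
                      end = integer(0)))
  }
  out <- do.call(rbind, kept)
  rownames(out) <- NULL
  out
}

# Usage: keep only chr1 features, two lines per block.
bed <- tempfile(fileext = ".bed")
writeLines(c("track name=example",
             "chr1\t0\t100",
             "chr2\t50\t60",
             "chr1\t200\t250",
             "chr3\t5\t500"), bed)
res <- readBedChunked(bed, keep = function(b) b$chrom == "chr1",
                      chunkSize = 2L)
```

Only the rows that pass the filter accumulate, so peak memory depends on the chunk size and the size of the filtered result rather than on the size of the file.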


