Re: [Bioc-sig-seq] large BAM files and large BED files

Michael Lawrence Mon, 19 Sep 2011 11:58:11 -0700

On Mon, Sep 19, 2011 at 11:31 AM, Martin Morgan <mtmor...@fhcrc.org> wrote:


> On 09/19/2011 11:26 AM, Rene Paradis wrote:
>
>> Thanks Martin and Michael for your constructive advice,
>>
>> I used a ScanBamParam object to successfully load part of chr1
>> from a BAM file via scanBam. Honestly, I do not know what the
>> differences are between readGappedAlignments, readBamGappedAlignments and
>> scanBam. The last two can take a ScanBamParam object.
>>
>
> scanBam returns a list-of-lists; it's the most flexible but least
> 'user-friendly'.
>
> readGappedAlignments is meant to be a 'front end' to read GappedAlignments
> from several different sources, and readBamGappedAlignments is meant to be
> one of those sources; usually the 'user' would call readGappedAlignments.
>
>
>> But I wish I could select the seqname in a GRanges to retrieve all the
>> chr1 (as an example) data from the BAM file. It seems I must select a
>> range. So I put a value that goes beyond the range of chr1 because I
>> do not know that range, and I got an <<INTEGER() can only be applied to
>> a 'integer', not a 'special'>> error.
>
>
Couldn't Rsamtools give a more informative error message?


>> There must be something I missed that
>> could help me do that.
>>
>
> see ?scanBamHeader, e.g.,
>
> >  fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
> > scanBamHeader(fl)[[1]]$targets
> seq1 seq2
> 1575 1584
>
Would be nice to have a method for getting a Seqinfo out of a BAM header.
Then one can just coerce that to a GRanges. rtracklayer does the equivalent
for BigWig.
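
Until such a method exists, a minimal sketch of the idea using only scanBamHeader, Seqinfo and GRanges is below. It assumes the example BAM shipped with Rsamtools, and `wholeSeqRanges` is a hypothetical helper name, not an Rsamtools function. The resulting range covers a whole reference sequence, so it can be passed as `which` without guessing the sequence length.

```r
library(Rsamtools)      # scanBamHeader(), ScanBamParam(), scanBam()
library(GenomicRanges)  # GRanges(); also makes IRanges() and Seqinfo() available

fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# Named integer vector of reference sequence lengths taken from the BAM header
targets <- scanBamHeader(fl)[[1]]$targets

# Hypothetical helper (not part of Rsamtools): one range spanning each sequence
wholeSeqRanges <- function(targets) {
    lens <- as.integer(targets)
    GRanges(seqnames = names(targets),
            ranges = IRanges(start = 1L, end = lens),
            seqinfo = Seqinfo(names(targets), seqlengths = lens))
}

gr <- wholeSeqRanges(targets)

# Select a whole sequence by name instead of hard-coding its end coordinate
param <- ScanBamParam(which = gr[seqnames(gr) == "seq1"],
                      what = c("rname", "pos", "qwidth"))
res <- scanBam(fl, param = param)   # list with one element per range in 'which'
```

Because `which` requires a BAM index, this only works when the `.bai` file sits next to the BAM, as it does for ex1.bam.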

Michael



> Martin
>
>
>
>> Ultimately, I want to launch a PICS analysis that requires a
>> segReadsList object.
>>
>> Overall, I have definitely made progress thanks to your help. Thank you.
>>
>> Rene
>>
>>
>>
>>
>> On Fri, 2011-09-16 at 14:29 -0700, Martin Morgan wrote:
>>
>>> On 09/16/2011 02:11 PM, Michael Lawrence wrote:
>>>
>>>> It sounds like you're trying to use BED as an alternative to BAM? Probably
>>>> not a good idea, especially at this scale. Why are you aiming for a
>>>> GenomeData? A GappedAlignments might be more appropriate. See
>>>> GenomicRanges::readGappedAlignments() for bringing a BAM into a
>>>> GappedAlignments.
>>>>
>>>
>>> Hi Rene
>>>
>>> the 'which' argument to readGappedAlignments (it'll become 'param' with
>>> the next release, and be a ScanBamParam object) allows you to select
>>> regions to process, e.g., chromosome-at-a-time, to help with file size.
>>>
>>> Martin
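
A rough sketch of the chromosome-at-a-time approach described above. It uses scanBam with a ScanBamParam rather than readGappedAlignments, takes the sequence names and lengths from the BAM header, and assumes the example BAM shipped with Rsamtools. The per-sequence read count is only a stand-in for whatever real processing is needed.

```r
library(Rsamtools)      # scanBamHeader(), ScanBamParam(), scanBam()
library(GenomicRanges)  # GRanges(), IRanges()

fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")
targets <- scanBamHeader(fl)[[1]]$targets   # sequence lengths, named by sequence

# Process one reference sequence per iteration so that only that sequence's
# records are held in memory at any time
perSeq <- lapply(names(targets), function(sn) {
    region <- GRanges(sn, IRanges(1L, as.integer(targets[[sn]])))
    param <- ScanBamParam(which = region, what = "pos")
    pos <- scanBam(fl, param = param)[[1]]$pos
    # Placeholder summary: keep a small result, drop the raw records
    data.frame(seqname = sn, reads = length(pos))
})

readSummary <- do.call(rbind, perSeq)
```

For a 30 GB file the same loop applies unchanged, provided the BAM is sorted and indexed; only the small per-sequence summaries accumulate.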
>>>
>>>>
>>>> This page might help:
>>>> http://bioconductor.org/help/workflows/high-throughput-sequencing/#sequencing-resources
>>>>
>>>> But it could really be improved.
>>>>
>>>> Michael
>>>>
>>>> On Fri, Sep 16, 2011 at 1:44 PM, Rene Paradis <rene.para...@genome.ulaval.ca> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I am having trouble loading 30 GB BED files into memory. My call to
>>>>> read.table raises the error: Error in unique(x) :
>>>>> length xxxxxx is too large for hashing.
>>>>>
>>>>> This is generated by the function MKsetup in the unique.c file. Even
>>>>> after increasing the value 10,000-fold, the error persists. I believe
>>>>> the function then pushes more data into RAM, but I am not sure this is
>>>>> the right thing to focus on.
>>>>>
>>>>> Ultimately, I would like to produce a GenomeData object from either a
>>>>> BAM file or a BED file.
>>>>>
>>>>> Has anyone worked with very large BAM files (about 30 GB)?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Rene Paradis
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-sig-sequencing mailing list
>>>>> Bioc-sig-sequencing@r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>
>
