On Mon, 2011-09-19 at 12:10 -0700, Martin Morgan wrote: > On 09/19/2011 11:55 AM, Michael Lawrence wrote: > > > > > > On Mon, Sep 19, 2011 at 11:31 AM, Martin Morgan <mtmor...@fhcrc.org > > <mailto:mtmor...@fhcrc.org>> wrote: > > > > On 09/19/2011 11:26 AM, Rene Paradis wrote: > > > > Thanks Martin and Michael for your constructive advices, > > > > I used the ScanBamParam object to successfully load a part of > > the Chr1 > > from a Bam file via ScanBam. Honestly I do not know what are the > > differences between readGappedAlignments, readBamGappedAlignment and > > ScanBam. The last two of them can take a ScanBamParam object. > > > > > > scanBam returns a list-of-lists, it's the most flexible but least > > 'user-friendly'. > > > > readGappedAlignments is meant to be a 'front end' to read > > GappedAlignments from several different sources, and > > readBamGappedAlignments is meant to be one of those sources; usually > > the 'user' would readGappedAlignments. > > > > > > But I wished I could select the seqname in GRanges to retrieve > > all the > > chr1 (as an example) data from the Bam file. It seems I must > > select a > > range. So I put a value that goes beyond the range of the chr1 > > because I > > do not know that range, and I got an<<INTEGER () can only be > > applied to > > a 'integer', not a special>>. > > > > > > Couldn't Rsamtools give something more informative? > > The info in the original post isn't enough to understand how the error > occurs. > To reproduce the original error: choose a 30GB Bam file. and run: dataIP <- read.table(fileName,header = TRUE, colClasses - c("factor","integer","factor"))
after a while, it will trow the error. the seqinfo above did not work for me, probably I do not have the latest devel version of Seqinfo. > fl [1] "/opt/R/lib/Rsamtools/extdata/ex1.bam" > seqinfo(open(BamFile(fl))) Error in function (classes, fdef, mtable) : unable to find an inherited method for function "seqinfo", for signature "BamFile" > otherwise, we have splitted the Big bam into chromosomes with samtools. We were then able to load these smaller files. > > > > There must be something I missed that > > could help me doing that. > > > > > > see ?scanBamHeader, e.g., > > > > > fl <- system.file("extdata", "ex1.bam", package="Rsamtools") > > > scanBamHeader(fl)[[1]]$targets > > seq1 seq2 > > 1575 1584 > > > > Would be nice to have a method for getting a Seqinfo out of a BAM > > header. Then one can just coerce that to a GRanges. rtracklayer does the > > equivalent for BigWig. > > In devel, > > > seqinfo(open(BamFile(fl))) > Seqinfo of length 2 > seqnames seqlengths isCircular > seq1 1575 NA > seq2 1584 NA > > I think this needs to be updated to deal with recent changes to Seqinfo > to store the reference genome (which is sometimes also present in the > BAM file). > I think this update would be useful for us. Thank you for your support Martin and Michael. Rene > Maritn > > > > Michael > > > > Martin > > > > > > > > ultimately, I want to launch a PICS analysis that requires a > > segReadsList object. > > > > Overall I definitely progressed by your help, thank you. > > > > Rene > > > > > > > > > > On Fri, 2011-09-16 at 14:29 -0700, Martin Morgan wrote: > > > > On 09/16/2011 02:11 PM, Michael Lawrence wrote: > > > > It sounds like you're trying to use BED as an > > alternative to BAM? Probably > > not a good idea, especially at this scale. Why are you > > aiming for a > > GenomeData? A GappedAlignments might be more > > appropriate. See > > GenomicRanges::__readGappedAlignments() for bringing a > > BAM into a > > GappedAlignments. > > > > > > Hi Rene > > > > the 'which' argument to readGappedAlignments (it'll become > > 'param' with > > the next release, and be a ScanBamParam object) allows you > > to select > > regions to process, e.g., chromosome-at-a-time, to help with > > file size. > > > > Martin > > > > > > This page might help: > > > > http://bioconductor.org/help/__workflows/high-throughput-__sequencing/#sequencing-__resources > > > > <http://bioconductor.org/help/workflows/high-throughput-sequencing/#sequencing-resources> > > > > But it could really be improved. > > > > Michael > > > > On Fri, Sep 16, 2011 at 1:44 PM, Rene > > Paradis<rene.paradis@genome.__ulaval.ca > > <mailto:rene.para...@genome.ulaval.ca> > > > > wrote: > > > > > > Hello, > > > > I am experiencing a problem regarding the load in > > memory of bed files of > > 30 GB. my function read.table unleash the error : > > Error in unique(x) : > > length xxxxxx is too large for hashing. > > > > this is generated by the function MKsetup of the > > unique.c file. Even by > > increasing by 10 000x the value, the error persists. > > I believe the > > function pushes more data in ram, but I am not sure > > this is the good way > > to focus on. > > > > Ultimately, I would like to produce a GenomeData > > object from either a > > BAM file or a bed file. > > > > has someone ever worked with very very big BAM files > > (about 30 GB) > > > > thanks > > > > Rene paradis > > > > _________________________________________________ > > Bioc-sig-sequencing mailing list > > Bioc-sig-sequencing@r-project.__org > > <mailto:Bioc-sig-sequencing@r-project.org> > > > > https://stat.ethz.ch/mailman/__listinfo/bioc-sig-sequencing > > > > <https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing> > > > > > > [[alternative HTML version deleted]] > > > > _________________________________________________ > > Bioc-sig-sequencing mailing list > > Bioc-sig-sequencing@r-project.__org > > <mailto:Bioc-sig-sequencing@r-project.org> > > https://stat.ethz.ch/mailman/__listinfo/bioc-sig-sequencing > > <https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing> > > > > > > > > > > > > > > > > -- > > Computational Biology > > Fred Hutchinson Cancer Research Center > > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 > > > > Location: M1-B861 > > Telephone: 206 667-2793 <tel:206%20667-2793> > > > > _________________________________________________ > > Bioc-sig-sequencing mailing list > > Bioc-sig-sequencing@r-project.__org > > <mailto:Bioc-sig-sequencing@r-project.org> > > https://stat.ethz.ch/mailman/__listinfo/bioc-sig-sequencing > > <https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing> > > > > > > _______________________________________________ Bioc-sig-sequencing mailing list Bioc-sig-sequencing@r-project.org https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing