On 12/13/2010 02:00 PM, Dario Strbenac wrote:
> Hi,
>
> Yes, that works fine, thanks. It must've been a size issue I was having.

Rsamtools 1.2.2 in release has been updated to report

  too many records, use 'param=ScanBamParam(which=<...>)'

when the number of reads / nucleotides results in more than 2^31-1
nucleotides. The devel version of Rsamtools currently does the same, but
the intention is to arrive at a more robust solution.

I think this addresses the problem, but I would be happy to know if the
original example still fails.

Martin

>
> ---- Original message ----
>> Date: Mon, 13 Dec 2010 17:31:24 +1000
>> From: Paul Leo <[email protected]>
>> Subject: Re: [Bioc-sig-seq] scanBam Error
>> To: [email protected]
>> Cc: [email protected]
>>
>> Do you need all the sequence data at once?
>>
>> Instead of using a smaller BAM file, can you read in
>> a smaller portion of your large BAM file?
>>
>> data.gr<-GRanges(seqnames=paste("chr",13,sep=""),
>>     ranges=IRanges(start=as.numeric(28608234),end=as.numeric(28608363)),strand="+")
>>
>> which<- data.gr
>>
>> params<-ScanBamParam(which=which,
>>     flag=scanBamFlag(isUnmappedQuery=FALSE,isDuplicate=NA,isValidVendorRead=TRUE),
>>     simpleCigar=FALSE,reverseComplement=FALSE,
>>     what=c("qname","flag","rname","seq","strand","pos","mpos","qwidth",
>>            "cigar","qual","mapq","isize","mrnm"),
>>     tag="RG") # change to what you want
>> aln1 <- scanBam("HS1808.bam",param=params)
>>
>> aln1[[1]]
>>
>> That should work fine?
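Paul's example reads one small region. As a minimal sketch of the same `which` approach applied to a whole file, the R code below loops over the chromosomes listed in the BAM header and reads them one at a time, so no single `scanBam()` result has to hold every read in the file. It assumes the BAM file is coordinate-sorted and has a matching `.bai` index (reading by `which` needs one). The helper name `scanBamByChromosome` and its `FUN` summary argument are hypothetical, and the `what` fields and `reverseComplement = TRUE` simply mirror Dario's original call. Following Martin's note, `flag` is passed to `ScanBamParam()` rather than to `scanBam()`.

```r
## Sketch: read a large, indexed BAM file one chromosome at a time.
## Assumes 'file' is coordinate-sorted with a matching .bai index;
## scanBamByChromosome() is a hypothetical helper, not part of Rsamtools.
library(Rsamtools)      # scanBam(), scanBamHeader(), ScanBamParam(), scanBamFlag()
library(GenomicRanges)  # GRanges(), IRanges()

scanBamByChromosome <- function(file,
                                what = c("rname", "strand", "pos", "seq"),
                                FUN = function(res) length(res$pos)) {
    ## Chromosome (target) names and lengths from the BAM header.
    targets <- scanBamHeader(file)[[1]]$targets
    chroms <- setNames(names(targets), names(targets))
    lapply(chroms, function(chr) {
        ## One range covering the whole chromosome.
        which <- GRanges(chr, IRanges(1, targets[[chr]]))
        ## 'flag' goes inside ScanBamParam(), not in the scanBam() call.
        param <- ScanBamParam(which = which, what = what,
                              reverseComplement = TRUE,
                              flag = scanBamFlag(isDuplicate = FALSE))
        ## With a single 'which' range, scanBam() returns a list of length one.
        res <- scanBam(file, param = param)[[1]]
        ## Reduce each chromosome to a summary (by default, its read count)
        ## so that only the summaries are kept across iterations.
        FUN(res)
    })
}

## Usage, e.g.: counts <- scanBamByChromosome("HS1808.bam")
```

Passing a different `FUN` (for instance `function(res) table(res$strand)`) lets each chromosome be summarised and discarded before the next one is read. Whether a single chromosome fits in memory still depends on the data.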
>>
>> --
>> Dr Paul Leo
>> Bioinformatician
>> UQ Diamantina Institute for Cancer, Immunology and Metabolic Medicine
>> ---------------------------------------------------------------------
>> Level 4, R Wing
>> Princess Alexandra Hospital
>> Ipswich Rd
>> Woolloongabba QLD 4102
>> Tel: +61 7 3240 7740 Mob: 041 303 8691 Fax: +61 7 3240 5946
>> Email: [email protected] Web: http://www.di.uq.edu.au
>>
>> -----Original Message-----
>> From: Dario Strbenac <[email protected]>
>> Reply-to: [email protected]
>> To: [email protected]
>> Subject: Re: [Bioc-sig-seq] scanBam Error
>> Date: Mon, 13 Dec 2010 17:15:38 +1100
>>
>> I tried it out by making a smaller bam file with only reads from one
>> chromosome, and it worked fine. The full bam file is 4 GB and has 75 million
>> reads in it. Could the size be a problem? Could you test out a bam file of
>> this size on your end, without me sending you one that big? Also, the error
>> is different after I put the ScanBamParam in the right spot:
>>
>> Error in .Call(func, file, index, "rb", NULL, flag, simpleCigar, ...) :
>>   negative length vectors are not allowed
>>
>> Integer overflow somewhere, maybe?
>>
>> - Dario.
>>
>> ---- Original message ----
>>> Date: Sun, 12 Dec 2010 20:59:23 -0800
>>> From: Martin Morgan <[email protected]>
>>> Subject: Re: [Bioc-sig-seq] scanBam Error
>>> To: [email protected]
>>> Cc: [email protected]
>>>
>>> On 12/12/2010 08:00 PM, Dario Strbenac wrote:
>>>> Hello,
>>>>
>>>> I'm having trouble reading in a BAM file when "seq" is one of the
>>>> strings passed to the what argument of ScanBamParam. If it's not, then
>>>> the reading completes successfully. I don't understand what the
>>>> error means. It is:
>>>>
>>>> Error in .io_bam(.scan_bam, file, index, reverseComplement, tmpl, param =
>>>> param) :
>>>>   INTEGER() can only be applied to a 'integer', not a 'closure'
>>>>
>>>> The traceback is:
>>>>
>>>>> traceback()
>>>> 4: .Call(func, file, index, "rb", NULL, flag, simpleCigar, ...)
>>>> 3: .io_bam(.scan_bam, file, index, reverseComplement, tmpl, param = param)
>>>> 2: scanBam("HS1808.bam", flag = ScanBamFlag(isDuplicate = FALSE),
>>>>        param = ScanBamParam(reverseComplement = TRUE, what = c("rname",
>>>>        "strand", "pos", "seq")))
>>>> 1: scanBam("HS1808.bam", flag = ScanBamFlag(isDuplicate = FALSE),
>>>>        param = ScanBamParam(reverseComplement = TRUE, what = c("rname",
>>>>        "strand", "pos", "seq")))
>>>>
>>>> and the environment is:
>>>>
>>>> R version 2.12.0 (2010-10-15)
>>>> Platform: x86_64-pc-mingw32/x64 (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252
>>>> LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
>>>> LC_TIME=English_Australia.1252
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] Rsamtools_1.2.1 Biostrings_2.18.0 GenomicRanges_1.2.0
>>>> IRanges_1.8.2
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] Biobase_2.8.0
>>>
>>> Hi Dario -- this is some kind of error in Rsamtools' C code, but I'm not
>>> able to reproduce it on my end so can't track it down. Is there any way
>>> of producing and sharing with me an example file that has this problem?
>>>
>>> One thing (not causing the bug) in your traceback is that 'flag' should
>>> be an argument to ScanBamParam; as it is I think it is being silently
>>> ignored.
>>>
>>> Martin
>>>
>>>>
>>>> --------------------------------------
>>>> Dario Strbenac
>>>> Research Assistant
>>>> Cancer Epigenetics
>>>> Garvan Institute of Medical Research
>>>> Darlinghurst NSW 2010
>>>> Australia
>>>>
>>>> _______________________________________________
>>>> Bioc-sig-sequencing mailing list
>>>> [email protected]
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>
>>>
>>> --
>>> Computational Biology
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N.
>>> PO Box 19024 Seattle, WA 98109
>>>
>>> Location: M1-B861
>>> Telephone: 206 667-2793
>>
>
>
> --------------------------------------
> Dario Strbenac
> Research Assistant
> Cancer Epigenetics
> Garvan Institute of Medical Research
> Darlinghurst NSW 2010
> Australia


--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
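To make the size limit concrete: Dario's file holds 75 million reads, and Martin's fix reports too many records once the total passes 2^31-1 nucleotides, the largest value an R integer (a signed 32-bit int) can hold. The thread never states the read length, so the base R sketch below only tries a few assumed lengths; the 75 million figure is the only number taken from the thread. The last lines show the value a wrapping 32-bit counter would hold, which is one plausible route to the "negative length vectors are not allowed" error, although the thread does not say exactly where the overflow occurred.

```r
## Read count from the thread; the read lengths below are assumptions.
n_reads <- 75e6
int_max <- .Machine$integer.max            # 2^31 - 1 = 2147483647

## Total nucleotides for several assumed read lengths, computed in double
## precision so the product itself cannot overflow.
read_len <- c(25, 28, 29, 36, 50, 76)
total_nt <- n_reads * read_len
print(data.frame(read_len, total_nt, exceeds = total_nt > int_max))

## Smallest whole read length L with n_reads * L > int_max.
min_len <- floor(int_max / n_reads) + 1    # 29

## The same product in R integer arithmetic overflows to NA (with a warning).
suppressWarnings(75000000L * 29L)          # NA

## What a 32-bit signed counter that wraps around would hold instead.
wrapped <- n_reads * 29 - 2^32             # -2119967296
```

With these assumptions, any read length of 29 bases or more puts 75 million reads past the limit, which is consistent with a single-chromosome subset reading fine while the full file fails.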
