Hi Ivan --
On 11/11/2010 08:25 AM, [email protected] wrote:
> Hello list,
>
> I have scanned a large bam (15G) file from Bioscope (SOLID) using
> Rsamtools and the code below:
>
>> library(Rsamtools)
> Loading required package: GenomicRanges
>>
>>
>> p4<- ScanBamParam(what = c("seq"), flag = scanBamFlag(isUnmappedQuery
>> = TRUE))
>>
>> res3 <- scanBam("test.bam",param=p4, maxMemory=5000)[[1]]
The 'maxMemory' argument is silently ignored by scanBam; it has no
effect.
The problem could be in the BAM file, in samtools, or in Rsamtools. To
narrow it down, you might confirm basic functionality in Rsamtools (e.g.,
example(scanBam)), read the header of the file (scanBamHeader), and
load a smaller portion of the BAM file in Rsamtools (using the
'which' argument of ScanBamParam). It would also be informative to know
whether the issue is limited to seq/qual or also affects other parts of
the reads, in particular what=c("rname", "pos").
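As a rough sketch of those checks, something like the following might do.
It runs against the small ex1.bam file that ships with Rsamtools, so the
file path, the reference name "seq1" and the 100-200 region are
placeholders, not values from your data. It is written against a recent
Rsamtools, where 'which' accepts a GRanges; I am assuming that here, and
the 1.2.0 release in your sessionInfo may want a RangesList instead. The
'which' query also needs a .bai index next to the BAM file.

```r
library(Rsamtools)

# Placeholder input: the example BAM shipped with Rsamtools.
# Substitute your own (indexed) file here.
fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# 1. Can the header be read at all? scanBamHeader returns one list
#    element per file; 'targets' maps reference names to their lengths.
hdr <- scanBamHeader(fl)
targets <- hdr[[1]]$targets
print(targets)

# 2. Restrict the scan to a small region via 'which'.
#    "seq1" and 100-200 are placeholders for a reference and range
#    that exist in your own header.
region <- GRanges("seq1", IRanges(100, 200))

# 3. Do fields other than seq/qual come back sensibly?
p_pos <- ScanBamParam(what = c("rname", "pos"), which = region)
res_pos <- scanBam(fl, param = p_pos)[[1]]
print(head(res_pos$pos))

# 4. Now the fields that look wrong: seq and qual for the same region.
p_seq <- ScanBamParam(what = c("seq", "qual"), which = region)
res_seq <- scanBam(fl, param = p_seq)[[1]]

# Tabulate read widths; a table containing only width 1 would
# reproduce the symptom reported for the full file.
print(table(width(res_seq$seq)))
```

If steps 1 to 3 look fine on your file and only step 4 shows width-1
reads, that points at seq/qual specifically and not at the file as a whole.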
Outside R, you might try to view a small portion of your BAM file with

  samtools view <your-file-here> chr1:100-200 | wc -l

or similar.
Martin
>
>
> it is not clear to me why I get all sequences as
>
>
>> res3$seq[1]
>
> A DNAStringSet instance of length 1
> width seq
> [1] 1 N
>
>
> and all Phred-scaled base quality scores as:
>
>
>> p4<- ScanBamParam(what = c("qual"), flag = scanBamFlag(isUnmappedQuery
>> = TRUE))
>>
>>
>> res3 <-
>> scanBam("solid0085_20090610_ICGC_Xeno_wholetranscrptome_4041X_F3_sortedByReadId.bam",param=p4,
>> maxMemory=5000)[[1]]
>>
>
>
>
>> res3$qual[1]
> A PhredQuality instance of length 1
> width seq
> [1] 1 !
>
>
> Many thanks for any suggestions,
>
> Ivan
>
>
>> sessionInfo()
> R version 2.12.0 (2010-10-15)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_CA.UTF-8
> [7] LC_PAPER=en_CA.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] Rsamtools_1.2.0 GenomicRanges_1.2.0 Biostrings_2.18.0
> [4] IRanges_1.8.0
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.10.0 tools_2.12.0
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793