Hi Ivan --

On 11/11/2010 08:25 AM, [email protected] wrote:
> Hello list,
> 
> I have scanned a large (15 GB) BAM file from Bioscope (SOLiD) using
> Rsamtools and the code below:
> 
>> library(Rsamtools)
> Loading required package: GenomicRanges
>>
>>
>> p4 <- ScanBamParam(what = c("seq"), flag = scanBamFlag(isUnmappedQuery = TRUE))
>>
>> res3 <- scanBam("test.bam",param=p4, maxMemory=5000)[[1]]

The 'maxMemory' argument is silently ignored by scanBam; it doesn't do
anything.

The problem could be in the BAM file, in samtools, or in Rsamtools. To
narrow it down, you might confirm basic functionality in Rsamtools
(e.g., example(scanBam)), then try reading the header of the file
(scanBamHeader) and loading a smaller portion of the BAM file (using the
'which' argument of ScanBamParam). It might also be informative to know
whether the issue is limited to seq/qual or also affects other parts of
the reads, in particular what=c("rname", "pos").
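As a rough sketch of what those checks might look like (not tested against your file): it uses the small, indexed example file ex1.bam that ships with Rsamtools as a stand-in, so the path, the region seq1:100-200, and the variable names are illustrative assumptions. Substitute your own BAM file and a region on one of its reference sequences; the 'which' query needs a .bai index next to the file.

```r
library(Rsamtools)

## Stand-in file: the small, indexed example BAM shipped with Rsamtools.
## Replace 'fl' with the path to your own BAM file.
fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")

## 1. Confirm basic functionality using the package's own examples.
example(scanBam)

## 2. Read only the header (reference names/lengths and header text).
hdr <- scanBamHeader(fl)
hdr[[1]]$targets

## 3. Load a small region via the 'which' argument; this needs a .bai index.
##    The region seq1:100-200 is arbitrary; use a sequence named in the header.
rng <- IRangesList(seq1 = IRanges(100, 200))
p1 <- ScanBamParam(which = rng, what = c("rname", "pos", "seq", "qual"))
res <- scanBam(fl, param = p1)[[1]]
head(res$pos)
head(res$seq, 3)
head(res$qual, 3)

## 4. For unmapped reads, inspect rname/pos alongside seq/qual to see
##    whether only the sequence-related fields look wrong.
p2 <- ScanBamParam(what = c("rname", "pos", "seq", "qual"),
                   flag = scanBamFlag(isUnmappedQuery = TRUE))
unmapped <- scanBam(fl, param = p2)[[1]]
length(unmapped$pos)
head(unmapped$seq, 3)
```

If steps 1 to 3 behave on your own file but step 4 still shows width-1 'N' sequences, that points at how the unmapped records are stored rather than at the reading code.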

Outside R, you might try viewing a small portion of your BAM file with

  samtools view <your-file-here> chr1:100-200 | wc -l

or something similar.

Martin

> 
> 
> It is not clear to me why I get all sequences as
> 
> 
>> res3$seq[1]
> 
>  A DNAStringSet instance of length 1
>     width seq
> [1]     1 N
> 
> 
> and all Phred-encoded, Phred-scaled base quality scores as:
> 
> 
>> p4 <- ScanBamParam(what = c("qual"), flag = scanBamFlag(isUnmappedQuery = TRUE))
>>
>>
>> res3 <- scanBam("solid0085_20090610_ICGC_Xeno_wholetranscrptome_4041X_F3_sortedByReadId.bam", param=p4, maxMemory=5000)[[1]]
>>
> 
> 
> 
>> res3$qual[1]
>   A PhredQuality instance of length 1
>     width seq
> [1]     1 !
> 
> 
> Many thanks for any suggestions,
> 
> Ivan
> 
> 
>> sessionInfo()
> R version 2.12.0 (2010-10-15)
> Platform: x86_64-pc-linux-gnu (64-bit)
> 
> locale:
>  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_CA.UTF-8
>  [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] Rsamtools_1.2.0     GenomicRanges_1.2.0 Biostrings_2.18.0
> [4] IRanges_1.8.0
> 
> loaded via a namespace (and not attached):
> [1] Biobase_2.10.0 tools_2.12.0
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
