Hi Martin,

Thanks. The bam file was corrupted.

Ivan

Quoting Martin Morgan <[email protected]>:

Hi Ivan --

On 11/11/2010 08:25 AM, [email protected] wrote:
Hello list,

I have scanned a large bam (15G) file from Bioscope (SOLID) using
Rsamtools and the code below:

library(Rsamtools)
Loading required package: GenomicRanges


p4 <- ScanBamParam(what = c("seq"),
                   flag = scanBamFlag(isUnmappedQuery = TRUE))

res3 <- scanBam("test.bam",param=p4, maxMemory=5000)[[1]]

The 'maxMemory' argument is silently ignored by scanBam; it has no
effect here.

The problem could be in the BAM file, in samtools, or in Rsamtools. To
narrow it down, you might confirm basic functionality in Rsamtools (e.g.,
example(scanBam)), read the header of the file (scanBamHeader), and
load a smaller portion of the bam file in Rsamtools (using the
'which' argument of ScanBamParam). It might also be informative to know
whether the issue is with seq/qual or with other parts of the reads, in
particular what=c("rname", "pos").
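
As a rough sketch of those checks, something like the following could be run. It assumes the small example file shipped with Rsamtools (ex1.bam, which has an index alongside it) as a stand-in for the real file; the region 100-200 on the first reference sequence is only an illustration, and the variable names are made up for this example.

```r
library(Rsamtools)   # also attaches GenomicRanges and Biostrings

# Assumption: the example BAM shipped with Rsamtools stands in for your file.
bam <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# 1. Can the header be read at all?
hdr <- scanBamHeader(bam)[[1]]
targets <- hdr$targets   # named integer vector: reference name -> length

# 2. Restrict to a small region (requires a .bai index next to the BAM file).
region <- GRanges(names(targets)[1], IRanges(100, 200))

# 3. Start with fields other than seq/qual ...
p_pos <- ScanBamParam(which = region, what = c("rname", "pos"))
res_pos <- scanBam(bam, param = p_pos)[[1]]

# 4. ... then request seq/qual over the same region.
p_seq <- ScanBamParam(which = region, what = c("seq", "qual"))
res_seq <- scanBam(bam, param = p_seq)[[1]]

# If every read has width 1 here too, the file itself is the likely suspect.
summary(width(res_seq$seq))
```

If steps 1 to 3 work on the real file but step 4 still returns width-1 sequences, that would suggest the problem is specific to the seq/qual fields rather than to the file as a whole.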

Outside R, you might try to view a small portion of your bam file with

  samtools view <your-file-here> chr1:100-200|wc -l

or similar

Martin



it is not clear to me why I get all sequences as


res3$seq[1]

 A DNAStringSet instance of length 1
    width seq
[1]     1 N


and all Phred-encoded, phred-scaled base quality scores as:


p4 <- ScanBamParam(what = c("qual"),
                   flag = scanBamFlag(isUnmappedQuery = TRUE))


res3 <- scanBam("solid0085_20090610_ICGC_Xeno_wholetranscrptome_4041X_F3_sortedByReadId.bam",
                param = p4, maxMemory = 5000)[[1]]




res3$qual[1]
  A PhredQuality instance of length 1
    width seq
[1]     1 !


Many thanks for any suggestions,

Ivan


sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_CA.UTF-8
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Rsamtools_1.2.0     GenomicRanges_1.2.0 Biostrings_2.18.0
[4] IRanges_1.8.0

loaded via a namespace (and not attached):
[1] Biobase_2.10.0 tools_2.12.0

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793

