P.,
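As the error message suggests, there is a mismatch between
names(arab.chromlens) and levels(chromosome(alns_8)), meaning the
chromosome lengths vector and the AlignedRead object are not in sync.
The aligned reads for this experiment were from a mouse model, not
Arabidopsis thaliana, so you would need to reference
BSgenome.Mmusculus.UCSC.mm9 when performing these operations. Note also
that the chromosome names read from the export file keep the reference
file suffix (e.g. "chr11.fa"), which has to be stripped before they can
match the BSgenome sequence names (e.g. "chr11"); the sub() call below
does this: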
> filt1 <- alignDataFilter(expression(filtering=="Y"))
> filt2 <- chromosomeFilter("chr[0-9XYM]+.fa")
> filt <- compose(filt1, filt2)
> alns <- readAligned(extdataDir, pattern, type="SolexaExport",
+     filter=filt)
> alns
class: AlignedRead
length: 195719 reads; width: 35 cycles
chromosome: chr11.fa chr9.fa ... chr8.fa chr4.fa
position: 104853312 3036336 ... 44295163 47191474
strand: - - ... - -
alignQuality: NumericQuality
alignData varLabels: run lane ... filtering contig
> levels(alns@chromosome) <- sub("\\.fa$", "", levels(chromosome(alns)))
> library(BSgenome.Mmusculus.UCSC.mm9)
> mm9.chromlens <- seqlengths(Mmusculus)
> head(mm9.chromlens)
chr1 chr2 chr3 chr4 chr5 chr6
197195432 181748087 159599783 155630120 152537259 149517037
> cov.mm9 <- coverage(alns, width = mm9.chromlens, extend = 126L)
> cov.mm9
SimpleRleList of length 22
$chr1
'integer' Rle of length 197195432 with 27263 runs
Lengths: 3018534 161 16703 161 68815 161 33063 161 58217 161 ...
Values : 0 1 0 1 0 1 0 1 0 1 ...
$chr10
'integer' Rle of length 129993255 with 21699 runs
Lengths: 3019736 161 11311 161 4238 161 10661 161 793 161 ...
Values : 0 1 0 1 0 1 0 1 0 1 ...
$chr11
'integer' Rle of length 121843856 with 22105 runs
Lengths: 3000315 6 40 79 9 4 23 6 2 38 ...
Values : 0 1 2 3 4 5 6 5 4 5 ...
$chr12
'integer' Rle of length 121257530 with 18183 runs
Lengths: 3002552 161 6903 161 4375 161 5041 161 2491 161 ...
Values : 0 1 0 1 0 1 0 1 0 1 ...
$chr13
'integer' Rle of length 120284312 with 15907 runs
Lengths: 3001262 161 5650 161 29080 161 111 40 121 40 ...
Values : 0 1 0 1 0 1 0 1 2 1 ...
...
<17 more elements>
> sessionInfo()
R version 2.11.0 Under development (unstable) (2010-01-02 r50884)
i386-apple-darwin9.8.0
locale:
[1] C/C/C/C/C/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets
[6] methods base
other attached packages:
[1] BSgenome.Mmusculus.UCSC.mm9_1.3.16
[2] BSgenome.Athaliana.TAIR.04232008_1.3.16
[3] ShortReadTutorial_0.0.1
[4] ShortRead_1.5.10
[5] lattice_0.17-26
[6] BSgenome_1.15.3
[7] Biostrings_2.15.11
[8] IRanges_1.5.23
loaded via a namespace (and not attached):
[1] Biobase_2.7.3 grid_2.11.0 hwriter_1.1 tools_2.11.0
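A minimal base-R sketch of the renaming step above, plus a check you could run before calling coverage(). It uses toy vectors standing in for levels(chromosome(alns)) and seqlengths(Mmusculus) (the lengths are the mm9 values printed above), so it runs without ShortRead or BSgenome. The helpers strip_fasta_suffix() and missing_lengths() are hypothetical, not part of ShortRead, and the check assumes only that every chromosome level in the reads needs a named length; the exact rule coverage() enforces may differ.

```r
# Hypothetical helper: drop a trailing ".fa" or ".fas" file suffix so that
# labels such as "chr11.fa" or "chr1.fas" become "chr11" and "chr1".
strip_fasta_suffix <- function(x) sub("\\.fas?$", "", x)

# Hypothetical helper: chromosome levels in the reads that have no entry
# in the named vector of chromosome lengths (assumed requirement).
missing_lengths <- function(chroms, lens) setdiff(levels(chroms), names(lens))

# Toy stand-ins for levels(chromosome(alns)) and seqlengths(Mmusculus);
# the lengths are the mm9 values printed in the transcript above.
chroms <- factor(c("chr11.fa", "chr1.fa", "chr13.fa", "chr1.fa"))
lens <- c(chr1 = 197195432L, chr10 = 129993255L, chr11 = 121843856L,
          chr12 = 121257530L, chr13 = 120284312L)

# Before renaming: all three levels are reported as missing.
missing_lengths(chroms, lens)

# Rename the factor levels in place, as the sub() call above does.
levels(chroms) <- strip_fasta_suffix(levels(chroms))

# After renaming: character(0), every level now has a length.
missing_lengths(chroms, lens)
```

The same pattern also strips the ".fas" suffix seen in the chromosome labels of the original message.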
Cheers,
Patrick
P. Terry wrote:
Dear bioc-sig-sequencing,
I am trying to analyze Eland aligned files for differential expression, using
the 'A ChIP-Seq Data Analysis' handout from an 11/19/09 session at the 'High
throughput sequence analysis tools and approaches with Bioconductor' workshop
in Seattle.
The output below ends in an error message. Can you comment?
...
> alns_8 <- readAligned(cdataDir, pattern, "SolexaExport")
> alns_8
class: AlignedRead
length: 1380439 reads; width: 35 cycles
chromosome: chr1.fas chr1.fas ... chr1.fas chr1.fas
position: 7568294 167488 ... 4687256 5376960
strand: + + ... + +
alignQuality: NumericQuality
alignData varLabels: run lane ... filtering contig
> head(sread(alns_8))
A DNAStringSet instance of length 6
width seq
[1] 35 AGCTATGATCAAGAGAACCTTTCACGATCANNNCN
[2] 35 CGGACGACGGGTAGTTTCGGGCTGTACCAANNNAN
[3] 35 AGCTCAGCGATCTGAGCCACTTGCTCTTTGNNNTN
[4] 35 GGGCCATAGGCCCGTTAAAATATTTTTCTCTNNCT
[5] 35 ATTGTCCATTGACAAATGAAGATATTGGGATNNTT
[6] 35 ACCCCTCCACCAGTATGTTGGCGAAAATCTCNNCC
> table(strand(alns_8), useNA="ifany")
- + *
689912 690527 0
...
> library(BSgenome.Athaliana.TAIR.04232008)
> arab.chromlens <- seqlengths(Athaliana)
> head(arab.chromlens)
chr1 chr2 chr3 chr4 chr5 chrC
30432563 19705359 23470805 18585042 26992728 154478
> cov.arab8 <- coverage(alns_8, width = arab.chromlens, extend = 126L)
Error: UserArgumentMismatch
'names(width)' (or 'names(end)') mismatch with 'levels(chromosome(x))'
see ?"AlignedRead-class"
> sessionInfo()
R version 2.10.1 (2009-12-14)
x86_64-pc-linux-gnu
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] BSgenome.Athaliana.TAIR.04232008_1.3.16
[2] chipseq_0.2.0
[3] ShortRead_1.4.0
[4] lattice_0.17-26
[5] BSgenome_1.14.0
[6] Biostrings_2.14.1
[7] IRanges_1.4.2
loaded via a namespace (and not attached):
[1] Biobase_2.6.0 grid_2.10.1 hwriter_1.1
Thanks,
P. Terry
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing