P.,
The error message from aggregate isn't very informative; I'll clean it
up.
The aggregate function threw an error for the cov.y object because the
ranges in allPeaks reference positions outside the bounds of cov.y. In
particular, cov.y is an Rle of length 11 (by default, coverage stops at
the last position covered by y, here 10 + 2 - 1 = 11), while allPeaks
includes the interval [17, 19]. If you know the length of the underlying
sequence, you can pass it to the width argument of the coverage function.
For example, if the underlying sequence has length 19, then the coverage
of the y ranges would be calculated as shown below. (I also added code
for more efficient summation within the specified ranges.)
> cov.y <- coverage(y, width = 19)
> cov.y
'integer' Rle of length 19 with 5 runs
Lengths: 3 2 4 2 8
Values : 0 3 0 3 0
> y.counts <- aggregate(cov.y, allPeaks, sum)
> y.counts
[1] 6 0
> y.counts.efficient <- viewSums(Views(cov.y, allPeaks))
> y.counts.efficient
[1] 6 0
> sessionInfo()
R version 2.10.1 Patched (2009-12-14 r50738)
i386-apple-darwin9.8.0
locale:
[1] C/en_US.UTF-8/C/C/C/C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] IRanges_1.4.9
loaded via a namespace (and not attached):
[1] tools_2.10.1
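To make the cause of the error concrete without depending on IRanges, here is a minimal base-R sketch that mimics coverage(), slice() and viewSums() on plain integer vectors instead of Rle objects. The helpers coverage_vec(), peak_bounds() and peak_sums() are hypothetical and are not IRanges functions. The sketch assumes 1-based closed intervals, as in the IRanges output above. Using length(cov.x) as the length of the y coverage is one way to guarantee that every peak fits when the true sequence length is unknown; the untested IRanges analogue would be coverage(y, width = length(cov.x)).

```r
# Base-R sketch (no IRanges): plain integer vectors stand in for Rle objects.
# coverage_vec(), peak_bounds() and peak_sums() are hypothetical helpers.

# Per-base coverage of ranges [start, start + width - 1] over positions 1..len.
coverage_vec <- function(start, width, len = max(start + width - 1L)) {
  end <- start + width - 1L
  stopifnot(all(start >= 1L), all(end <= len))
  # +1 at each start, -1 one past each end; the running sum is the coverage
  delta <- tabulate(start, nbins = len + 1L) - tabulate(end + 1L, nbins = len + 1L)
  cumsum(delta)[seq_len(len)]
}

# Start/end of maximal runs where coverage >= lower (like slice(cov, lower = 3)).
peak_bounds <- function(cov, lower) {
  r <- rle(cov >= lower)
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1L
  data.frame(start = starts[r$values], end = ends[r$values])
}

# Sum of coverage within each peak; indexing past length(cov) yields NA.
peak_sums <- function(cov, peaks) {
  mapply(function(s, e) sum(cov[s:e]), peaks$start, peaks$end)
}

x.start <- c(1L, 9L, 4L, 1L, 5L, 10L, 15L, 17L, 17L)
x.width <- c(5L, 6L, 3L, 4L, 3L, 3L, 5L, 3L, 3L)
y.start <- c(4L, 4L, 4L, 10L, 10L, 10L)
y.width <- rep(2L, 6L)

cov.x <- coverage_vec(x.start, x.width)                       # length 19
allPeaks <- peak_bounds(cov.x, lower = 3L)                    # [4, 5] and [17, 19]

cov.y.short <- coverage_vec(y.start, y.width)                 # length 11: too short
cov.y <- coverage_vec(y.start, y.width, len = length(cov.x))  # length 19

peak_sums(cov.x, allPeaks)        # 6 9
peak_sums(cov.y.short, allPeaks)  # 6 NA: [17, 19] lies past the end
peak_sums(cov.y, allPeaks)        # 6 0
```

Base R silently returns NA for the out-of-range peak, whereas IRanges stops with an error. Either way the fix is the same: the y coverage must be at least as long as the subject the peaks were sliced from.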
Cheers,
Patrick
[email protected] wrote:
Dear bioc-sig-sequencing,
I am working with a toy example to learn the material covered in part 3
(Differential expression, pages 10-11) of the 'A ChIP-Seq Data Analysis' handout
for the 11/19/09 session of the 'High throughput sequence analysis tools and
approaches with Bioconductor' workshop in Seattle.
The code below generates an error message. Can you comment?
(When I use the sample data and code from the handout, ctcf.rda and
gfp.rda, no errors are generated.)
> x <- IRanges(start=c(1L, 9L, 4L, 1L, 5L, 10L, 15L, 17L, 17L),
+              width=c(5L, 6L, 3L, 4L, 3L, 3L, 5L, 3L, 3L))
> y <- IRanges(start=c(4L, 4L, 4L, 10L, 10L, 10L),
+              width=c(2L, 2L, 2L, 2L, 2L, 2L))
> cov.x <- coverage(x)
> cov.y <- coverage(y)
> allPeaks <- slice(cov.x, lower = 3)
> allPeaks
Views on a 19-length Rle subject
views:
    start end width
[1]     4   5     2 [3 3]
[2]    17  19     3 [3 3 3]
> x.counts <- aggregate(cov.x, allPeaks, sum)
> x.counts
[1] 6 9
> y.counts <- aggregate(cov.y, allPeaks, sum)
Error in findIntervalAndStartFromWidth(start, runLength(x)) :
  'x' must be less than 'sum(width)'
> sessionInfo()
R version 2.10.1 (2009-12-14)
x86_64-pc-linux-gnu
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ChIPseqTutorial_0.0.1 BSgenome.Mmusculus.UCSC.mm9_1.3.16
[3] chipseq_0.2.0 ShortRead_1.4.0
[5] lattice_0.17-26 BSgenome_1.14.0
[7] Biostrings_2.14.1 IRanges_1.4.2
loaded via a namespace (and not attached):
[1] Biobase_2.6.0 grid_2.10.1 hwriter_1.1
Thanks,
P. Terry
huskers.unl.edu
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing