Dear bioc-sig-sequencing,
I am trying to follow a BioC2010 lab handout
(http://www.bioconductor.org/help/course-materials/2010/BioC2010/Workflow.pdf)
to find differential peaks in a chipseq experiment. The "unevaluated code
block" on page one of the handout is,
> filter <- compose(chipseqFilter(), alignQualityFilter(15))
> cstest <- seqapply(sampleFiles, function(file) {
+ as(readAligned(file, filter), "GRanges")
+ })
To try to implement this,
> dataDir <- "/home/mterry/ceru_data"
> dataDir
[1] "/home/mterry/ceru_data"
> pattern <- "s_[8I]_export_chr1.txt"
> list.files(dataDir, pattern)
[1] "s_8_export_chr1.txt" "s_I_export_chr1.txt"
Two Eland aligned files selected (analogous to cstest$ctcf & cstest$gfp in
handout).
> filt1 <- alignDataFilter(expression(filtering=="Y"))
> filt2 <- chromosomeFilter("chr[0-9XYM]+.fas")
> filt3 <- occurrenceFilter(withSread = FALSE)
> filt <- compose(filt1, filt2, filt3)
> arabtest <- seqapply(dataDir, function(file) {
+ as(readAligned(file, pattern, type="SolexaExport", filter=filt), "GRanges")
+ })
> arabtest
GRangesList of length 1
[[1]]
GRanges with 1989354 ranges and 7 elementMetadata values
seqnames ranges strand | run lane tile
<Rle> <IRanges> <Rle> | <factor> <integer> <integer>
[1] chr1.fas [ 7568294, 7568328] + | 9 8 1
[2] chr1.fas [ 167488, 167522] + | 9 8 1
> sessionInfo()
R version 2.12.0 Under development (unstable) (2010-08-02 r52661)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
[3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
[5] LC_MONETARY=C LC_MESSAGES=en_US.utf8
[7] LC_PAPER=en_US.utf8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicFeatures_1.1.6 chipseq_0.99.1 ShortRead_1.7.11
[4] Rsamtools_1.1.11 lattice_0.18-8 BSgenome_1.17.6
[7] Biostrings_2.17.27 GenomicRanges_1.1.20 IRanges_1.7.15
loaded via a namespace (and not attached):
[1] Biobase_2.9.0 biomaRt_2.5.1 DBI_0.2-5 grid_2.12.0
[5] hwriter_1.2 RCurl_1.4-3 RSQLite_0.9-2 rtracklayer_1.9.4
[9] XML_3.1-0
>
My questions/observations:
1. I was expecting "GRangesList of length 2", not "1"?
2. From separate runs, when selecting only one of the Eland aligned files at a
time, the "1989354" ranges found here (from the "two file pattern") are nearly
(but not exactly) the sum of the ranges for the two files.
3. What should I have done to get the expected "GRangesList of length 2" with
ranges from each input file in separate elements of the GRangesList object?
4. An ancillary question, how can I change the designation to a name of my
choosing for the GRangeslist elements (for example, the lab handout has
"$ctcf", while I have "[[1]]")?
Thanks,
P. Terry
[email protected]
[[alternative HTML version deleted]]
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing