[Bioc-sig-seq] Implementing workflow.pdf (BioC2010 lab handout)

[email protected] Mon, 16 Aug 2010 12:45:56 -0700

 Dear bioc-sig-sequencing,

I am trying to follow a BioC2010 lab handout 
(http://www.bioconductor.org/help/course-materials/2010/BioC2010/Workflow.pdf) 
to find differential peaks in a chipseq experiment.  The "unevaluated code 
block" on page one of the handout is,


> filter <- compose(chipseqFilter(), alignQualityFilter(15))
> cstest <- seqapply(sampleFiles, function(file) {
+ as(readAligned(file, filter), "GRanges")
+ })

To try to implement this,

> dataDir <- "/home/mterry/ceru_data"
> dataDir
[1] "/home/mterry/ceru_data"

> pattern <- "s_[8I]_export_chr1.txt"
> list.files(dataDir, pattern)
[1] "s_8_export_chr1.txt" "s_I_export_chr1.txt"

Two Eland aligned files selected (analogous to cstest$ctcf & cstest$gfp in 
handout).

> filt1 <- alignDataFilter(expression(filtering=="Y"))
> filt2 <- chromosomeFilter("chr[0-9XYM]+.fas")
> filt3 <- occurrenceFilter(withSread = FALSE)
> filt <- compose(filt1, filt2, filt3)

> arabtest <- seqapply(dataDir, function(file) {
+   as(readAligned(file, pattern, type="SolexaExport", filter=filt), "GRanges")
+ })
> arabtest
GRangesList of length 1
[[1]]
GRanges with 1989354 ranges and 7 elementMetadata values
          seqnames               ranges strand   |      run      lane      tile
             <Rle>            <IRanges>  <Rle>   | <factor> <integer> <integer>
      [1] chr1.fas [ 7568294,  7568328]      +   |        9         8         1
      [2] chr1.fas [  167488,   167522]      +   |        9         8         1

> sessionInfo()
R version 2.12.0 Under development (unstable) (2010-08-02 r52661)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=en_US.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GenomicFeatures_1.1.6 chipseq_0.99.1        ShortRead_1.7.11
[4] Rsamtools_1.1.11      lattice_0.18-8        BSgenome_1.17.6
[7] Biostrings_2.17.27    GenomicRanges_1.1.20  IRanges_1.7.15

loaded via a namespace (and not attached):
[1] Biobase_2.9.0     biomaRt_2.5.1     DBI_0.2-5         grid_2.12.0
[5] hwriter_1.2       RCurl_1.4-3       RSQLite_0.9-2     rtracklayer_1.9.4
[9] XML_3.1-0
>


My questions/observations:

1. I was expecting "GRangesList of length 2", not "1"?

2. From separate runs, when selecting only one of the Eland aligned files at a 
time, the "1989354" ranges found here (from the "two file pattern")  are nearly 
(but not exactly) the sum of the ranges for the two files.

3. What should I have done to get the expected "GRangesList of length 2" with 
ranges from each input file in separate elements of the GRangesList object?

4. An ancillary question, how can I change the designation to a name of my 
choosing for the GRangeslist elements (for example, the lab handout has 
"$ctcf", while I have "[[1]]")?


Thanks,
P. Terry
[email protected]

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

[Bioc-sig-seq] Implementing workflow.pdf (BioC2010 lab handout)

Reply via email to