On Mon, Aug 16, 2010 at 12:44 PM, [email protected] < [email protected]> wrote:
> Dear bioc-sig-sequencing, > > I am trying to follow a BioC2010 lab handout ( > http://www.bioconductor.org/help/course-materials/2010/BioC2010/Workflow.pdf) > to find differential peaks in a chipseq experiment. The "unevaluated code > block" on page one of the handout is, > > > filter <- compose(chipseqFilter(), alignQualityFilter(15)) > > cstest <- seqapply(sampleFiles, function(file) { > + as(readAligned(file, filter), "GRanges") > + }) > > To try to implement this, > > > dataDir <- "/home/mterry/ceru_data" > > dataDir > [1] "/home/mterry/ceru_data" > > > pattern <- "s_[8I]_export_chr1.txt" > > list.files(dataDir, pattern) > [1] "s_8_export_chr1.txt" "s_I_export_chr1.txt" > > Two Eland aligned files selected (analogous to cstest$ctcf & cstest$gfp in > handout). > > > filt1 <- alignDataFilter(expression(filtering=="Y")) > > filt2 <- chromosomeFilter("chr[0-9XYM]+.fas") > > filt3 <- occurrenceFilter(withSread = FALSE) > > filt <- compose(filt1, filt2, filt3) > > Comments below, but btw, the chipseqFilter() command makes this easier, with the second two filters (and maybe even the first) combined into: chipseqFilter(exclude="_"). Calling chipseqFilter() with no arguments will also exclude the sex and mitochondrial chromosomes. > arabtest <- seqapply(dataDir, function(file) { > + as(readAligned(file, pattern, type="SolexaExport", filter=filt), > "GRanges") > + }) > > arabtest > GRangesList of length 1 > [[1]] > GRanges with 1989354 ranges and 7 elementMetadata values > seqnames ranges strand | run lane > tile > <Rle> <IRanges> <Rle> | <factor> <integer> > <integer> > [1] chr1.fas [ 7568294, 7568328] + | 9 8 > 1 > [2] chr1.fas [ 167488, 167522] + | 9 8 > 1 > > > sessionInfo() > R version 2.12.0 Under development (unstable) (2010-08-02 r52661) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C > [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 > [5] LC_MONETARY=C LC_MESSAGES=en_US.utf8 > [7] LC_PAPER=en_US.utf8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] GenomicFeatures_1.1.6 chipseq_0.99.1 ShortRead_1.7.11 > [4] Rsamtools_1.1.11 lattice_0.18-8 BSgenome_1.17.6 > [7] Biostrings_2.17.27 GenomicRanges_1.1.20 IRanges_1.7.15 > > loaded via a namespace (and not attached): > [1] Biobase_2.9.0 biomaRt_2.5.1 DBI_0.2-5 grid_2.12.0 > [5] hwriter_1.2 RCurl_1.4-3 RSQLite_0.9-2 rtracklayer_1.9.4 > [9] XML_3.1-0 > > > > > My questions/observations: > > 1. I was expecting "GRangesList of length 2", not "1"? > You are looping over the directory, not the files that match the pattern. It appears you've made the extremely common mistake of assuming that readAligned returns a separate object for each file matching the pattern. As its name suggests, it returns an AlignedRead object, which does not distinguish between input files (everything is merged). Again, this is a common mistake, and it would be great if ShortRead had some way of more easily achieving what you want to do here (a very common use case). > 2. From separate runs, when selecting only one of the Eland aligned files > at a time, the "1989354" ranges found here (from the "two file pattern") > are nearly (but not exactly) the sum of the ranges for the two files. > > 3. What should I have done to get the expected "GRangesList of length 2" > with ranges from each input file in separate elements of the GRangesList > object? > > Loop over the output of your list.files() command instead, except you'll need full.names=TRUE. > 4. An ancillary question, how can I change the designation to a name of my > choosing for the GRangeslist elements (for example, the lab handout has > "$ctcf", while I have "[[1]]")? > > seqapply(), like lapply(), will use the names from the input vector. So just name your filename vector. > > Thanks, > P. Terry > [email protected] > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioc-sig-sequencing mailing list > [email protected] > https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing > [[alternative HTML version deleted]] _______________________________________________ Bioc-sig-sequencing mailing list [email protected] https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
