Dear bioc-sig-sequencing,

In comparing two approaches for filtering Eland-aligned reads when reading the data in with readAligned, I get an approximately 30% difference in the number of reads that survive. So my question: which approach should I use, or is some other combination of functions more appropriate?
Roughly following the BioC2010 lab (http://www.bioconductor.org/help/course-materials/2010/BioC2010/Workflow.pdf), the two approaches and the resulting read counts are shown below (note: the input file has 1380439 lines/reads).

First approach:

> filt1 <- alignDataFilter(expression(filtering=="Y"))
> filt2 <- chromosomeFilter("chr[0-9XYM]+.fas")
> filt3 <- occurrenceFilter(withSread = FALSE)
> filt <- compose(filt1, filt2, filt3)
> arabtest <- seqapply(fls, function(file) {
+     as(readAligned(file, type="SolexaExport", filter=filt), "GRanges")
+ })
> arabtest
GRangesList of length 1
[[1]]
GRanges with 966869 ranges and 7 elementMetadata values

Alternatively (from page 1 of the lab referenced above):

> filt <- compose(chipseqFilter(), alignQualityFilter(15))
> arabtest <- seqapply(fls, function(file) {
+     as(readAligned(file, type="SolexaExport", filter=filt), "GRanges")
+ })
> arabtest
GRangesList of length 1
[[1]]
GRanges with 1286501 ranges and 7 elementMetadata values

Thanks,
P. Terry
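P.S. To put the "approximately 30%" in context, here is a quick base-R check of the surviving fractions. It uses only the three counts reported above; the variable names are made up for illustration and are not part of the lab or of ShortRead.

```r
# Read counts copied from the session output above.
n_input  <- 1380439  # lines/reads in the input export file
n_first  <- 966869   # alignDataFilter + chromosomeFilter + occurrenceFilter
n_second <- 1286501  # chipseqFilter + alignQualityFilter(15)

# Fraction of input reads surviving each approach.
frac_first  <- n_first / n_input
frac_second <- n_second / n_input

# Gap between the two approaches, expressed relative to the input.
gap <- (n_second - n_first) / n_input

# Percentages rounded to one decimal place.
pct <- round(100 * c(first = frac_first, second = frac_second, gap = gap), 1)
print(pct)
```

So the first approach keeps roughly 70% of the input reads and the second roughly 93%, a gap of 319632 reads.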
