Dear bioc-sig-sequencing,

Comparing two approaches for filtering Eland-aligned reads while reading the 
data in with readAligned(), I get an approximately 30% difference in the number 
of reads surviving.  So my question: which approach should I use, or is some 
other combination of functions more appropriate?

Both approaches roughly follow the BioC2010 lab 
(http://www.bioconductor.org/help/course-materials/2010/BioC2010/Workflow.pdf). 
The code and the resulting read counts are shown below (note: the input file 
contains 1380439 lines/reads). The first approach:


> filt1 <- alignDataFilter(expression(filtering=="Y"))
> filt2 <- chromosomeFilter("chr[0-9XYM]+.fas")
> filt3 <- occurrenceFilter(withSread = FALSE)
> filt <- compose(filt1, filt2, filt3)
> arabtest <- seqapply(fls, function(file) {
+   as(readAligned(file, type="SolexaExport", filter=filt), "GRanges")
+ })
> arabtest
GRangesList of length 1
[[1]]
GRanges with 966869 ranges and 7 elementMetadata values

Alternatively (from page 1 of the lab referenced above):

> filt <- compose(chipseqFilter(), alignQualityFilter(15))
> arabtest <- seqapply(fls, function(file) {
+   as(readAligned(file, type="SolexaExport", filter=filt), "GRanges")
+ })
> arabtest
GRangesList of length 1
[[1]]
GRanges with 1286501 ranges and 7 elementMetadata values
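
To put the two counts on a common footing, here is a small sketch of the arithmetic behind the difference. It uses only the three read counts reported in this message; the variable names are made up for illustration and are not part of ShortRead.

```r
# Read counts reported in this message (variable names are illustrative only)
n_input     <- 1380439  # lines/reads in the input file
n_approach1 <- 966869   # alignDataFilter + chromosomeFilter + occurrenceFilter
n_approach2 <- 1286501  # chipseqFilter + alignQualityFilter(15)

# Fraction of the input reads that survive each filter
kept1 <- n_approach1 / n_input
kept2 <- n_approach2 / n_input

# Absolute difference, and the difference relative to each result
diff_reads    <- n_approach2 - n_approach1
rel_diff_vs_1 <- diff_reads / n_approach1
rel_diff_vs_2 <- diff_reads / n_approach2

# Report everything as percentages, rounded to one decimal place
print(round(100 * c(kept1 = kept1, kept2 = kept2,
                    rel_diff_vs_1 = rel_diff_vs_1,
                    rel_diff_vs_2 = rel_diff_vs_2), 1))
```

So the first approach keeps about 70% of the input reads and the second about 93%; the gap of 319632 reads is about 33% of the first result or about 25% of the second, depending on which is taken as the baseline.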


Thanks,
P. Terry

