On 09/22/2010 10:55 AM, [email protected] wrote:
>  Dear bioc-sig-sequencing,
> 
> In comparing two approaches for filtering Eland aligned reads when inputting 
> the data with ReadAligned, I get an approximately 30% difference in the 
> number of reads surviving.  So my question: which approach should I use, or 
> some other combination of functions?
> 
> Roughly following the BioC2010 lab 
> (http://www.bioconductor.org/help/course-materials/2010/BioC2010/Workflow.pdf),
>  the two approaches and the number of reads resulting follow (note: 1380439 
> lines/reads in input file)
> 
> 
>> filt1 <- alignDataFilter(expression(filtering=="Y"))

Hi --

I guess alignDataFilter() is the main difference: it removes reads that
do not have a 'Y' indicating that they pass Illumina's own read-quality
criterion (_not_ based on alignment). These are presumably reads that
Illumina isn't confident in, but that nonetheless align to the genome.
It might pay to read some of the data in and explore the consequences
of each of the filters independently...
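
One way to explore the filters independently might look like the sketch
below. It relies on ShortRead filters being functions that return one
logical value per read, so the reads each filter keeps can be counted
on the same unfiltered object. countSurvivors() and compareFilters() are
hypothetical helper names, not part of ShortRead; the filter settings
are copied from the quoted code, and chipseqFilter() is assumed to come
from the chipseq package used in the lab.

```r
# Hypothetical helper (not part of ShortRead): count how many elements
# of 'x' each filter keeps when the filters are applied one at a time.
countSurvivors <- function(x, filters) {
    vapply(filters, function(f) sum(f(x), na.rm = TRUE), numeric(1))
}

# Sketch of use with the filters from this thread. Assumes the ShortRead
# and chipseq packages are installed and 'file' is one Eland export file.
compareFilters <- function(file) {
    # Read without any filter so every filter sees the same reads
    aln <- ShortRead::readAligned(file, type = "SolexaExport")
    filters <- list(
        passFilter  = ShortRead::alignDataFilter(expression(filtering == "Y")),
        chromosome  = ShortRead::chromosomeFilter("chr[0-9XYM]+.fas"),
        occurrence  = ShortRead::occurrenceFilter(withSread = FALSE),
        chipseq     = chipseq::chipseqFilter(),
        alignQual15 = ShortRead::alignQualityFilter(15)
    )
    c(total = length(aln), countSurvivors(aln, filters))
}
```

Calling compareFilters(fls[[1]]), with fls as in the quoted code, would
then give one count per filter next to the total number of reads read
in, which should show where most of the difference comes from.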

Martin

>> filt2 <- chromosomeFilter("chr[0-9XYM]+.fas")
>> filt3 <- occurrenceFilter(withSread = FALSE)
>> filt <- compose(filt1, filt2, filt3)
>> arabtest <- seqapply(fls, function(file) {
> +   as(readAligned(file, type="SolexaExport", filter=filt), "GRanges")
> + })
>> arabtest
> GRangesList of length 1
> [[1]]
> GRanges with 966869 ranges and 7 elementMetadata values
> 
> Alternatively, (from page 1 of the lab previously referenced):
> 
>> filt <- compose(chipseqFilter(), alignQualityFilter(15))
>> arabtest <- seqapply(fls, function(file) {
> +   as(readAligned(file, type="SolexaExport", filter=filt), "GRanges")
> + })
>> arabtest
> GRangesList of length 1
> [[1]]
> GRanges with 1286501 ranges and 7 elementMetadata values
> 
> 
> Thanks,
> P. Terry
> [email protected]
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
