i have scoured our archives and found little regarding the role of solexa
quality scores, as reported in fastq outputs, in short read filtering.

my understanding is that a numerical score of -4 or greater indicates more
probability mass on the called base than on any other.  in checking 1e6 reads
on each of two lanes, i found the frequency of the event "fewer than three
bases have score less than -4" to be 4e-3 in one lane and 2e-3 in the other.
in other words, filtering by requiring no more than two scores below -4 per
read would take you from a million reads to about 2000-4000, assuming i have
not taken a biased sample (i may have; i just took the first 1e6 in the fastq).
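
to make the filter above concrete, here is a minimal python sketch of the
count i describe. it is an illustration, not the code i actually ran. the
names (solexa_scores, passes_filter, pass_fraction) are made up for this
example, and it assumes solexa-style fastq in which each quality character
encodes score = ord(char) - 64, with four lines per record.

```python
from itertools import islice

# assumption: solexa-style fastq, where score = ord(char) - 64
SOLEXA_OFFSET = 64


def solexa_scores(quality_line):
    """Decode one fastq quality string into integer solexa scores."""
    return [ord(ch) - SOLEXA_OFFSET for ch in quality_line]


def passes_filter(quality_line, threshold=-4, max_low=2):
    """True if at most max_low bases have a score strictly below threshold."""
    low = sum(1 for s in solexa_scores(quality_line) if s < threshold)
    return low <= max_low


def pass_fraction(fastq_lines, n_reads=None):
    """Fraction of reads passing the filter.

    fastq_lines is any iterable of lines with four lines per record;
    the quality string is taken to be the fourth line of each record.
    """
    quals = islice(fastq_lines, 3, None, 4)
    if n_reads is not None:
        quals = islice(quals, n_reads)
    total = kept = 0
    for q in quals:
        total += 1
        kept += passes_filter(q.rstrip("\r\n"))
    return kept / total if total else 0.0
```

passing an open file handle and n_reads=1000000 would reproduce the "first
1e6 reads" sample, with the same caveat about bias from reading only the
head of the file.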

is there any reason to regard a call with score < -4 as much different
from an 'N'?
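
for reference, a small sketch of the conversion behind that question. it
assumes the solexa score is the log-odds 10*log10(p/(1-p)) for the called
base, which is my understanding of the definition; solexa_to_prob is a name
made up for this example. under that assumption a score of -4 corresponds to
a probability of roughly 0.28 on the called base, and the chance level of
0.25 for four equally likely bases sits near a score of -4.77.

```python
def solexa_to_prob(score):
    """Probability that the called base is correct, assuming
    score = 10 * log10(p / (1 - p)) (solexa log-odds scaling)."""
    return 1.0 / (1.0 + 10.0 ** (-score / 10.0))


if __name__ == "__main__":
    # print the implied probability for a few scores across the usual range
    for q in (-5, -4, 0, 10, 20, 40):
        print(q, round(solexa_to_prob(q), 4))
```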

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
