Hi,

On Thu, Jun 3, 2010 at 3:39 PM, Pratap, Abhishek
<[email protected]> wrote:
> Hi All
>
> I would like to extract and count the last 5 quality values from the FASTQ 
> file. I have read the file using "readFastq" and have stored the quality 
> values as a BStringSet.
>
> Eg :
> A BStringSet instance of length 5119916
>          width seq
>      [1]    75 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB...BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
>      [2]    75 bbbbbbbbbbbbabbbbbb`bbbbbbab`b_...BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
>      [3]    75 aaaaaaa_aaaaO`aa^aaa_a_T_``^[`S...BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
>      [4]    75 bbbbbbbbbbbbaabbbb`bbb_Uaa___BB...BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
>      [5]    75 ``a`aa`aaYaTaaaBBBBBBBBBBBBBBBB...BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
>
> What I would like to do is subseq the last 5 quality values and count the
> number of B's. We suspect that, despite good average quality, we still have a
> HIGH number of bad bases at the ends of reads.
>
> Any other ideas welcome.
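
A minimal sketch of the tail count itself, written in Python rather than R/ShortRead (so it does not use readFastq or BStringSet). It assumes the quality strings are already available as plain strings and that 'B' is the character of interest; under Phred+64 encoding, 'B' (ASCII 66) corresponds to Q2. The function name count_trailing_b and the example strings are hypothetical, shortened for illustration and not taken from real data.

```python
from collections import Counter


def count_trailing_b(qualities, n_last=5, flag_char="B"):
    """For each read, count how many of its last n_last quality
    characters equal flag_char (hypothetical helper, not a ShortRead API).
    Reads shorter than n_last are counted over their whole length."""
    return [q[-n_last:].count(flag_char) for q in qualities]


# Hypothetical, shortened quality strings for illustration only.
quals = [
    "bbbbbbbbbbbbabbbbbbBBBBB",
    "aaaaaaa_aaaaO`aa^aaa_a_T",
    "``a`aa`aaYaTaaaBBBBBBBBB",
    "bbbbbbbbbbbbaabbbb`bbbBB",
]

counts = count_trailing_b(quals)
print(counts)           # per-read number of B's in the last 5 positions
print(Counter(counts))  # how many reads have 0, 1, ..., 5 trailing B's
```

Tabulating the per-read counts, rather than only averaging them, shows whether the B's are concentrated in a subset of reads.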

How about just plotting the average quality score at each base
position by doing something like:

1. Converting your Phred-score BStringSet into a matrix of its numeric values
2. Plotting the colMeans(...) of that matrix.

Maybe?
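
The two steps above can be sketched in Python as follows; this is an illustration of the idea, not ShortRead code. It assumes Phred+64 encoding (score = ASCII code minus 64) and reads of equal width, which colMeans on a matrix also requires. The names phred_matrix and column_means are hypothetical, and the plotting step is left out: the resulting list of means can be handed to any plotting tool.

```python
def phred_matrix(qualities, offset=64):
    """Convert quality strings into rows of numeric Phred scores.
    offset=64 assumes Phred+64 encoding (an assumption; use 33 for Sanger)."""
    return [[ord(ch) - offset for ch in q] for q in qualities]


def column_means(matrix):
    """Mean score at each base position, analogous to R's colMeans().
    Like a matrix, this requires every row to have the same length."""
    n_rows = len(matrix)
    width = len(matrix[0])
    if any(len(row) != width for row in matrix):
        raise ValueError("all reads must have the same width")
    return [sum(row[j] for row in matrix) / n_rows for j in range(width)]


# Hypothetical, shortened quality strings for illustration only.
quals = ["bbbBB", "aa`BB", "bb_aB"]
means = column_means(phred_matrix(quals))
for pos, m in enumerate(means, start=1):
    print(pos, round(m, 2))
```

A drop in the mean toward the last positions would show the end-of-read problem directly, without choosing a fixed window of 5 bases.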

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
