Hi,

On Thu, Jun 3, 2010 at 3:39 PM, Pratap, Abhishek <[email protected]> wrote:
> Hi All
>
> I would like to extract and count the last 5 quality values from the FASTQ
> file. I have read the file using "readFastq" and have stored the quality
> values as a BStringSet.
>
> Eg :
>   A BStringSet instance of length 5119916
>       width seq
> [1]    75 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB...BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
> [2]    75 bbbbbbbbbbbbabbbbbb`bbbbbbab`b_...BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
> [3]    75 aaaaaaa_aaaaO`aa^aaa_a_T_``^[`S...BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
> [4]    75 bbbbbbbbbbbbaabbbb`bbb_Uaa___BB...BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
> [5]    75 ``a`aa`aaYaTaaaBBBBBBBBBBBBBBBB...BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
>
> What I would like to do is subseq the last 5 quality values and do a count on
> #B. We suspect despite good avg quality we still have HIGH bad bases at the
> end of reads.
>
> Any other ideas welcome.
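
For the count described in the question (take the last 5 quality characters of each read and count the "B"s), here is a minimal sketch in plain Python rather than R. It is only meant to pin down the logic, not to show the ShortRead or Biostrings API. The function name `tail_b_counts` and the toy quality strings are made up for illustration and do not come from the poster's file.

```python
from collections import Counter

def tail_b_counts(quals, tail=5, flag="B"):
    """For each quality string, count how many of its last `tail`
    characters are equal to `flag`."""
    return [q[-tail:].count(flag) for q in quals]

# Hypothetical toy quality strings (15 characters each), not real data.
quals = [
    "bbbbbbbbbbBBBBB",   # last five characters are all B
    "aaaa_aaaa`aaBBB",   # last five characters are "aaBBB"
    "bbbbbbbbbbbbbbb",   # no B at all
]

per_read = tail_b_counts(quals)

# Tabulate how many reads have 0, 1, ..., 5 B's in their tail.
histogram = Counter(per_read)

print(per_read)                   # [5, 3, 0]
print(sorted(histogram.items()))  # [(0, 1), (3, 1), (5, 1)]
```

The tabulated counts give a quick picture of how many reads end in a run of low-quality calls, which is the suspicion raised above.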

How about just plotting the average quality score at each base position
by doing something like:

  1. Converting your phred score BStringSet into a matrix of its numeric
     values
  2. Plotting the colMeans(...) of that matrix.

Maybe?

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
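
The two steps suggested in the reply (decode the quality strings into a numeric matrix, then take the mean of each column) can be sketched in plain Python as follows. This is an illustration of the idea only, not ShortRead code. The helper names `phred_matrix` and `col_means` are hypothetical, the toy reads are made up, and the ASCII offset of 64 is an assumption based on the lowercase letters in the sample strings; a file with a different encoding would need a different offset.

```python
def phred_matrix(quals, offset=64):
    """Decode equal-length quality strings into rows of integer scores.

    `offset` is the ASCII offset of the encoding; 64 is assumed here.
    """
    width = len(quals[0])
    if any(len(q) != width for q in quals):
        raise ValueError("all reads must have the same length")
    return [[ord(c) - offset for c in q] for q in quals]

def col_means(matrix):
    """Mean score at each base position (one value per column)."""
    n_reads = len(matrix)
    return [sum(column) / n_reads for column in zip(*matrix)]

# Hypothetical toy reads: 'h' decodes to 40 and 'B' to 2 with offset 64.
quals = ["hhhB", "hhBB", "hBBB"]

means = col_means(phred_matrix(quals))
print(means)  # first position averages 40.0, last position averages 2.0
```

Plotting `means` against base position would show whether the average quality drops off towards the 3' end of the reads, even when the overall average looks fine.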
