On Thu, Jul 17, 2008 at 9:47 AM, Krys Kelly <[EMAIL PROTECTED]> wrote:
> I have inherited a pipeline for Solexa sequence data using Perl, Bioperl,
> SSAHA and MySQL. As an R/Bioconductor user I am interested in ShortRead and
> BiostringsCinterfaceDemo.
>
> However, in the short term I need to use the current pipeline. The imaging
> is done by the Sequencing Facility and we get fastq files with the 3'
> adapter still attached. The adapter removal is currently done by a Perl
> script which just keeps sequences which match any number of letters in
> [ACGT] followed by the first 8 letters of the adapter. This seems pretty
> crude (e.g. only using 8 letters, not allowing for mismatches, not allowing
> for the diminishing quality along the length of the read).
>
> Google has not revealed any algorithms or code for this part of the
> pipeline. Does anyone know what algorithms are being used or, even better,
> could anyone point me in the direction of some code?

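For illustration only, here is a minimal Python sketch of the two ideas in the quoted message: the exact 8-letter filter as it is described there, and one possible mismatch-tolerant alternative. It is not the Perl script from the pipeline and not what MAQ does internally. The function names, the example adapter sequence, the 10% error rate and the 3-base minimum overlap are all assumptions chosen for the example, and base quality scores are ignored entirely.

```python
import re


def crude_trim(read, adapter, k=8):
    """Exact filter as described in the quoted message (a reconstruction):
    any number of [ACGT] letters followed by the first k adapter letters.
    Returns the sequence before the adapter, or None if there is no match."""
    m = re.match(r"([ACGT]*?)" + re.escape(adapter[:k]), read)
    return m.group(1) if m else None


def trim_adapter(read, adapter, max_error_rate=0.1, min_overlap=3):
    """Hypothetical mismatch-tolerant 3' trim (substitutions only, no indels).
    Slides the adapter along the read, also trying partial adapters that run
    off the 3' end, and cuts at the first position whose mismatch count is
    within max_error_rate of the compared length."""
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(adapter), len(read) - start)
        window = read[start:start + overlap]
        mismatches = sum(a != b for a, b in zip(window, adapter))
        if mismatches <= int(max_error_rate * overlap):
            return read[:start]
    return read  # no adapter found: leave the read unchanged


# Placeholder adapter and reads, made up for this example.
ADAPTER = "TCGTATGCCGTCTTCTGCTTG"
clean = "ACGTACGT" + ADAPTER
one_error = "ACGTACGT" + "TCGAATGCCGTCTTCTGCTTG"  # one substitution in the adapter
partial = "ACGTACGT" + "TCGTA"                    # adapter truncated by read length

for r in (clean, one_error, partial):
    print(crude_trim(r, ADAPTER), trim_adapter(r, ADAPTER))
```

The exact filter discards the second and third reads, while the tolerant version still recovers the insert. A short minimum overlap will also clip reads that merely happen to end in the first few adapter letters, so that threshold is a trade-off rather than a recommended value.
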
I believe that MAQ will do this for you. You can then use the ShortRead
package to read the MAQ output (VERY, VERY fast).

Sean

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
