I have inherited a pipeline for Solexa sequence data that uses Perl, BioPerl, SSAHA and MySQL. As an R/Bioconductor user, I am interested in ShortRead and BiostringsCinterfaceDemo.
However, in the short term I need to use the current pipeline. The imaging is done by the Sequencing Facility, and we receive FASTQ files with the 3' adapter still attached. Adapter removal is currently done by a Perl script that keeps only the reads matching a run of any number of [ACGT] letters followed by the first 8 letters of the adapter. This seems pretty crude: it uses only 8 letters of the adapter, allows no mismatches, and ignores the declining base quality along the length of the read. A Google search has not turned up any algorithms or code for this part of the pipeline. Does anyone know what algorithms are being used or, even better, could anyone point me to some code?

Thanks,
Krys

Dr Krystyna A Kelly
Senior Research Associate
David Baulcombe Group
Department of Plant Sciences
University of Cambridge
Downing Street
Cambridge CB2 3EA
United Kingdom
Tel: +44 (0)1223 333 915
Fax: +44 (0)1223 333953

_______________________________________________
Bioc-sig-sequencing mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
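To make the comparison concrete, here is a minimal Python sketch, under stated assumptions, of the matching rule described above and of one way to address its three shortcomings. ADAPTER is a hypothetical placeholder, not necessarily the adapter your facility uses. crude_trim is only a guess at what the Perl script's pattern does. trim_adapter is a simple mismatch-tolerant, quality-aware scan written for illustration, not a published algorithm. The offset of 64 in decode_quals is an assumption about the FASTQ quality encoding, so check what your facility actually delivers.

```python
import re
from typing import List, Optional

# Hypothetical 3' adapter, for illustration only; substitute the real one.
ADAPTER = "TCGTATGCCGTCTTCTGCTTG"


def crude_trim(read: str, adapter: str = ADAPTER, k: int = 8) -> Optional[str]:
    """Guess at the existing rule: keep the run of [ACGT] bases that precedes
    an exact copy of the adapter's first k letters. Returns None (the read is
    discarded) when there is no exact match."""
    m = re.match(r"([ACGT]*)" + re.escape(adapter[:k]), read)
    return m.group(1) if m else None


def trim_adapter(read: str, quals: List[int], adapter: str = ADAPTER,
                 min_overlap: int = 3, max_mismatch_rate: float = 0.1,
                 low_q: int = 20) -> str:
    """Mismatch-tolerant 3' adapter trimming (an illustrative sketch).

    Scans candidate adapter start positions from left to right. At each
    position the read suffix is compared with the adapter prefix of the same
    length, so an adapter truncated by the end of the read can still be found.
    Mismatches at bases with quality below low_q are not counted, because
    those base calls are unreliable. Returns the read up to the first
    acceptable adapter start, or the whole read if none is found.
    """
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(read) - start, len(adapter))
        allowed = int(max_mismatch_rate * overlap)
        mismatches = 0
        for offset in range(overlap):
            pos = start + offset
            # Only confident base calls count against the match.
            if read[pos] != adapter[offset] and quals[pos] >= low_q:
                mismatches += 1
                if mismatches > allowed:
                    break
        if mismatches <= allowed:
            return read[:start]
    return read


def decode_quals(qual_line: str, offset: int = 64) -> List[int]:
    """Convert a FASTQ quality string to integer scores. offset=64 is an
    assumption; use 33 for Sanger-style FASTQ."""
    return [ord(c) - offset for c in qual_line]
```

Scanning left to right returns the earliest acceptable adapter start. With a small min_overlap this will occasionally clip genuine 3' bases that happen to resemble the start of the adapter, so min_overlap, max_mismatch_rate and low_q are worth tuning on reads whose inserts are known. Unlike crude_trim, it also keeps reads where only part of the adapter is present at the 3' end.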
