Hi,

I am writing to ask whether there is a usable poly-A removal function for discarding reads in which every base is an A. From what I understand, these reads arise from a constant intensity signal originating from a speck or from the edges of the lane.
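For what it is worth, the poly-A check itself can be sketched in base R on a plain character vector. This is only an illustration under assumptions: `is_poly_base` and its `max_frac` threshold are hypothetical names, not part of ShortRead, and the reads are assumed to have been converted to character first (for example with `as.character(sread(x))` on a ShortRead object). The srFilter documentation in ShortRead also lists a `polynFilter`, which may be the more direct route if the installed version provides it.

```r
# Optional start-up, assuming Bioconductor is installed. Judging by the
# sessionInfo() in the quoted message, attaching ShortRead appears to
# attach Biostrings and IRanges as well.
# library(ShortRead)

# Base-R sketch: flag reads dominated by a single nucleotide.
# `reads` is a plain character vector; `max_frac` is a hypothetical
# threshold (1 means "every base is `base`").
is_poly_base <- function(reads, base = "A", max_frac = 1) {
  stopifnot(is.character(reads), nchar(base) == 1)
  # Count occurrences of `base` by deleting every other character.
  n_base <- nchar(gsub(paste0("[^", base, "]"), "", reads))
  len <- nchar(reads)
  # Empty reads are never flagged.
  len > 0 & (n_base / len) >= max_frac
}

reads <- c("AAAAAAAAAAAAAAAAAAAA",   # pure poly-A
           "AAAAAAAAAAAAAAAAAAAT",   # 19 of 20 bases are A
           "ACGTACGTACGTACGTACGT")   # ordinary read

# Strict filter: drops only the pure poly-A read.
reads[!is_poly_base(reads, "A", max_frac = 1)]

# Looser filter: also drops the read that is 95% A.
reads[!is_poly_base(reads, "A", max_frac = 0.9)]
```

Lowering `max_frac` trades specificity for sensitivity, so it is worth checking a sample of the discarded reads before settling on a value.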
I will search for one myself, but I am also looking for a starter set of commands that loads the required libraries along with ShortRead, so that I can get going with this analysis.

Cheers,
Sumit

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Cei Abreu-Goodger
Sent: Sunday, February 22, 2009 6:23 PM
To: [email protected]
Subject: [Bioc-sig-seq] Low-complexity read filtering/trimming

Hi all,

I've been playing around with some Solexa small-RNA reads using ShortRead and Biostrings. I've used the 'trimLRPatterns' function to remove adapter sequence, and I've been trying to remove low-complexity sequences with 'srFilter'. I would first really like to congratulate all the people involved for the great work. There are two situations in which I would be grateful for some suggestions, though:

1) I have many "low-complexity" reads. Some are simply polyA, polyC, etc. But some others are runs of "ATATAT" or "CACACACA", etc. Previously I would have used "dust" on the command line to filter out this kind of read in a fasta file. Any ideas on how to achieve similar functionality in the ShortRead world?

2) For some reads I may have an "N-rich" patch inside the read, for example:

AATAAAGTGCTTACAGTGNNNNTNNATNCAATACCG

I would ideally like to trim off everything starting at the "N-rich" part. I was trying to implement something with 'vmatchPattern', but if I allow for mismatches (for a more flexible search) I will also get hits starting before the run of Ns.
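The two situations above can be sketched in base R on plain character strings. Both functions are hypothetical illustrations rather than ShortRead or Biostrings API: `triplet_score` is a simplified triplet-count score in the spirit of dust (not the dust algorithm itself), and `trim_n_rich` cuts a read at the first N of the first window that contains at least `min_n` Ns. The window width and both thresholds are assumptions that would need tuning on real data.

```r
# 1) Simplified low-complexity score based on overlapping triplets.
#    Repetitive reads (polyA, ATATAT...) reuse the same triplets and
#    score high; reads whose triplets are all distinct score 0.
triplet_score <- function(read) {
  n <- nchar(read)
  if (n < 4L) return(0)
  starts <- seq_len(n - 2L)
  triplets <- substring(read, starts, starts + 2L)
  counts <- as.numeric(table(triplets))
  sum(counts * (counts - 1) / 2) / (length(triplets) - 1)
}

# 2) Trim a read at the start of its first N-rich patch.
#    A patch is any window of `width` bases holding >= `min_n` Ns.
trim_n_rich <- function(read, width = 5L, min_n = 3L) {
  bases <- strsplit(read, "", fixed = TRUE)[[1]]
  is_n <- bases == "N"
  n <- length(bases)
  if (n < width) return(read)
  # Number of Ns in every window of `width` consecutive bases.
  cs <- c(0L, cumsum(is_n))
  win <- cs[(width + 1L):(n + 1L)] - cs[1:(n - width + 1L)]
  hit <- which(win >= min_n)
  if (length(hit) == 0L) return(read)
  # Cut just before the first N inside the first N-rich window,
  # so bases preceding the run of Ns are kept.
  start <- hit[1L]
  first_n <- start + which(is_n[start:(start + width - 1L)])[1L] - 1L
  substr(read, 1L, first_n - 1L)
}

triplet_score("AAAAAAAAAA")   # 4
triplet_score("ACGTTGCAAG")   # 0

trim_n_rich("AATAAAGTGCTTACAGTGNNNNTNNATNCAATACCG")
# "AATAAAGTGCTTACAGTG"

# Apply to many reads at once:
reads <- c("AATAAAGTGCTTACAGTGNNNNTNNATNCAATACCG", "ACGTNACGT")
vapply(reads, trim_n_rich, character(1), USE.NAMES = FALSE)
```

Because the cut point is the first N in the window rather than the window start, this avoids the problem of hits beginning before the run of Ns. An isolated N, as in the second read, leaves the read untouched.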
Many thanks,
Cei

sessionInfo()
R version 2.9.0 Under development (unstable) (2009-02-13 r47919)
i386-apple-darwin9.6.0

locale:
C

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
[1] ShortRead_1.1.39   lattice_0.17-20    BSgenome_1.11.9    Biostrings_2.11.28
[5] IRanges_1.1.38     Biobase_2.3.10

loaded via a namespace (and not attached):
[1] Matrix_0.999375-20 grid_2.9.0

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
