[Bioc-sig-seq] Low-complexity read filtering/trimming

Cei Abreu-Goodger Sun, 22 Feb 2009 17:14:59 -0800

Hi all,

I've been playing around with some Solexa small-RNA reads using ShortRead and Biostrings. I've used the 'trimLRPatterns' function to remove adapter sequence, and I've been trying to remove low-complexity sequences with 'srFilter'. I would first really like to congratulate all the people involved for the great work. There are two situations in which I would be grateful for some suggestions, though:

1) I have many "low-complexity" reads. Some are simply polyA, polyC, etc. But some others are runs of "ATATAT" or "CACACACA", etc. Previously I would have used "dust" on the command line to filter out this kind of read in a fasta file. Any ideas on how to achieve similar functionality in the ShortRead world?
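
As a rough illustration of the kind of filter meant in (1), the sketch below scores a read by how often its overlapping triplets repeat. This is a simplified, dust-like score written in plain Python. It is not the dust program and not a ShortRead function. The names `triplet_score` and `is_low_complexity` and the cutoff of 2.0 are assumptions made for this example only.

```python
from collections import Counter


def triplet_score(read):
    """Simplified dust-like score: high when a few triplets dominate the read."""
    # All overlapping 3-mers in the read.
    triplets = [read[i:i + 3] for i in range(len(read) - 2)]
    if len(triplets) < 2:
        return 0.0
    counts = Counter(triplets)
    # Each triplet seen c times contributes c * (c - 1) / 2 repeated pairs.
    repeats = sum(c * (c - 1) / 2 for c in counts.values())
    # Normalise by the number of triplets so longer reads are comparable.
    return repeats / (len(triplets) - 1)


def is_low_complexity(read, threshold=2.0):
    """Flag a read whose score exceeds the (assumed) threshold."""
    return triplet_score(read) > threshold
```

With this score a 20-base polyA or "ATAT..." read is flagged, while a mixed read such as AATAAAGTGCTTACAGTG is kept. Very short repeats (for example the 8-base "CACACACA") score below the assumed cutoff, so the threshold would need tuning for short reads.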


2) For some reads I may have a "N-rich" patch inside the read, for example:
AATAAAGTGCTTACAGTGNNNNTNNATNCAATACCG

I would ideally like to trim off everything starting at the "N-rich" part. I was trying to implement something with 'vmatchPattern', but if I allow for mismatches (for a more flexible search) I will also get hits starting before the run of Ns.
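
To make the behaviour wanted in (2) concrete, here is a minimal sketch in plain Python (not ShortRead or Biostrings code). It cuts the read at the first N that opens a window containing enough Ns, so the cut can never start before the run of Ns. The function name `trim_at_n_rich`, the window width of 5 and the minimum of 3 Ns are assumptions chosen for illustration.

```python
def trim_at_n_rich(read, window=5, min_n=3):
    """Cut the read at the first N that starts a window with >= min_n Ns."""
    for start in range(len(read)):
        # Only consider windows that begin on an N, so the cut point
        # is always the first base of the N-rich patch itself.
        if read[start] == "N" and read[start:start + window].count("N") >= min_n:
            return read[:start]
    # No N-rich patch found: return the read unchanged.
    return read


example = "AATAAAGTGCTTACAGTGNNNNTNNATNCAATACCG"
print(trim_at_n_rich(example))  # prints AATAAAGTGCTTACAGTG
```

A read with a single isolated N is left untouched, because one N never satisfies the window requirement.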


Many thanks,

Cei



sessionInfo()

R version 2.9.0 Under development (unstable) (2009-02-13 r47919)
i386-apple-darwin9.6.0

locale:
C

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:

[1] ShortRead_1.1.39   lattice_0.17-20    BSgenome_1.11.9    Biostrings_2.11.28
[5] IRanges_1.1.38     Biobase_2.3.10

loaded via a namespace (and not attached):
[1] Matrix_0.999375-20 grid_2.9.0


--

The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
