"Lana Schaffer" <[email protected]> writes:

> Hi,
> I have read the Feb 2009 archives and have been trying to
> filter a lot of primer reads to see what short reads
> remain.
> The small RNA primer (TCGTATGCCGTCTTCTGCTTG) attached to
> a series of A's is the main contamination of the reads that
> I would like to filter.
> -------------------------------------------------------
> dist1 <- srdistance(clean(fq4), "TCGTATGCCGTCTTCTGCTTGAAAAAAAAAA")
> table(dist1[[1]])
>    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19
> 9338  789  406  121 2094  240  184   55  332   78   90   25   68   16   62   31
>   20   21   22   23   24   25   26   28   29
>  166  550  623  640  318   65    6    1    4
>
> f <- fq4[dist1[[1]] <5]

clean(fq4) != fq4, so if this is your code you're subsetting the wrong
object: dist1 was computed on clean(fq4), but the index is applied to fq4.

Martin

>    [1]    35 NTAGTACTCTGCGTTGTGGCCGCAGCCACCTCGGT
>    [2]    35 NTCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAA
>    [3]    35 NTCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAA
>    [4]    35 NTCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAA
>    [5]    35 NCTGGACTTGGAGTCAGAAGATCTCGTATGCCGTC
>    [6]    35 NTCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAA
>    [7]    35 GGTATGATTCTCGCATCTCGTATGCCGTCTTCTGC
>    [8]    35 GGTATGATTCTCGCATCTCGTATGCCGTCTCCTGC
>    [9]    35 ATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAA
>    ...   ... ...
> [9363]    35 TCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAA
> [9364]    35 ATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAA
> [9365]    35 TCGTATGCCGTCTTCTGCTTGAAAAAAAAAAACAA
> [9366]    35 ATATAATACAACCTGCTAAGTGATCTCGTATGCCG
> [9367]    35 ATCTCGTATGCCGTCTTCTGCTTGACAAAAAAAAA
> [9368]    35 ATCTCGTATGCCGTCTTCTGCTTGAAAAACAACAA
> [9369]    35 ATCTCGTATGCCGTCTTCTGCTTGAACCACACAAA
> [9370]    35 GTATGCCGTCTTCTGCTTGAAAAAAAAAAAAACCA
> [9371]    35 ATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAA
>
> f <- fq4[dist1[[1]] >28]
> [1]    35 ATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAA
> [2]    35 CGATCATCTCGTATGCCGTCTTCTGCTTGAAAAAA
> [3]    35 GTATGCCGTCTTCTGCTTGAAAAAAAAAAACAACC
> [4]    35 CAGCAATCTCGTATGCCGTCTTCTGCTTGAAAAAA
> ---------------------------------------------------------
> You can see that I am not doing a good filtering job.
> d<5 is showing some sequences free of primer that I would
> want to save.
> I have tried the polyn function, but that does not work for me
> when I use a series of 10-20 A's (<35).
>
> Would someone be able to give me some suggestions?
>
> > sessionInfo()
> R version 2.9.0 Under development (unstable) (2009-02-12 r47905)
> i386-pc-mingw32
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] ShortRead_1.1.50   lattice_0.17-20    BSgenome_1.11.9    Biostrings_2.11.42
> [5] IRanges_1.1.54
> loaded via a namespace (and not attached):
> [1] Biobase_2.3.11     grid_2.9.0         hwriter_1.1        Matrix_0.999375-20
>
>
>
> Lana Schaffer
> Biostatistics/Informatics
> The Scripps Research Institute
> DNA Array Core Facility
> La Jolla, CA 92037
> (858) 784-2263
> (858) 784-2994
> [email protected]
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793
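
To make the point about clean(fq4) != fq4 concrete, here is a minimal base-R sketch of the mismatch and the fix. It does not use ShortRead: the reads vector is made-up toy data, grepl() stands in for clean(), and adist() (Levenshtein distance, from utils) stands in for srdistance(), so treat it only as an illustration under those assumptions. With the real objects the same idea would be to store clean(fq4) in a variable, compute the distances on that variable, and subset that same variable.

```r
# Toy stand-ins for the reads in fq4 (hypothetical data, plain character vectors)
reads <- c("NTAGTACTCTGCGTTGTGGC",
           "ATCTCGTATGCCGTCTTCTG",
           "GGTATGATTCTCGCATCTCG",
           "NTCTCGTATGCCGTCTTCTG",
           "TCGTATGCCGTCTTCTGCTT")
primer <- "TCGTATGCCGTCTTCTGCTT"

# Stand-in for clean(): drop reads that contain an N
cleaned <- reads[!grepl("N", reads, fixed = TRUE)]

# Stand-in for srdistance(): edit distance of each cleaned read to the primer
d <- as.vector(adist(cleaned, primer))

# One distance per *cleaned* read, so d does not line up with the original reads
same_length <- length(d) == length(reads)

# Subset the object the distances were computed on, not the original
contaminated <- cleaned[d < 5]   # close to the primer
kept         <- cleaned[d >= 5]  # candidates to keep
```

The threshold of 5 is taken from the question; whether it separates primer from non-primer reads well is a separate issue from the indexing mismatch.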
