Dear list,

thanks for pushing this issue forward. I will try your options in the next few days.
Cei, microRNAs are a good point regarding adapter removal.

Best,
Dave

> Date: Wed, 14 Jan 2009 16:17:17 -0800
> From: [email protected]
> To: [email protected]; [email protected]
> Subject: Re: [Bioc-sig-seq] adapter removal
>
> I just checked in a trimLRPatterns function to the Bioconductor svn
> repository for BioC 2.4. Its signature is
>
> trimLRPatterns(Lpattern = NULL, Rpattern = NULL, subject,
>                max.Lmismatch = 0, max.Rmismatch = 0,
>                with.Lindels = FALSE, with.Rindels = FALSE,
>                Lfixed = TRUE, Rfixed = TRUE, ranges = FALSE)
>
> As you can infer from the arguments, this function allows the user to
> set the # of mismatches (if with.*indels = FALSE) / edit distance (if
> with.*indels = TRUE) for the left and right flanking "patterns". It also
> allows for IUPAC ambiguity letters in these flanking regions if *fixed =
> FALSE. When ranges = FALSE, trimLRPatterns returns the trimmed strings.
> When ranges = TRUE, it returns the ranges that you can use to trim the
> strings. Here are some examples:
>
> > Lpattern <- "TTCTGCTTG"
> > Rpattern <- "GATCGGAAG"
> > subject <- DNAString("TTCTGCTTGACGTGATCGGA")
> > subjectSet <- DNAStringSet(c("TGCTTGACGGCAGATCGG",
> +                              "TTCTGCTTGGATCGGAAG"))
> > trimLRPatterns(Lpattern = Lpattern, subject = subject)
>   11-letter "DNAString" instance
> seq: ACGTGATCGGA
> > trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subjectSet)
>   A DNAStringSet instance of length 2
>     width seq
> [1]    18 TGCTTGACGGCAGATCGG
> [2]     0
> > trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subjectSet,
> +                ranges = TRUE)
> IRanges object:
>   start end width
> 1     1  18    18
> 2    10   9     0
>
> This functionality will be available on bioconductor.org (and
> downloadable via biocLite) in the next day or so, but you can also grab
> Biostrings from svn directly if you need it sooner. It will also feed
> its way into Biostrings documentation and training material before the
> next release of Bioconductor in May.
>
> Patrick
>
>
> Patrick Aboyoun wrote:
> > David,
> > Following up on Martin's comments, I am putting the finishing touches
> > on a function called trimLRPatterns for the Biostrings package. Its
> > purpose is to trim left and/or right flanking patterns from sequences,
> > so it can strip 5' and/or 3' adapters from your reads. The signature
> > for this function is
> >
> > trimLRPatterns(Lpattern=NULL, Rpattern=NULL, subject,
> >                max.Lnedit=0, max.Rnedit=0,
> >                with.Lindels=FALSE, with.Rindels=FALSE,
> >                Lfixed=TRUE, Rfixed=TRUE, rangesOnly = FALSE)
> >
> > I will be checking this function into the BioC 2.4 code line, which
> > requires using R-devel, sometime today or tomorrow. I will send out an
> > e-mail to this group when I check it in and show a simple example of
> > its usage. I talked with Martin and he will wrap this functionality in
> > the ShortRead layer so you don't have to leave the ShortRead class
> > system when removing adapters from your reads.
> >
> >
> > Cheers,
> > Patrick
> >

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
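A note on the ranges = TRUE output quoted above: the message says the returned ranges "can be used to trim the strings" but does not show that step. The following is only a minimal base R sketch of the idea, working on plain character vectors rather than Biostrings objects. The reads and the start/end values are copied from the quoted example; trim_by_ranges is a hypothetical helper name introduced here for illustration and is not part of Biostrings.

```r
# Base R sketch (no Biostrings needed): apply start/end ranges to plain strings.
# Reads and ranges are copied from the ranges = TRUE example quoted in the thread.
reads  <- c("TGCTTGACGGCAGATCGG", "TTCTGCTTGGATCGGAAG")
starts <- c(1L, 10L)
ends   <- c(18L, 9L)

# trim_by_ranges is a hypothetical helper, not a Biostrings function.
trim_by_ranges <- function(x, start, end) {
  # substring() yields "" when end < start, which mirrors a zero-width range.
  substring(x, start, end)
}

trimmed <- trim_by_ranges(reads, starts, ends)
print(trimmed)
print(nchar(trimmed))
```

The second range (start 10, end 9) has width 0, so the second read comes back as an empty string, matching the width column in the quoted output. With real DNAStringSet objects a function such as narrow() would presumably play the role of substring() here, but that has not been checked against the BioC 2.4 code line.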
