Re: [Bioc-sig-seq] trimLRPatterns question

Harris A. Jaffee Tue, 22 Feb 2011 22:22:57 -0800

You hint that your trimLen is > 50, yet when you show yourmax.Rmismatch, it's clearly of length 47, composedof rep(1,12) and rep(2,N) with N=trimLen-15. So N=35 andtrimLen=50. A max.Rmismatch of length 47 (= 50-3)will be augmented by rep(-1,3) at the bottom, preventing any matchesof an adapter prefix of length 3 or lesswith a read suffix. This is an effect that you want, but to allowmore mismatch tolerance for longer adapterprefixes, against read suffixes, your max.Rmismatch vector must bemonotone increasing, even if not strictly,rather than decreasing. So your max.Rmismatch is at least reversed.Its max (last value) might also be > 2.

Perhaps like this:

> ceiling(1:47 * 1/12)

[1] 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 33 3 3 3 4 4

[39] 4 4 4 4 4 4 4 4 4


This will be better than any max.Rmismatch=e, with e in (0,1).

-Harris

On Feb 22, 2011, at 11:10 PM, Kunbin Qu wrote:

Hi,
I have a 50 cycle RNA-seq run, with variable length of adapters insome reads. The length of the adapter is longer than 50 bp, and theportion in each read could be as high as the full length of 50cycles, as well as zero bp. When I trim the adapter, I do not wantover trim, so the max.Rmismatch parameter I used was shorter than(eg, 3 bp shorter) than the full length, also with more mismatchtolerance for longer portion, less mismatch for short segment. So Iwrote like the following:
trimLen<-length(adapter)
trimCoords<-trimLRPatterns(Rpattern=adapter, subject=seqs,max.Rmismatch = c(rep(2,trimLen-15), rep(1, 12)), ranges=T,with.Rindels=TRUE)
the value for the max.Rmismatch is basically:
c(rep(2,trimLen-15), rep(1, 12))
[1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 22 2 2 2 1 1 1
[39] 1 1 1 1 1 1 1 1 1
So I want to leave the last 3 bp alone (to avoid over-trimming bychance), then followed by 12 bp with 1 mismatch, then 35 bp with 2mismatch. Somehow when I tried this to the read, it does not seemwork, the trimmed reads really do not seem what they should be.Anyone can help me on this? Thanks
-Kunbin
sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] ShortRead_1.6.2     Rsamtools_1.0.1     lattice_0.19-11
[4] Biostrings_2.16.7   GenomicRanges_1.0.1 IRanges_1.6.8

loaded via a namespace (and not attached):
[1] Biobase_2.8.0 grid_2.11.0   hwriter_1.3   tools_2.11.0
______________________________________________________________________
The contents of this electronic message, including any attachments,are intended only for the use of the individual or entity to whichthey are addressed and may contain confidential information. If youare not the intended recipient, you are hereby notified that anyuse, dissemination, distribution, or copying of this message or anyattachment is strictly prohibited. If you have received thistransmission in error, please send an e-mail to[email protected] and delete this message, along withany attachments, from your computer.
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Re: [Bioc-sig-seq] trimLRPatterns question

Reply via email to