Re: [Bioc-sig-seq] ABout how to trim out the adaptor of soleax short data!

Harris A. Jaffee Wed, 27 Jan 2010 18:28:35 -0800

The problem is with DNAString(Set), which accepts certain lower case  
letters
but uppercases them -- as you see from the result of your  
trimLRPatterns call
with read3, which ends with "GAA".  Your a's turned into 'A', and  
then "CAA"
fuzzy-matched "GTA", with 2 errors.


 > DNAString("x")
Error in .charToSharedRaw(x, start = start, end = end, width = width,  :
   key 120 not in lookup table

 > DNAString("a")
   1-letter "DNAString" instance
seq: A

Your reads might as well have 'N' in place of your 'a', and then  
everything
works fine, and you see that the subject might as well be of type  
character.
If so, it's converted to a BString(Set) for processing, and then the  
result
is cast back to character type, unless you want 'ranges'.

 > adapter = "GTAGGCACCA"

 > read2 = "TGTTGTACCTGTAGGNACNA"
 > trimLRPatterns(Rpattern=adapter, subject=read2, max.Rmismatch=rep 
(2,nchar(adapter)))
[1] "TGTTGTACCT"

 > read3 = "TGTTGTACCTGTANGNACNA"
 > trimLRPatterns(Rpattern=adapter, subject=read3, max.Rmismatch=rep 
(2,nchar(adapter)))
[1] "TGTTGTACCTGTANGNA"

 > trimLRPatterns(Rpattern=adapter, subject=read3, max.Rmismatch=rep 
(2,nchar(adapter)),
        ranges=TRUE)
IRanges of length 1
     start end width
[1]     1  17    17

On Jan 27, 2010, at 7:57 PM, joseph wrote:
> Hello
> In the example below read2 has 2 mismatches and the adapter was  
> trimmed correctly. However, read3 has 3 mismatches (and thus is not  
> supposed to be trimmed) lost its last 3 nucleotides. Can you please  
> help me understand this issue?
>
> > adapter = "GTAGGCACCA"
> > read2 <- DNAStringSet("TGTTGTACCTGTAGGaACaA")
> > read3 <- DNAStringSet("TGTTGTACCTGTAaGaACaA")
> >
> > trimLRPatterns(Rpattern=adapter, subject=read2, max.Rmismatch=rep 
> (2,nchar(adapter)))
>   A DNAStringSet instance of length 1
>     width seq
> [1]    10 TGTTGTACCT
> > trimLRPatterns(Rpattern=adapter, subject=read3, max.Rmismatch=rep 
> (2,nchar(adapter)))
>   A DNAStringSet instance of length 1
>     width seq
> [1]    17 TGTTGTACCTGTAAGAA
> >
>
>
> From: Harris A. Jaffee <[email protected]>
> To: Hongtao Hu <[email protected]>
> Cc: [email protected]
> Sent: Wed, January 27, 2010 3:40:39 PM
> Subject: Re: [Bioc-sig-seq] ABout how to trim out the adaptor of  
> soleax short data!
>
> Patrick's trimLRPatterns function in the Biostrings package.
>
> Beware that the allowed mismatches arguments, including the  
> defaults of 0,
> are turned into vectors of the same length as the relevant  
> pattern.  If a
> single integer is specified, including the default, it is turned  
> into a
> vector with many -1's at the beginning, preventing any partial  
> matching of
> the pattern.  So, the only possible trimming will be by the whole  
> pattern,
> assuming that it matches well enough.  But the presence of the  
> whole adaptor
> would be a rare event.  To permit arbitrary partial matching, exact  
> or not,
> you have to give a vector of the same length as the relevant  
> pattern, e.g.
> rep(e, nchar(pattern)), for whatever non-negative e you want to  
> allow.  You
> can do this separately for the right and left adaptors.
>
> On Jan 27, 2010, at 3:45 PM, Hongtao Hu wrote:
>
> > Hey, dear all,
> > The adapotor in our dataset seems variable. Usually, how should  
> it be trimmed out or which software would be used?
> > The length of 3' and 5' adapotr are sperately over 20 nt, but the  
> total length of reads is 39 nt.  I wondering if Anyone who ever did  
> the similar job can share your experience? Appreciate!
> >
> >
> > Bests,
> > Hongtao Hu
> > Department of Biological Sicences
> > Auburn University
> > Auburn, Al 36832
> > cell phone: 334-524-7282
> > Hongtao webpage: http://www.auburn.edu/~hzh0005/
> >
> > _______________________________________________
> > Bioc-sig-sequencing mailing list
> > [email protected]
> > https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
>


        [[alternative HTML version deleted]]

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Re: [Bioc-sig-seq] ABout how to trim out the adaptor of soleax short data!

Reply via email to