Re: [Bioc-sig-seq] ABout how to trim out the adaptor of soleax short data!

joseph Wed, 27 Jan 2010 18:48:53 -0800

Hi Harris
I am still confused, I used the small case to illustrate the mismatches. Does 
this work only if the mismatches are N's? How do you avoid unintended trimming 
if the mismatch is not N?


Thanks


________________________________
From: Harris A. Jaffee <[email protected]>

Cc: [email protected]
Sent: Wed, January 27, 2010 6:27:35 PM
[[elided Yahoo spam]]


The problem is with DNAString(Set), which accepts certain lower case letters
but uppercases them -- as you see from the result of your trimLRPatterns call
with read3, which ends with "GAA".  Your a's turned into 'A', and then "CAA"
fuzzy-matched "GTA", with 2 errors.

> DNAString("x")
Error in .charToSharedRaw(x, start = start, end = end, width = width,  : 
  key 120 not in lookup table

> DNAString("a")
  1-letter "DNAString" instance
seq: A

Your reads might as well have 'N' in place of your 'a', and then everything
works fine, and you see that the subject might as well be of type character.
If so, it's converted to a BString(Set) for processing, and then the result
is cast back to character type, unless you want 'ranges'.

> adapter = "GTAGGCACCA"

> read2 = "TGTTGTACCTGTAGGNACNA"
> trimLRPatterns(Rpattern=adapter, subject=read2, 
> max.Rmismatch=rep(2,nchar(adapter)))
[1] "TGTTGTACCT"

> read3 = "TGTTGTACCTGTANGNACNA"
> trimLRPatterns(Rpattern=adapter, subject=read3, 
> max.Rmismatch=rep(2,nchar(adapter)))
[1] "TGTTGTACCTGTANGNA"

> trimLRPatterns(Rpattern=adapter, subject=read3, 
> max.Rmismatch=rep(2,nchar(adapter)),
ranges=TRUE)
IRanges of length 1
    start end width
[1]     1  17    17

On Jan 27, 2010, at 7:57 PM, joseph wrote:
Hello
>In the example below read2 has 2 mismatches and the adapter was trimmed 
>correctly. However, read3 has 3 mismatches (and thus is not supposed to be 
>trimmed) lost its last 3 nucleotides. Can you please help me understand this 
>issue?
>
>> adapter = "GTAGGCACCA"
>> read2 <- DNAStringSet("TGTTGTACCTGTAGGaACaA")
>> read3 <- DNAStringSet("TGTTGTACCTGTAaGaACaA")
>> 
>> trimLRPatterns(Rpattern=adapter, subject=read2, 
>> max.Rmismatch=rep(2,nchar(adapter)))
>  A DNAStringSet instance of length 1
>    width seq
>[1]    10 TGTTGTACCT
>> trimLRPatterns(Rpattern=adapter, subject=read3, 
>> max.Rmismatch=rep(2,nchar(adapter)))
>  A DNAStringSet instance of length 1
>    width seq
>[1]    17 TGTTGTACCTGTAAGAA
>> 
>
>
>
>
>
________________________________
From: Harris A. Jaffee <[email protected]>
>To: Hongtao Hu <[email protected]>
>Cc: [email protected]
>Sent: Wed, January 27, 2010 3:40:39 PM
[[elided Yahoo spam]]
>
>Patrick's trimLRPatterns function in the Biostrings package.
>
>Beware that the allowed mismatches arguments, including the defaults of 0,
>are turned into vectors of the same length as the relevant pattern.  If a
>single integer is specified, including the default, it is turned into a
>vector with many -1's at the beginning, preventing any partial matching of
>the pattern.  So, the only possible trimming will be by the whole pattern,
>assuming that it matches well enough.  But the presence of the whole adaptor
>would be a rare event.  To permit arbitrary partial matching, exact or not,
>you have to give a vector of the same length as the relevant pattern, e.g.
>rep(e, nchar(pattern)), for whatever non-negative e you want to allow.  You
>can do this separately for the right and left adaptors.
>
>On Jan 27, 2010, at 3:45 PM, Hongtao Hu wrote:
>
>> Hey, dear all,
>> The adapotor in our dataset seems variable. Usually, how should it be 
>> trimmed out or which software would be used?
[[elided Yahoo spam]]
>> 
>> 
>> Bests,
>> Hongtao Hu
>> Department of Biological Sicences
>> Auburn University
>> Auburn, Al 36832
>> cell phone: 334-524-7282
>> Hongtao webpage: http://www.auburn.edu/%7Ehzh0005/
>> 
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> [email protected]
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
>_______________________________________________
>Bioc-sig-sequencing mailing list
>[email protected]
>https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
>
>



      
        [[alternative HTML version deleted]]

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Re: [Bioc-sig-seq] ABout how to trim out the adaptor of soleax short data!

Reply via email to