Dear list,

I am using vectorstrip to find PCR primers in cloned PCR products. Strangely,
in some cases it misses a primer, because it overestimates the number of
mismatches.

In the following example, vectorstrip reports the first primer with six
mismatches, although a gapped alignment shows only two differences. Six
mismatches over 21 bases is about 28.6%, so if I run vectorstrip with a
-mismatch value lower than 29, the primer is missed.

The following is a mixture of shell commands and extracts of outputs. The
sequence consists of two reads assembled by using trimseq on .ab1 files, and
then merger on the resulting fasta files.


export SEQ="ttttcccccccccnntttttttnnnnncccccnnnnnnnnnaaaaAAccCcTcNCTaTagggCGAGTTggGccCtTCTAGTNtGCATGCtTCGAGcGGcccGccAGTgTTGATGGaTaTCTTGCaGaaTTcGcccTTaaTGAggTAACCgGTTcccAGCaGNttttttttttttttttttttttttttttttttttttttttttttttttttttttttttAaaaaGaaTTGtttattTACTGAACCNgggCAtAtTaGaTACACAACCCATTTTaaaTTTAcATcttttAAtTCaaTtTTGAAgTGttTTTAcAcAcCCNCNCAAaAaaaaaaaaaTTTGGCATGcAACAgCTgGGAACCGTtACCtCATTAAgggCGAAtTCcAGcAcAcTGGCgGCCGTTACtAAGGGATCCGAGCTcGGNACCAAGnnnngnnnnnnnnnnnnnnnnnnttntttnntnnnnaaaaa"

export LINKERA="AATGAGGTAACGGTTCCCAGC"

export LINKERB="GCTGGGAACCGTTACCTCATT"

vectorstrip     asis:$SEQ \
                -linkera=$LINKERA \
                -linkerb=$LINKERB \
                -outfile stdout \
                -outseq /dev/null \
                -novectorfile \
                -nobesthits \
                -mismatch 30


Sequence: asis   Vector: no_name
5' sequence matches:
        From 138 to 158 with 6 mismatches
3' sequence matches:
        From 351 to 371 with 0 mismatches
Sequences output to file:
        from 159 to 350
                CaGNtttttttttttttttttttttttttttttttttttttttttttttt
                ttttttttttttAaaaaGaaTTGtttattTACTGAACCNgggCAtAtTaG
                aTACACAACCCATTTTaaaTTTAcATcttttAAtTCaaTtTTGAAgTGtt
                TTTAcAcAcCCNCNCAAaAaaaaaaaaaTTTGGCATGcAACA
        sequence trimmed from 5' end:
                ttttcccccccccnntttttttnnnnncccccnnnnnnnnnaaaaAAccC
                cTcNCTaTagggCGAGTTggGccCtTCTAGTNtGCATGCtTCGAGcGGcc
                cGccAGTgTTGATGGaTaTCTTGCaGaaTTcGcccTTaaTGAggTAACCg
                GTTcccAG
        sequence trimmed from 3' end:
                gCTgGGAACCGTtACCtCATTAAgggCGAAtTCcAGcAcAcTGGCgGCCG
                TTACtAAGGGATCCGAGCTcGGNACCAAGnnnngnnnnnnnnnnnnnnnn
                nnttntttnntnnnnaaaaa

needle asis:$SEQ[138:158] asis:$LINKERA stdout -auto

asis             138 aaTGAggTAACCgGTTcccAG-    158
                     |||||||||| |||||||||| 
asis               1 AATGAGGTAA-CGGTTCCCAGC     21


Interestingly, in the following ungapped alignment, the number of mismatches
is 6. But I did not find anything saying that gaps are disallowed in
vectorstrip.

aaTGAggTAACCgGTTcccAG
||||||||||| | | ||  
AATGAGGTAACGGTTCCCAGC
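
The count of 6 can be checked by hand with a small bash sketch. It assumes a
plain position-by-position, case-insensitive comparison with no gaps; this is
only my guess at how the number arises, not vectorstrip's actual code, and
count_mismatches is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count ungapped, case-insensitive mismatches
# between two strings of equal length.
count_mismatches() {
    local a b i n=0
    a=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    b=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')
    for (( i = 0; i < ${#a}; i++ )); do
        if [ "${a:i:1}" != "${b:i:1}" ]; then
            n=$((n + 1))
        fi
    done
    printf '%s\n' "$n"
}

# Positions 138 to 158 of the read, as reported by vectorstrip.
WINDOW="aaTGAggTAACCgGTTcccAG"
LINKERA="AATGAGGTAACGGTTCCCAGC"

MISMATCHES=$(count_mismatches "$WINDOW" "$LINKERA")
# Integer percentage, rounded down.
PERCENT=$(( 100 * MISMATCHES / ${#LINKERA} ))

echo "$MISMATCHES mismatches over ${#LINKERA} bases (${PERCENT}%, rounded down)"
```

This gives 6 mismatches over 21 bases, which would explain why the primer is
only found with a -mismatch value of 29 or more, whereas the two differences
in the gapped needle alignment would be under 10%.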


I am using EMBOSS installed through Fink (emboss package 4.0.0-2).

Have a nice day,

-- 
Charles Plessy
http://charles.plessy.org
Wako, Saitama, Japan
_______________________________________________
EMBOSS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/emboss
