Well spotted, Sylvain,

Keith, there's a method in AlphabetTools, getAllSymbols(). Feed it the matches() of the symbol and concatenate the tokens from each of the symbols it returns.

Matthew

Keith James wrote:
"Sylvain" == Sylvain Foisy <[EMAIL PROTECTED]> writes:

    Sylvain> Hi, I used the createRegex() method to build a regular
    Sylvain> expression from a DNA sequence entered by the user, in
    Sylvain> order to scan a genome for that motif. I just discovered
    Sylvain> an interesting thing about that method: if n is in the
    Sylvain> motif to seek, the regex will not have n as a possibility.

    Sylvain> Ok, I have that motif: atgnnnngta.

    Sylvain> createRegex() would return atg[atcg]{4}gta, and it does.

    Sylvain> What if my sequence to scan contains n: atgagcngta, for
    Sylvain> example? java.util.regex would not find the
    Sylvain> pattern. Unless I am mistaken, the pattern should be
    Sylvain> atg[atcgn]{4}gta.

    Sylvain> Am I wrong? Any input would be appreciated.
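
The behaviour reported above can be reproduced with java.util.regex alone. This is only a minimal sketch: the two regex strings and the target sequence are copied from the message, while the class name MotifRegexCheck and the helper found() are made up for illustration and are not part of BioJava.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not BioJava code.
class MotifRegexCheck {
    // True if the regex occurs anywhere in the target sequence.
    static boolean found(String regex, String target) {
        return Pattern.compile(regex).matcher(target).find();
    }

    public static void main(String[] args) {
        String target = "atgagcngta"; // target with one ambiguous n

        // Regex as reported for the motif: n expands to [atcg] only.
        System.out.println(found("atg[atcg]{4}gta", target));  // prints false

        // Regex proposed in the message: n is also allowed in the target.
        System.out.println(found("atg[atcgn]{4}gta", target)); // prints true
    }
}
```

The first pattern fails only because the literal character n in the target is not in the character class; nothing else about the motif differs.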

You are correct about the behaviour, but not about the solution. An
ambiguous target sequence could contain n, but could also contain r,
y, m, k, s, w, h, b, v and d. To match correctly, the regex would have
to take into account that the symbols represented by n are a superset
of those represented by the other ambiguity symbols.
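
One way to picture the superset relationship, as a rough sketch only and not the MotifTools implementation: for each motif symbol, build a character class containing every IUPAC code whose base set is a subset of the motif symbol's base set. The sketch assumes DNA only and assumes "subset" is the matching rule wanted; the class AmbiguityRegexSketch and its methods are hypothetical names, and the IUPAC table is written out by hand rather than taken from a BioJava alphabet.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch, DNA only; does not use or mirror the BioJava API.
class AmbiguityRegexSketch {
    // IUPAC nucleotide codes mapped to the bases they represent.
    static final Map<Character, String> IUPAC = new LinkedHashMap<>();
    static {
        IUPAC.put('a', "a");   IUPAC.put('c', "c");
        IUPAC.put('g', "g");   IUPAC.put('t', "t");
        IUPAC.put('r', "ag");  IUPAC.put('y', "ct");
        IUPAC.put('m', "ac");  IUPAC.put('k', "gt");
        IUPAC.put('s', "cg");  IUPAC.put('w', "at");
        IUPAC.put('h', "act"); IUPAC.put('b', "cgt");
        IUPAC.put('v', "acg"); IUPAC.put('d', "agt");
        IUPAC.put('n', "acgt");
    }

    // True if every base in inner also occurs in outer.
    static boolean isSubset(String inner, String outer) {
        for (char c : inner.toCharArray()) {
            if (outer.indexOf(c) < 0) return false;
        }
        return true;
    }

    // Character class of all codes whose bases are a subset of the motif code's bases.
    static String classFor(char motifCode) {
        String bases = IUPAC.get(motifCode);
        if (bases == null) {
            throw new IllegalArgumentException("Not an IUPAC DNA code: " + motifCode);
        }
        StringBuilder sb = new StringBuilder("[");
        for (Map.Entry<Character, String> e : IUPAC.entrySet()) {
            if (isSubset(e.getValue(), bases)) sb.append(e.getKey());
        }
        return sb.append(']').toString();
    }

    // One character class per motif position; runs are not collapsed into {n}.
    static String toRegex(String motif) {
        StringBuilder sb = new StringBuilder();
        for (char c : motif.toLowerCase().toCharArray()) sb.append(classFor(c));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(classFor('r')); // prints [agr]
        Pattern p = Pattern.compile(toRegex("atgnnnngta"));
        System.out.println(p.matcher("atgagcngta").find()); // prints true
    }
}
```

Under this rule n in the motif accepts all fifteen codes, while r accepts only a, g and r, so a target n would not satisfy a motif r. A generic version would have to derive the same table from the alphabet itself instead of hard-coding it.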

As MotifTools is generic (it will work for any alphabet) implementing
generation of regexes for searching ambiguous SymbolLists requires a
more complex algorithm than the current one. I'll take a look at this
as soon as I can.

Keith


--
BioJava Consulting LTD - Support and training for BioJava
http://www.biojava.co.uk
