You mean something like:

Pro Ala Tyr

Then yes, in that case you would want to write a WordTokenization.
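
As a rough illustration of what such a word-based tokenization has to do, here is a minimal standalone sketch. It deliberately does not use the BioJava API: the class name ThreeLetterSketch and the method toOneLetter are hypothetical, and it assumes the input is whitespace-separated three-letter codes for the 20 standard amino acids only. A real WordTokenization subclass would map each word to a BioJava Symbol rather than to a one-letter character.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, NOT part of BioJava: parses whitespace-separated
// three-letter amino acid codes into a one-letter string.
final class ThreeLetterSketch {
    // Lookup table keyed by lower-cased three-letter code.
    private static final Map<String, Character> CODES = new HashMap<>();

    static {
        String[][] table = {
            {"Ala", "A"}, {"Arg", "R"}, {"Asn", "N"}, {"Asp", "D"},
            {"Cys", "C"}, {"Gln", "Q"}, {"Glu", "E"}, {"Gly", "G"},
            {"His", "H"}, {"Ile", "I"}, {"Leu", "L"}, {"Lys", "K"},
            {"Met", "M"}, {"Phe", "F"}, {"Pro", "P"}, {"Ser", "S"},
            {"Thr", "T"}, {"Trp", "W"}, {"Tyr", "Y"}, {"Val", "V"}
        };
        for (String[] row : table) {
            CODES.put(row[0].toLowerCase(Locale.ROOT), row[1].charAt(0));
        }
    }

    // Splits the text on whitespace and translates each word.
    // Throws IllegalArgumentException for any word not in the table.
    static String toOneLetter(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        for (String word : trimmed.split("\\s+")) {
            Character code = CODES.get(word.toLowerCase(Locale.ROOT));
            if (code == null) {
                throw new IllegalArgumentException(
                    "Unknown three-letter code: " + word);
            }
            out.append(code.charValue());
        }
        return out.toString();
    }
}
```

With this sketch, ThreeLetterSketch.toOneLetter("Pro Ala Tyr") yields "PAY". Patent listings may also contain non-standard residue codes, which this table does not cover and would reject.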

Best regards,

- Mark





Neil Bacon <[EMAIL PROTECTED]>
Sent by: [EMAIL PROTECTED]
08/01/2006 03:41 PM

To:      [email protected]
cc:      (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] three-letter Protein alphabet names


Hi,
I'm looking at extending BioJava sequence I/O to read sequences from
patents (initially current US data formats, later perhaps older formats
and other jurisdictions).
Has anyone done this already, or is anyone interested?

Protein data uses 3-letter codes. I found an old posting about 3-letter 
codes:

[Biojava-dev] Protein alphabet names
http://lists.open-bio.org/pipermail/biojava-dev/2002-October/000143.html

>   - Add an additional tokenization (probably called "three-letter"
>     unless someone comes up with a better suggestion) for people
>     who actually want 3-letter codes.

Did this happen (I can't find it)?
I'll try extending WordTokenization to do this unless someone has 
already done it or can advise me better (I'm new here and advice would 
be very welcome).

Cheers,
    Neil Bacon

_______________________________________________
Biojava-l mailing list  -  [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l


