Hi, I'm looking at extending BioJava sequence I/O to read sequences from patents (initially the current US data formats, later perhaps older formats and other jurisdictions). Has anyone done this already, or is anyone interested?
Protein data uses 3-letter codes. I found an old posting about 3-letter codes:

[Biojava-dev] Protein alphabet names
http://lists.open-bio.org/pipermail/biojava-dev/2002-October/000143.html

> - Add an additional tokenization (probably called
> "three-letter" unless someone comes up with a better
> suggestion) for people who actually want 3-letter codes.

Did this happen? (I can't find it.) I'll try extending WordTokenization to do this, unless someone has already done it or can suggest a better approach. I'm new here, so any advice would be very welcome.

Cheers,
Neil Bacon

_______________________________________________
Biojava-l mailing list
http://lists.open-bio.org/mailman/listinfo/biojava-l
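At its core, the three-letter tokenization discussed in this post is a two-way lookup between the IUPAC three-letter and one-letter amino acid codes. Below is a minimal standalone sketch, assuming the patent data lists residues as whitespace-separated three-letter codes (e.g. "Met Ala Gly"). The class `ThreeLetterCodes` and its methods are hypothetical and not part of BioJava. Wrapping the same table in a `WordTokenization` subclass for a protein alphabet would be the BioJava-specific step, and its exact signatures should be checked against the BioJava javadoc.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of BioJava): converts between
 * whitespace-separated three-letter amino acid codes and one-letter codes.
 */
public class ThreeLetterCodes {
    // IUPAC three-letter / one-letter pairs, including ambiguity codes
    // (Asx, Glx, Xaa) and the two less common residues (Sec, Pyl).
    private static final String[][] CODES = {
        {"Ala", "A"}, {"Arg", "R"}, {"Asn", "N"}, {"Asp", "D"}, {"Cys", "C"},
        {"Gln", "Q"}, {"Glu", "E"}, {"Gly", "G"}, {"His", "H"}, {"Ile", "I"},
        {"Leu", "L"}, {"Lys", "K"}, {"Met", "M"}, {"Phe", "F"}, {"Pro", "P"},
        {"Ser", "S"}, {"Thr", "T"}, {"Trp", "W"}, {"Tyr", "Y"}, {"Val", "V"},
        {"Asx", "B"}, {"Glx", "Z"}, {"Xaa", "X"}, {"Sec", "U"}, {"Pyl", "O"}
    };

    private static final Map<String, Character> THREE_TO_ONE = new HashMap<>();
    private static final Map<Character, String> ONE_TO_THREE = new HashMap<>();

    static {
        for (String[] pair : CODES) {
            // Three-letter keys are stored lower-case so parsing is case-insensitive.
            THREE_TO_ONE.put(pair[0].toLowerCase(), pair[1].charAt(0));
            ONE_TO_THREE.put(pair[1].charAt(0), pair[0]);
        }
    }

    /** Parses e.g. "Met Ala Gly" into "MAG"; throws on an unknown token. */
    public static String toOneLetter(String threeLetter) {
        StringBuilder out = new StringBuilder();
        for (String token : threeLetter.trim().split("\\s+")) {
            if (token.isEmpty()) continue; // only happens for blank input
            Character c = THREE_TO_ONE.get(token.toLowerCase());
            if (c == null) {
                throw new IllegalArgumentException("Unknown three-letter code: " + token);
            }
            out.append(c);
        }
        return out.toString();
    }

    /** Formats e.g. "MAG" back into "Met Ala Gly"; throws on an unknown letter. */
    public static String toThreeLetter(String oneLetter) {
        StringBuilder out = new StringBuilder();
        for (char c : oneLetter.toUpperCase().toCharArray()) {
            String code = ONE_TO_THREE.get(c);
            if (code == null) {
                throw new IllegalArgumentException("Unknown one-letter code: " + c);
            }
            if (out.length() > 0) out.append(' ');
            out.append(code);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toOneLetter("Met Ala Gly Cys")); // MAGC
        System.out.println(toThreeLetter("MAGC"));          // Met Ala Gly Cys
    }
}
```

Keeping the table separate from any BioJava class means the same mapping could back both parsing (patent input) and output formatting, whichever tokenization design the list settles on.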
