I am working on bio.program.PhredSequence and its friends (for handling the quality data associated with the output of Phred). PhredSequence uses SymbolLists with an IntegerAlphabet. At present the getToken() method of IntegerAlphabet.IntegerSymbol returns '#'. I guess this is because the Symbol interface specifies that getToken() returns a char. Shouldn't this be a String? After all, SymbolParser.parseToken() parses a String, and aren't we dealing with alphabets that can have multi-character tokens, such as the three-letter amino acid names? Has this issue come up before? Am I misunderstanding 'token'?
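To make the question concrete, here is a stripped-down sketch of the behaviour described above. The types Sym, IntSym and TokenDemo are hypothetical stand-ins written for this illustration, not the real BioJava classes, and the getName() accessor is an assumption added only to show what a String-valued token could carry.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the Symbol interface (not the real BioJava type).
interface Sym {
    char getToken();   // single-character token, as described in the post
    String getName();  // assumed String-valued alternative, for illustration only
}

// Hypothetical stand-in for IntegerAlphabet.IntegerSymbol.
final class IntSym implements Sym {
    private final int value;

    IntSym(int value) { this.value = value; }

    // Every integer collapses to the same placeholder character.
    public char getToken() { return '#'; }

    public String getName() { return Integer.toString(value); }
}

final class TokenDemo {
    // Mirrors the seqString() behaviour described in the post:
    // concatenate one char token per symbol.
    static String seqString(List<? extends Sym> syms) {
        StringBuilder sb = new StringBuilder();
        for (Sym s : syms) {
            sb.append(s.getToken());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<IntSym> quality = Arrays.asList(
            new IntSym(10), new IntSym(20), new IntSym(22), new IntSym(7));
        System.out.println(seqString(quality)); // prints "####"
    }
}
```

With a char token the quality values are lost entirely on output, which is why a String-valued token (or something like it) seems necessary here.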
One of the things that must be done with PhredSequence is to write the quality data (an IntegerAlphabet-based SymbolList) to a fasta-like format. I'd like to just create a Sequence with the quality SymbolList and write it using a FastaFormat. But FastaFormat calls seqString(), which is coded in AbstractSymbolList to use getToken(), so it can only deal with chars and can't handle IntegerSymbols. Another issue is that with an IntegerSymbolList one would really like seqString() to output something like '10 20 22 7' as opposed to '1020227'.

Three options:

1) Create a new SequenceFormat just for this. If there will be no other use of IntegerSymbolList, perhaps this is the best way to go.
2) Create an IntegerSymbolList that extends SimpleSymbolList, overriding seqString().
3) (Most invasive but perhaps cleanest) Change getToken() to return a String, or add toString() to Symbol, and add a paddedSeqString() method to AbstractSymbolList.

Preferences, suggestions?

David

David Waring
Systems Programmer
University of Washington Genome Center
[EMAIL PROTECTED]
(206) 221-6902
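P.S. A rough sketch of what option 2 might look like. CharSymbolList below is a hypothetical stand-in for SimpleSymbolList (the real class has a much larger API and holds Symbols, not raw ints), and IntegerSymbolListSketch is an illustrative name, so treat this as an outline of the idea rather than working BioJava code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SimpleSymbolList; stores raw ints for brevity.
class CharSymbolList {
    protected final List<Integer> values = new ArrayList<>();

    CharSymbolList(int... vals) {
        for (int v : vals) {
            values.add(v);
        }
    }

    // Default behaviour: one char per symbol, so every integer becomes '#'.
    public String seqString() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            sb.append('#');
        }
        return sb.toString();
    }
}

// Option 2: a subclass that overrides seqString() to write
// the integer values separated by single spaces.
class IntegerSymbolListSketch extends CharSymbolList {
    IntegerSymbolListSketch(int... vals) {
        super(vals);
    }

    @Override
    public String seqString() {
        StringBuilder sb = new StringBuilder();
        for (int v : values) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(v);
        }
        return sb.toString();
    }
}

final class SeqStringDemo {
    public static void main(String[] args) {
        CharSymbolList quality = new IntegerSymbolListSketch(10, 20, 22, 7);
        System.out.println(quality.seqString()); // prints "10 20 22 7"
    }
}
```

Because the override is picked up through the base type, anything that only calls seqString() (as FastaFormat is said to do) would get the space-separated form without further changes.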