[Biojava-l] RE: [Biojava-dev] PhredFormat

Schreiber, Mark Sun, 30 Nov 2003 15:25:30 -0800

Hi Frans -

Thanks for these changes. I have committed them to CVS and added "default" as a valid 
tokenization name for IntegerAlphabet (as a synonym of "token").

- Mark

-----Original Message-----
From: VERHOEF Frans [mailto:[EMAIL PROTECTED] 
Sent: Friday, 28 November 2003 4:34 p.m.
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: [Biojava-dev] PhredFormat

Hi,

I have fixed the small bugs in PhredFormat that had been bothering me for the last 
2 days. The fixed version is attached. Feel free to use it, change it or throw it away.
In short, this is what I changed:

-          PhredFormat now implements ParseErrorSource and ParseErrorListener. This was 
not much work, as I basically copied the implementation from FastaFormat.
-          readSequenceData(BufferedReader br, SymbolTokenization parser, 
SeqIOListener listener) has changed. This method used to split the char arrays into 
short number strings and feed them to the StreamParser, which in turn tried to do the 
same. Because the whitespace was removed in the process, the StreamParser ended up 
trying to parse one String representing a humongous number as an integer. Now the 
method no longer parses the char arrays itself; it just feeds whole chunks of char 
array to the StreamParser.
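
To illustrate the whitespace problem described above, here is a minimal standalone sketch. It does not use the BioJava classes; QualityChunkSketch, stripWhitespace and parseScores are hypothetical names made up for this example, and the real StreamParser may well work differently. It only shows why removing the separators before parsing fails, and why keeping them lets each score be parsed on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of BioJava.
public class QualityChunkSketch {

    // Problem case: removing whitespace fuses all scores into one digit string.
    public static String stripWhitespace(String chunk) {
        return chunk.replaceAll("\\s+", "");
    }

    // Keeping the separators: split on whitespace and parse each score on its own.
    public static List<Integer> parseScores(String chunk) {
        List<Integer> scores = new ArrayList<>();
        for (String token : chunk.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                scores.add(Integer.parseInt(token));
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        String chunk = "10 20 30 40 50 60";

        // The fused string is far larger than Integer.MAX_VALUE, so parsing fails.
        String fused = stripWhitespace(chunk);
        try {
            Integer.parseInt(fused);
        } catch (NumberFormatException e) {
            System.out.println("Cannot parse fused string as int: " + fused);
        }

        // With the whitespace intact, every score parses individually.
        System.out.println(parseScores(chunk));
    }
}
```

In the fixed PhredFormat, the whole chunk (whitespace included) goes to the StreamParser, so the splitting happens in one place only.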

One new issue came up, though, when I tried the following:

            StreamReader qualityIter = PhredTools.readPhredQuality(
                new BufferedReader(new FileReader(phredQualityFile)));
            while (qualityIter.hasNext()) {
                Sequence seq = qualityIter.nextSequence();
                String str = seq.seqString();
            }

The seqString() call threw the following exception:

            java.util.NoSuchElementException: default parser not supported by IntegerAlphabet yet
            at org.biojava.bio.symbol.IntegerAlphabet.getTokenization(IntegerAlphabet.java:216)
            at org.biojava.bio.symbol.AbstractSymbolList.seqString(AbstractSymbolList.java:101)
            at org.biojava.bio.seq.impl.SimpleSequence.seqString(SimpleSequence.java:108)
            at org.gis.server.pipeline.apps.SequenceInfoParser.parseResults(SequenceInfoParser.java:82)

What happens is that SimpleSequence calls the AbstractSymbolList.seqString() method. 
This method in turn executes getAlphabet().getTokenization("default"), where 
getAlphabet() returns the IntegerAlphabet. IntegerAlphabet throws the exception here 
because it only accepts the name "token" and not the "default" that AbstractSymbolList 
passes. I do have a simple workaround: the method 
IntegerAlphabet.getTokenization(String name) accepts both "default" and "token". 
But I am not sure I completely understand the philosophy behind the design here...
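
The workaround amounts to treating "default" as a synonym of "token" in the name lookup. Below is a minimal sketch of that idea under stated assumptions: TokenizationLookupSketch and resolveTokenization are hypothetical names, the method returns a plain String instead of a SymbolTokenization, and the message for other names only mirrors the wording in the stack trace above. It is not the actual IntegerAlphabet code.

```java
import java.util.NoSuchElementException;

// Hypothetical illustration only; not the actual IntegerAlphabet implementation.
public class TokenizationLookupSketch {

    // Resolve a requested tokenization name, treating "default" as a synonym of "token".
    public static String resolveTokenization(String name) {
        if ("token".equals(name) || "default".equals(name)) {
            return "token";
        }
        // Any other name is still rejected, as before.
        throw new NoSuchElementException(name + " parser not supported by IntegerAlphabet yet");
    }

    public static void main(String[] args) {
        System.out.println(resolveTokenization("token"));
        System.out.println(resolveTokenization("default"));
        try {
            resolveTokenization("name");
        } catch (NoSuchElementException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

With a lookup like this, a caller such as seqString() that asks for "default" gets the same tokenization as a caller asking for "token".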

Kind regards,

Frans Verhoef
Bioinformatics Specialist
Genome Institute of Singapore
Genome, #02-01, 60 Biopolis Street, Singapore 138672
Tel: +65 6478 8000
DID: +65 6478 8060
HP: +65 9848 4325
Email: [EMAIL PROTECTED]


