Hi,

 

I have fixed the small bugs in PhredFormat that have been bugging me for the last two days. I have attached my fixed version. Feel free to use it, change it, or throw it away.

In short, this is what I have changed:

 

-          PhredFormat now implements ParseErrorSource and ParseErrorListener. This was not much work, as I basically copied the code from FastaFormat.

-          readSequenceData(BufferedReader br, SymbolTokenization parser, SeqIOListener listener) has changed. This method used to parse the char arrays into short number strings and feed those to the StreamParser, which in turn tried to do the same. Because the whitespace was removed in the process, the end result was an attempt to parse a String representing one humongous number as an integer. The method no longer parses the char arrays itself; it just feeds whole chunks of the char array to the StreamParser.
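
To illustrate the failure mode described in the second item, here is a minimal, self-contained sketch that does not use BioJava at all. The class and method names (QualityChunkDemo, parseCollapsed, parseTokens) are hypothetical and are not part of PhredFormat; the sketch only shows why stripping whitespace before integer parsing fails, and why parsing whitespace-separated tokens one by one does not.

```java
import java.util.ArrayList;
import java.util.List;

public class QualityChunkDemo {

    // Mimics the old behaviour (an assumption for illustration): whitespace
    // is dropped first, so all scores run together into one long digit string.
    static int parseCollapsed(String line) {
        String collapsed = line.replaceAll("\\s+", "");
        // Throws NumberFormatException once the digits exceed the int range.
        return Integer.parseInt(collapsed);
    }

    // Splits on whitespace and parses each quality score separately.
    static List<Integer> parseTokens(String line) {
        List<Integer> scores = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                scores.add(Integer.parseInt(token));
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        String line = "10 20 30 40 50 60 35 28";
        System.out.println(parseTokens(line));
        try {
            parseCollapsed(line);
        } catch (NumberFormatException e) {
            System.out.println("Collapsed input failed: " + e.getMessage());
        }
    }
}
```

In the sample line the collapsed string has 16 digits, which cannot fit into an int, so Integer.parseInt rejects it.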

 

One new issue came up, though, when I tried to do the following:

 

            StreamReader qualityIter = PhredTools.readPhredQuality(new BufferedReader(new FileReader(phredQualityFile)));
            while (qualityIter.hasNext()) {
                Sequence seq = qualityIter.nextSequence();
                // This call is where the exception is thrown.
                String str = seq.seqString();
            }

 

The call to seq.seqString() gave the following exception:

 

            java.util.NoSuchElementException: default parser not supported by IntegerAlphabet yet

            at org.biojava.bio.symbol.IntegerAlphabet.getTokenization(IntegerAlphabet.java:216)

            at org.biojava.bio.symbol.AbstractSymbolList.seqString(AbstractSymbolList.java:101)

            at org.biojava.bio.seq.impl.SimpleSequence.seqString(SimpleSequence.java:108)

            at org.gis.server.pipeline.apps.SequenceInfoParser.parseResults(SequenceInfoParser.java:82)

 

What happens is that SimpleSequence calls the AbstractSymbolList.seqString() method. This method in turn executes getAlphabet().getTokenization("default"), where getAlphabet() returns the IntegerAlphabet. But IntegerAlphabet throws the exception here, because it only accepts the name parameter value "token" and not the "default" that AbstractSymbolList passes. I do have a simple workaround: changing the method IntegerAlphabet.getTokenization(String name) so that it accepts both "default" and "token".

But I am not sure I fully understand the philosophy behind the design here…
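
The workaround mentioned above can be sketched roughly as follows. This is a standalone illustration, not the actual BioJava source: TokenizationLookupSketch and its nested Tokenization interface are hypothetical stand-ins, and the only point is the name check that treats "default" as an alias for "token".

```java
import java.util.NoSuchElementException;

public class TokenizationLookupSketch {

    // Hypothetical stand-in for a tokenization object; not the BioJava type.
    interface Tokenization {
        String tokenize(int value);
    }

    // A single shared tokenizer that renders an integer symbol as text.
    private static final Tokenization INTEGER_TOKENS = value -> Integer.toString(value);

    // Accepts both "token" and "default"; any other name is rejected.
    static Tokenization getTokenization(String name) {
        if ("token".equals(name) || "default".equals(name)) {
            return INTEGER_TOKENS;
        }
        throw new NoSuchElementException("No tokenization named '" + name + "'");
    }

    public static void main(String[] args) {
        System.out.println(getTokenization("default").tokenize(42));
        System.out.println(getTokenization("token").tokenize(7));
    }
}
```

Whether an alias like this fits the intended design of the alphabet classes is exactly the open question; the sketch only shows the mechanics of the change.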

 

Kind regards,

 

Frans Verhoef

Bioinformatics Specialist

Genome Institute of Singapore

Genome, #02-01, 60 Biopolis Street, Singapore 138672

Tel: +65 6478 8000

DID: +65 6478 8060

HP: +65 9848 4325

Email: [EMAIL PROTECTED]

 

Attachment: PhredFormat.java
Description: PhredFormat.java

_______________________________________________
Biojava-l mailing list  -  [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l
