Thanks for the reply, Mark. Setting the parser to be lazy (just before the parse; it shouldn't matter where I do this as long as it's prior to the parse, correct?) doesn't seem to help -- I still get the same SAX exception. The MegaBLAST output seems, to my eye, to be identical to that of blastn minus the header line:
MEGABLAST 2.2.10 [Oct-19-2004]
Looking at the code for BlastLikeSAXParser, it seems, even in lazy mode, to require that the header line contain at least a name with which it is familiar (lazy just turns off interest in the version number). Would a fix be as simple as adding 'MEGABLAST' to the list of acceptable names? I can provide any interested dev w/ a sample output file from the above-mentioned version of MegaBLAST.
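To make the question concrete, here is a rough sketch of the check as I understand it. This is not the actual BlastLikeSAXParser code: the class name, the accepts() method, the list of program names and the list of tested versions are all made up for illustration, and the real logic may well differ.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; this is NOT BioJava's BlastLikeSAXParser code.
class HeaderCheckSketch {

    // Assumed list of accepted program names; the real list in BioJava may differ.
    static final Set<String> KNOWN_PROGRAMS = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("BLASTN", "BLASTP", "BLASTX", "TBLASTN", "TBLASTX")));

    // Assumed list of "tried and tested" versions, consulted only in strict mode.
    static final Set<String> TESTED_VERSIONS = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("2.2.10")));

    /**
     * Returns true if the header line would be accepted.
     * The program name is checked in both modes; lazy mode skips only the version check.
     */
    static boolean accepts(String headerLine, Set<String> programs, boolean lazy) {
        if (headerLine == null) {
            return false;
        }
        String[] tokens = headerLine.trim().split("\\s+");
        if (tokens[0].isEmpty()) {
            return false; // blank header line
        }
        String program = tokens[0].toUpperCase(Locale.ROOT);
        if (!programs.contains(program)) {
            return false; // unknown name is rejected even in lazy mode
        }
        if (lazy) {
            return true; // lazy mode: no interest in the version number
        }
        return tokens.length > 1 && TESTED_VERSIONS.contains(tokens[1]);
    }

    public static void main(String[] args) {
        String header = "MEGABLAST 2.2.10 [Oct-19-2004]";

        // The proposed fix: the same list plus MEGABLAST.
        Set<String> extended = new HashSet<>(KNOWN_PROGRAMS);
        extended.add("MEGABLAST");

        System.out.println(accepts(header, KNOWN_PROGRAMS, true)); // prints false
        System.out.println(accepts(header, extended, true));       // prints true
    }
}
```

If the real check has this shape, it would explain why lazy mode alone makes no difference for a MEGABLAST header, and why adding the name to the list might be enough.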
If no one's interested, I'll follow up on this myself, but it'll take me a lot longer than it would someone already familiar w/ the BioJava parser code.
Thanks all,
-j
[EMAIL PROTECTED] wrote:
| Hello -
|
| MegaBLAST is not officially supported. This doesn't mean it won't work; it
| just means we don't know if it will work. If it isn't too different from
| normal blast it probably will.
|
| The BlastLikeSAXParser has two modes: Lazy and Strict. If you call
| setModeLazy() before parsing it won't care if it doesn't recognise the
| format as one that is tried and tested and will attempt to parse it
| anyway. You should carefully check a few results though to make sure it is
| going well. If things work let us know so we can add MegaBLAST to the list
| of trusted programs.
|
| Hope this helps,
|
| Mark
|
|
| James Diggans <[EMAIL PROTECTED]>
| Sent by: [EMAIL PROTECTED]
| 11/22/2004 02:38 PM
|
| To: BioJava <[EMAIL PROTECTED]>
| cc: (bcc: Mark Schreiber/GP/Novartis)
| Subject: [Biojava-l] Parsing MegaBLAST output files?
|
|
| All, I'm attempting to use BioJava to parse the output from NCBI's
| commandline MegaBLAST and receiving an error:
|
| 'Could not recognise the format of this file as one supported by the
| framework.'
|
| in a SAXException thrown by BlastLikeSAXParser. An old post to the
| mailing list:
|
| http://www.biojava.org/pipermail/biojava-dev/2002-October/000150.html
|
| seems to indicate that this was fixed long ago via this commit to CVS:
|
| http://cvs.biojava.org/cgi-bin/viewcvs/viewcvs.cgi/biojava-live/src/org/biojava/bio/program/ssbind/HeaderStAXHandler.java.diff?r1=1.3&r2=1.4&cvsroot=biojava
|
| The MegaBLAST file I'm trying to parse is clean and my attempt at a
| parse consists of (largely pulled from the recipe from BioJava in Anger):
|
| ------------------
| InputStream is = new FileInputStream(blastResult);
|
| BlastLikeSAXParser parser = new BlastLikeSAXParser();
| SeqSimilarityAdapter adapter = new SeqSimilarityAdapter();
| parser.setContentHandler(adapter);
|
| alignmentResults = new ArrayList();
| SearchContentHandler builder = new
|     BlastLikeSearchBuilder(alignmentResults,
|         new DummySequenceDB("queries"),
|         new DummySequenceDBInstallation());
|
| adapter.setSearchContentHandler(builder);
|
| parser.parse(new InputSource(is));
| ------------------
|
| Any ideas on why I'm getting the SAXException? Thanks ...
| -j
|
_______________________________________________
Biojava-l mailing list - [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l