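I'm not exactly sure what the problem is here, but your input does not look like FASTA format: as posted, it has no ">" header line and the residues are split into space-separated blocks of ten. That might be what is causing the problem.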
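If the input format does turn out to matter, here is a minimal sketch of turning a raw, space-separated sequence into a single FASTA record before writing it to the file passed to blastall with -i. The class name FastaNormalizer, the method toFasta and the identifier "query1" are made up for illustration and are not part of BioJava. I have not confirmed that this makes the SeqPortNew errors go away.

```java
public class FastaNormalizer {

    /**
     * Builds one FASTA record from a raw residue string.
     * All whitespace is removed from the sequence and the residues
     * are re-wrapped at lineWidth characters per line.
     * (Hypothetical helper, not a BioJava API.)
     */
    public static String toFasta(String id, String rawSequence, int lineWidth) {
        if (lineWidth <= 0) {
            throw new IllegalArgumentException("lineWidth must be positive");
        }
        // Strip spaces, tabs and line breaks from the pasted sequence.
        String residues = rawSequence.replaceAll("\\s+", "");

        StringBuilder sb = new StringBuilder();
        // FASTA header line: '>' followed by an identifier.
        sb.append('>').append(id).append('\n');
        for (int i = 0; i < residues.length(); i += lineWidth) {
            int end = Math.min(i + lineWidth, residues.length());
            sb.append(residues, i, end).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // First three blocks of the posted sequence, as an example.
        String raw = "MLPRETDEEP EEPGRRGSFV EMVDNLRGKS";
        System.out.print(toFasta("query1", raw, 60));
    }
}
```

Writing the returned string to the temporary input file instead of the raw text would at least rule out the input format as the cause.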

"W. Eric Trull" <[EMAIL PROTECTED]>
Sent by: [EMAIL PROTECTED]
12/13/2005 08:22 AM

To: biojava-l@biojava.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] SAXException with BLAST errors

Hello all,

Some of you may remember that I've been creating a Java application to front a BLAST web service. Everything is working great except some user found the random sequence that causes problems (gotta love those users). I'm using the BlastXMLParserFacade to parse NCBI BLAST (2.2.12) XML output.

I think I have two problems; one is a NCBI BLAST problem and the other is with BioJava's BlastXMLParserFacade. Any help/advice would be appreciated, especially if I have to explain the problem to NCBI - biology is not my strong suit.

Here is the relevant BioJava stack trace:

org.xml.sax.SAXException: <Hsp> is non-compliant.
    at org.biojava.bio.program.sax.blastxml.HspHandler.endElementHandler(HspHandler.java:362)
    at org.biojava.bio.program.sax.blastxml.StAXFeatureHandler.endElement(StAXFeatureHandler.java:235)
    at org.biojava.utils.stax.SAX2StAXAdaptor.endElement(SAX2StAXAdaptor.java:153)
    at org.apache.xerces.parsers.SAXParser.endElement(SAXParser.java:1403)
    at org.apache.xerces.validators.common.XMLValidator.callEndElement(XMLValidator.java:1456)
    at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1260)
    at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:381)
    at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1081)
    at org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade.parse(BlastXMLParserFacade.java:180)

Here is STDERR from NCBI BLAST on Sun Solaris:

[blastall] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
[blastall] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
[blastall] ERROR: [065.106] : /var/tmp/blast39961.tmpOutput BlastOutput.iterations.E.hits.E.hsps.E.<hseq> Invalid value(s) [-3] in VisibleString
[ýýýýýýýýýýýýýýýýý----------ýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýý ...]

Here is what I get from NCBI BLAST on Windows XP:

[NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
[NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
[NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(280) >= len(256)
[NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(313) >= len(256)

Here is how I started BLAST:

/home/etrull/developer/blast-sparc64-solaris-2.2.12/bin/blastall -p blastp -d /home/etrull/developer/blast/current/pdb -i /var/tmp/fasta39960.tmp -m 7 -o /var/tmp/blast39961.tmp -b 0

Here is my input sequence:

MLPRETDEEP EEPGRRGSFV EMVDNLRGKS GQGYYVEMTV GSPPQTLNIL VDTGSSNFAV
GAAPHPFLHR YYQRQLSSTY RDLRKGVYVP YTQGAWAGEL GTDLVSIPHG PNVTVRANIA
AITESDKFFI NGSNWEGILG LAYAEIARPD DSLEPFFDSL VKQTHVPNLF SLQLCGAGFP
LNQSEVLASV GGSMIIGGID HSLYTGSLWY TPIRREWYYE VIIVRVEING QDLKMDCKEY
NYDKSIVDSG TTNLRLPKKV FEAAVKSIKA ASSTEKFPDG FWLGEQLVCW QAGTTPWNIF
PVISLYLMGE VTNQSFRITI LPQQYLRPVE DVATSQDDCY KFAISQSSTG TVMGAVIMEG
FYVVFDRARK RIGFAVSACH VHDEFRTAAV EGPFVTLDME DCGYN

Here is the regular BLAST output for pdb|1ML5|E. It seems odd to me that the identities and positives are both zero - why is this even showing up as a similar sequence?

>pdb|1ML5|E 30S Ribosomal Protein S2
          Length = 256

 Score = 28.1 bits (61), Expect = 5.8
 Identities = 0/71 (0%), Positives = 0/71 (0%), Gaps = 10/71 (14%)

Query: 99  ELGTDLVSIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFD 158
Sbjct: 264                  ----------                                  313

Query: 159 SLVKQTHVPNL 169
Sbjct: 314             324

Here is the XML BLAST output for pdb|1ML5|E. Notice the second <Hsp_hseq> has a bunch of "#" signs. Is this valid in BioJava?

<Hit>
  <Hit_num>146</Hit_num>
  <Hit_id>pdb|1ML5|E</Hit_id>
  <Hit_def>30S Ribosomal Protein S2</Hit_def>
  <Hit_accession>1ML5_E</Hit_accession>
  <Hit_len>256</Hit_len>
  <Hit_hsps>
    <Hsp>
      <Hsp_num>1</Hsp_num>
      <Hsp_bit-score>28.1054</Hsp_bit-score>
      <Hsp_score>61</Hsp_score>
      <Hsp_evalue>5.76848</Hsp_evalue>
      <Hsp_query-from>99</Hsp_query-from>
      <Hsp_query-to>169</Hsp_query-to>
      <Hsp_hit-from>264</Hsp_hit-from>
      <Hsp_hit-to>324</Hsp_hit-to>
      <Hsp_query-frame>1</Hsp_query-frame>
      <Hsp_hit-frame>1</Hsp_hit-frame>
      <Hsp_gaps>10</Hsp_gaps>
      <Hsp_align-len>71</Hsp_align-len>
      <Hsp_qseq>ELGTDLVSIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFDSLVKQTHVPNL</Hsp_qseq>
      <Hsp_hseq>#################----------############################################</Hsp_hseq>
      <Hsp_midline> </Hsp_midline>
    </Hsp>
  </Hit_hsps>
</Hit>

Thanks.
-Eric Trull

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l