No, I use BioJava to write the user's query sequence as a FASTA file before feeding it to BLAST. I just copied a differently formatted sequence into my post.
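For anyone who wants to reproduce the run from the sequence as pasted in the quoted post (blocks of ten residues separated by spaces), here is a minimal sketch of turning such text into FASTA. It is plain Java, not the BioJava call used in the application; the class name `FastaSketch`, the method `toFasta`, and the 60-column width are illustrative assumptions.

```java
import java.util.Locale;

public class FastaSketch {
    /**
     * Hypothetical helper: removes whitespace and digits from a pasted
     * sequence, then writes it under a ">id" header, wrapped at `width`.
     */
    public static String toFasta(String id, String raw, int width) {
        String residues = raw.replaceAll("[\\s\\d]+", "").toUpperCase(Locale.ROOT);
        StringBuilder sb = new StringBuilder(">").append(id).append('\n');
        for (int i = 0; i < residues.length(); i += width) {
            int end = Math.min(i + width, residues.length());
            sb.append(residues, i, end).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // First three blocks of the query sequence from the quoted post.
        System.out.print(toFasta("query", "MLPRETDEEP EEPGRRGSFV\nEMVDNLRGKS", 60));
    }
}
```

The output can be saved to a file and passed to blastall with -i, as in the command line quoted in the post.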
Thanks.

-Eric Trull

--- [EMAIL PROTECTED] wrote:

> Not exactly sure what the problem is here but it looks like your input is
> not in FASTA format so that might be causing a problem??
>
> "W. Eric Trull" <[EMAIL PROTECTED]>
> Sent by: [EMAIL PROTECTED]
> 12/13/2005 08:22 AM
>
> To: biojava-l@biojava.org
> cc: (bcc: Mark Schreiber/GP/Novartis)
> Subject: [Biojava-l] SAXException with BLAST errors
>
> Hello all,
>
> Some of you may remember that I've been creating a Java application to front
> a BLAST web service. Everything is working great except some user found the
> random sequence that causes problems (gotta love those users). I'm using the
> BlastXMLParserFacade to parse NCBI BLAST (2.2.12) XML output. I think I have
> two problems; one is a NCBI BLAST problem and the other is with BioJava's
> BlastXMLParserFacade. Any help/advice would be appreciated, especially if I
> have to explain the problem to NCBI - biology is not my strong suit.
>
> Here is the relevant BioJava stack trace:
>
> org.xml.sax.SAXException: <Hsp> is non-compliant.
>     at org.biojava.bio.program.sax.blastxml.HspHandler.endElementHandler(HspHandler.java:362)
>     at org.biojava.bio.program.sax.blastxml.StAXFeatureHandler.endElement(StAXFeatureHandler.java:235)
>     at org.biojava.utils.stax.SAX2StAXAdaptor.endElement(SAX2StAXAdaptor.java:153)
>     at org.apache.xerces.parsers.SAXParser.endElement(SAXParser.java:1403)
>     at org.apache.xerces.validators.common.XMLValidator.callEndElement(XMLValidator.java:1456)
>     at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1260)
>     at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:381)
>     at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1081)
>     at org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade.parse(BlastXMLParserFacade.java:180)
>
> Here is STDERR from NCBI BLAST on Sun Solaris:
>
> [blastall] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
> [blastall] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
> [blastall] ERROR: [065.106] : /var/tmp/blast39961.tmpOutput BlastOutput.iterations.E.hits.E.hsps.E.<hseq>
> Invalid value(s) [-3] in VisibleString
> [ýýýýýýýýýýýýýýýýý----------ýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýý ...]
>
> Here is what I get from NCBI BLAST on Windows XP:
>
> [NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
> [NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
> [NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(280) >= len(256)
> [NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(313) >= len(256)
>
> Here is how I started BLAST:
>
> /home/etrull/developer/blast-sparc64-solaris-2.2.12/bin/blastall -p blastp -d /home/etrull/developer/blast/current/pdb -i /var/tmp/fasta39960.tmp -m 7 -o /var/tmp/blast39961.tmp -b 0
>
> Here is my input sequence:
>
> MLPRETDEEP EEPGRRGSFV EMVDNLRGKS GQGYYVEMTV GSPPQTLNIL VDTGSSNFAV GAAPHPFLHR
> YYQRQLSSTY RDLRKGVYVP YTQGAWAGEL GTDLVSIPHG PNVTVRANIA AITESDKFFI NGSNWEGILG
> LAYAEIARPD DSLEPFFDSL VKQTHVPNLF SLQLCGAGFP LNQSEVLASV GGSMIIGGID HSLYTGSLWY
> TPIRREWYYE VIIVRVEING QDLKMDCKEY NYDKSIVDSG TTNLRLPKKV FEAAVKSIKA ASSTEKFPDG
> FWLGEQLVCW QAGTTPWNIF PVISLYLMGE VTNQSFRITI LPQQYLRPVE DVATSQDDCY KFAISQSSTG
> TVMGAVIMEG FYVVFDRARK RIGFAVSACH VHDEFRTAAV EGPFVTLDME DCGYN
>
> Here is the regular BLAST output for pdb|1ML5|E. It seems odd to me that the
> identities and positives are both zero - why is this even showing up as a
> similar sequence?
>
> >pdb|1ML5|E 30S Ribosomal Protein S2
> Length = 256
>
> Score = 28.1 bits (61), Expect = 5.8
> Identities = 0/71 (0%), Positives = 0/71 (0%), Gaps = 10/71 (14%)
>
> Query: 99 ELGTDLVSIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFD 158
>
> Sbjct: 264 ---------- 313
>
> Query: 159 SLVKQTHVPNL 169
>
> Sbjct: 314 324
>
> Here is the XML BLAST output for pdb|1ML5|E. Notice the second <Hsp_hseq>
> has a bunch of "#" signs. Is this valid in BioJava?
>
> <Hit>
>   <Hit_num>146</Hit_num>
>   <Hit_id>pdb|1ML5|E</Hit_id>
>   <Hit_def>30S Ribosomal Protein S2</Hit_def>
>   <Hit_accession>1ML5_E</Hit_accession>
>   <Hit_len>256</Hit_len>
>   <Hit_hsps>
>     <Hsp>
>       <Hsp_num>1</Hsp_num>
>       <Hsp_bit-score>28.1054</Hsp_bit-score>
>       <Hsp_score>61</Hsp_score>
>       <Hsp_evalue>5.76848</Hsp_evalue>
>       <Hsp_query-from>99</Hsp_query-from>
>       <Hsp_query-to>169</Hsp_query-to>
>       <Hsp_hit-from>264</Hsp_hit-from>
>       <Hsp_hit-to>324</Hsp_hit-to>
>       <Hsp_query-frame>1</Hsp_query-frame>
>       <Hsp_hit-frame>1</Hsp_hit-frame>
>       <Hsp_gaps>10</Hsp_gaps>
>       <Hsp_align-len>71</Hsp_align-len>
>       <Hsp_qseq>ELGTDLVSIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFDSLVKQTHVPNL</Hsp_qseq>
>       <Hsp_hseq>#################----------############################################</Hsp_hseq>
>       <Hsp_midline> </Hsp_midline>
>     </Hsp>
>   </Hit_hsps>
> </Hit>
>
> Thanks.
>
> -Eric Trull
> _______________________________________________
> Biojava-l mailing list - Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
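One way to catch hits like the pdb|1ML5|E one before they reach BlastXMLParserFacade is to pre-scan the XML for <Hsp_hseq> values that contain anything other than residue letters and gap characters. The following is only a sketch using the JDK's SAX parser. The class name `HseqCheck`, the method `suspectHseqs`, and the accepted alphabet (letters, "*" and "-") are assumptions for illustration; this is not the compliance test that BioJava's HspHandler applies. Full blastall output may also carry a DOCTYPE declaration that the parser will try to resolve, which this sketch does not handle.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class HseqCheck {
    /**
     * Hypothetical pre-check: returns every <Hsp_hseq> value that contains
     * characters other than letters, '*' or '-' (an assumed alphabet).
     */
    public static List<String> suspectHseqs(String xml) {
        final List<String> suspects = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only inside <Hsp_hseq>

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if ("Hsp_hseq".equals(qName)) text = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("Hsp_hseq".equals(qName)) {
                    String seq = text.toString().trim();
                    if (!seq.matches("[A-Za-z*-]+")) suspects.add(seq);
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not scan BLAST XML", e);
        }
        return suspects;
    }

    public static void main(String[] args) {
        String xml = "<Hit_hsps><Hsp><Hsp_hseq>#####-----#####</Hsp_hseq></Hsp></Hit_hsps>";
        System.out.println(suspectHseqs(xml));
    }
}
```

A non-empty result would at least identify which hit to report to NCBI before the parse is attempted.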