Thanks for the suggestion, Mark. I emailed NCBI and the gist of the reply was:
These SeqPortNew errors usually indicate a problem in the formatting process; the #'s are certainly not normal. Is this the only database entry that generates errors?

So I dug a little deeper on 1ML5 to discover that it has a chain 'e' and a chain 'E'. When I created my FASTA file to feed to formatdb I made the deflines of the form pdb|<id>|<chain>, but in uppercase. So I had two entries with the same defline but different sequences. I think this is my problem and am working on fixing it now.

Thanks.

-Eric Trull

--- [EMAIL PROTECTED] wrote:

> I would send NCBI your test sequence, the blast output and the version of
> BLAST and ask them if this is "normal". I have found them to be very
> responsive in the past. If it is normal then we need to fix biojava to
> cope.
>
> - Mark
>
> "W. Eric Trull" <[EMAIL PROTECTED]>
> 12/13/2005 09:42 AM
>
> To: Mark Schreiber/GP/[EMAIL PROTECTED]
> cc: biojava-l@biojava.org, [EMAIL PROTECTED]
> Subject: Re: [Biojava-l] SAXException with BLAST errors
>
> No, I use BioJava to write the user's query sequence as a fasta file before
> feeding it to BLAST. I just copied a differently formatted sequence into my
> post.
>
> Thanks.
>
> -Eric Trull
>
> --- [EMAIL PROTECTED] wrote:
>
> > Not exactly sure what the problem is here but it looks like your input is
> > not in FASTA format so that might be causing a problem??
> >
> > "W. Eric Trull" <[EMAIL PROTECTED]>
> > Sent by: [EMAIL PROTECTED]
> > 12/13/2005 08:22 AM
> >
> > To: biojava-l@biojava.org
> > cc: (bcc: Mark Schreiber/GP/Novartis)
> > Subject: [Biojava-l] SAXException with BLAST errors
> >
> > Hello all,
> >
> > Some of you may remember that I've been creating a Java application to front
> > a BLAST web service. Everything is working great except some user found the
> > random sequence that causes problems (gotta love those users). I'm using the
> > BlastXMLParserFacade to parse NCBI BLAST (2.2.12) XML output.
> > I think I have two problems; one is a NCBI BLAST problem and the other is
> > with BioJava's BlastXMLParserFacade. Any help/advice would be appreciated,
> > especially if I have to explain the problem to NCBI - biology is not my
> > strong suit.
> >
> > Here is the relevant BioJava stack trace:
> >
> > org.xml.sax.SAXException: <Hsp> is non-compliant.
> >     at org.biojava.bio.program.sax.blastxml.HspHandler.endElementHandler(HspHandler.java:362)
> >     at org.biojava.bio.program.sax.blastxml.StAXFeatureHandler.endElement(StAXFeatureHandler.java:235)
> >     at org.biojava.utils.stax.SAX2StAXAdaptor.endElement(SAX2StAXAdaptor.java:153)
> >     at org.apache.xerces.parsers.SAXParser.endElement(SAXParser.java:1403)
> >     at org.apache.xerces.validators.common.XMLValidator.callEndElement(XMLValidator.java:1456)
> >     at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1260)
> >     at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:381)
> >     at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1081)
> >     at org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade.parse(BlastXMLParserFacade.java:180)
> >
> > Here is STDERR from NCBI BLAST on Sun Solaris:
> >
> > [blastall] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
> > [blastall] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
> > [blastall] ERROR: [065.106] : /var/tmp/blast39961.tmpOutput
> > BlastOutput.iterations.E.hits.E.hsps.E.<hseq>
> > Invalid value(s) [-3] in VisibleString
> > [ýýýýýýýýýýýýýýýýý----------ýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýýý ...]
> >
> > Here is what I get from NCBI BLAST on Windows XP:
> >
> > [NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
> > [NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(263) >= len(256)
> > [NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(280) >= len(256)
> > [NULL_Caption] ERROR: ncbiapi [000.000] : SeqPortNew: pdb|1ML5|E start(313) >= len(256)
> >
> > Here is how I started BLAST:
> >
> > /home/etrull/developer/blast-sparc64-solaris-2.2.12/bin/blastall -p blastp -d /home/etrull/developer/blast/current/pdb -i /var/tmp/fasta39960.tmp -m 7 -o /var/tmp/blast39961.tmp -b 0
> >
> > Here is my input sequence:
> >
> > MLPRETDEEP EEPGRRGSFV EMVDNLRGKS GQGYYVEMTV GSPPQTLNIL VDTGSSNFAV GAAPHPFLHR
> > YYQRQLSSTY RDLRKGVYVP YTQGAWAGEL GTDLVSIPHG PNVTVRANIA AITESDKFFI NGSNWEGILG
> > LAYAEIARPD DSLEPFFDSL VKQTHVPNLF SLQLCGAGFP LNQSEVLASV GGSMIIGGID HSLYTGSLWY
> > TPIRREWYYE VIIVRVEING QDLKMDCKEY NYDKSIVDSG TTNLRLPKKV FEAAVKSIKA ASSTEKFPDG
> > FWLGEQLVCW QAGTTPWNIF PVISLYLMGE VTNQSFRITI LPQQYLRPVE DVATSQDDCY KFAISQSSTG
> > TVMGAVIMEG FYVVFDRARK RIGFAVSACH VHDEFRTAAV EGPFVTLDME DCGYN
> >
> > Here is the regular BLAST output for pdb|1ML5|E. It seems odd to me that the
> > identities and positives are both zero - why is this even showing up as a
> > similar sequence?
> >
> > >pdb|1ML5|E 30S Ribosomal Protein S2
> > Length = 256
> >
> > Score = 28.1 bits (61), Expect = 5.8
> > Identities = 0/71 (0%), Positives = 0/71 (0%), Gaps = 10/71 (14%)
> >
> > Query: 99 ELGTDLVSIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFD 158
> >
> > Sbjct: 264 ---------- 313
> >
> > Query: 159 SLVKQTHVPNL 169
> >
> > Sbjct: 314 324
> >
> > Here is the XML BLAST output for pdb|1ML5|E. Notice the second <Hsp_hseq>
> > has a bunch of "#" signs. Is this valid in BioJava?
> >
> > <Hit>
> >   <Hit_num>146</Hit_num>
> >   <Hit_id>pdb|1ML5|E</Hit_id>
> >   <Hit_def>30S Ribosomal Protein S2</Hit_def>
> >   <Hit_accession>1ML5_E</Hit_accession>
> >   <Hit_len>256</Hit_len>
> >   <Hit_hsps>
> >     <Hsp>
> >       <Hsp_num>1</Hsp_num>
> >       <Hsp_bit-score>28.1054</Hsp_bit-score>
> >       <Hsp_score>61</Hsp_score>
> >       <Hsp_evalue>5.76848</Hsp_evalue>
> >       <Hsp_query-from>99</Hsp_query-from>
>
=== message truncated ===

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l
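
The duplicate-defline problem described at the top of this message (chains 'e' and 'E' of 1ML5 collapsing to the same upper-cased defline) could be caught before running formatdb with a check along these lines. This is only a minimal sketch: the class name `DeflineCollisionCheck` and the method `findCaseCollisions` are hypothetical, not part of BioJava or the NCBI tools, and the third ID in `main` is made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of BioJava or the NCBI toolkit.
public class DeflineCollisionCheck {

    /**
     * Groups sequence IDs by their upper-cased form and returns only the
     * groups in which two or more distinct original IDs map to the same key.
     */
    public static Map<String, Set<String>> findCaseCollisions(List<String> ids) {
        Map<String, Set<String>> byUpper = new LinkedHashMap<>();
        for (String id : ids) {
            String key = id.toUpperCase(Locale.ROOT);
            byUpper.computeIfAbsent(key, k -> new LinkedHashSet<>()).add(id);
        }

        Map<String, Set<String>> collisions = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> entry : byUpper.entrySet()) {
            // More than one distinct spelling means upper-casing would merge them.
            if (entry.getValue().size() > 1) {
                collisions.put(entry.getKey(), entry.getValue());
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        // 'e' and 'E' are the chains mentioned above; chain 'A' is illustrative only.
        List<String> ids = Arrays.asList("pdb|1ML5|e", "pdb|1ML5|E", "pdb|1ML5|A");
        System.out.println(findCaseCollisions(ids));
    }
}
```

Any group reported here is a set of deflines that would become identical once upper-cased, so those entries need a case-preserving or otherwise distinct ID before the FASTA file is handed to formatdb.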