Hi!

I am currently evaluating the XML output of NCBI Blast and BioJava's ability to parse it. To do this, I ran the same blastp and blastn searches twice (i.e. the same sequence against the same database with the same parameters), once with the standard output and once with XML output ("-m 7"). I then parsed the standard output with BlastLikeSAXParser and the XML output with BlastXMLParserFacade, and compared the results. Surprisingly, they were different...

Here is a list of the fields that differ:

SeqSimilaritySearchResult:
  Annotation:
    databaseId
    program
    queryId
    version

SeqSimilaritySearchHit:
  subjectId
  queryStrand
  subjectStrand
  Annotation:
    subjectDescription
    subjectId

SeqSimilaritySearchSubHit:
  queryStrand
  subjectStrand
  score
  Annotation:
    numberOfIdentities
    numberOfPositives
    percentageIdentity
    score
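A field-by-field comparison like the one above can be sketched as follows. This is a minimal sketch, assuming each Annotation has first been turned into a plain java.util.Map (BioJava 1.x's Annotation.asMap() should do this, but treat that as an assumption). The class AnnotationDiff and the method differingKeys are hypothetical names, and the values in main are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical helper: reports which annotation keys differ between the
 * results of two parsers. Assumes both annotations were already converted
 * to plain Maps (e.g. via Annotation.asMap(), an assumption).
 */
public class AnnotationDiff {

    /** Returns the sorted keys that are missing from one map or map to unequal values. */
    public static List<String> differingKeys(Map<?, ?> fromSax, Map<?, ?> fromXml) {
        // Union of keys seen in either parser's output.
        Set<Object> allKeys = new HashSet<>(fromSax.keySet());
        allKeys.addAll(fromXml.keySet());

        // TreeSet keeps the report sorted alphabetically.
        Set<String> differing = new TreeSet<>();
        for (Object key : allKeys) {
            boolean inBoth = fromSax.containsKey(key) && fromXml.containsKey(key);
            if (!inBoth || !Objects.equals(fromSax.get(key), fromXml.get(key))) {
                differing.add(String.valueOf(key));
            }
        }
        return new ArrayList<>(differing);
    }

    public static void main(String[] args) {
        // Made-up values, only to show the shape of the report.
        Map<String, Object> sax = Map.of("program", "BLASTP", "queryId", "query1", "score", 120);
        Map<String, Object> xml = Map.of("program", "blastp", "queryId", "query1");
        System.out.println(differingKeys(sax, xml)); // prints [program, score]
    }
}
```

Keys present in only one map are reported as well, since a field the XML parser never sets is just as much a difference as a field with a different value.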

These are all fairly important fields, for example subjectId, the description and the score. Having looked at them, I think the output of BlastLikeSAXParser is correct, but that of BlastXMLParserFacade is broken.

What now? I think the two parsers are supposed to produce results that are as close to identical as possible, but changing the parser might break existing code. If it's OK with you, I'd like to volunteer to change BlastXMLParserFacade so that its output more closely matches that of BlastLikeSAXParser.

By the way, is there a guaranteed set of Annotation entries for these different classes? For example, I can find percentageIdentity, but not percentagePositives.

Greetings,
Christian
_______________________________________________
Biojava-l mailing list  -  [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l