BioJava's BLAST framework parses files and fires events for every piece of 
information it finds. The SeqSimilarityAdapter class is an example of how to 
catch these events and construct basic BLAST result objects 
(SimpleSeqSimilaritySearchHit). However, these objects are not comprehensive 
and do not record full details of every hit.

If you want the kind of detail you mention below, you will have to write your 
own content handler for BLAST parsing and pass it to the BlastLikeSAXParser 
when parsing a file. This event handler should implement the SAX 
ContentHandler interface. Look at the source of SeqSimilarityAdapter for 
guidance. You will then receive events for every part of the file, from which 
you can construct your own custom BLAST result objects to describe them.
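
As a rough sketch of what such a handler might look like (not BioJava's own 
code): the element name "HSPSummary" and the attribute names below are 
placeholders I made up for illustration, so substitute whatever names the 
parser really emits. The HspStats class and the parse() helper are likewise 
hypothetical, and the driver feeds the handler from the JDK's XML parser 
with a stand-in document so that it runs without BioJava on the classpath.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// DefaultHandler implements org.xml.sax.ContentHandler with no-op methods,
// so only the callbacks of interest need to be overridden.
class HspStatsHandler extends DefaultHandler {

    // Custom result object (hypothetical): statistics for one alignment.
    static class HspStats {
        final int identities, positives, gaps, alignmentLength;

        HspStats(int identities, int positives, int gaps, int alignmentLength) {
            this.identities = identities;
            this.positives = positives;
            this.gaps = gaps;
            this.alignmentLength = alignmentLength;
        }

        // Integer division truncates, e.g. 54/124 gives 43.
        int percent(int count) {
            return 100 * count / alignmentLength;
        }
    }

    final List<HspStats> results = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // ASSUMPTION: "HSPSummary" and the attribute names are placeholders,
        // not confirmed BioJava tag names.
        if ("HSPSummary".equals(qName)) {
            results.add(new HspStats(
                    intAttr(atts, "identities"),
                    intAttr(atts, "positives"),
                    intAttr(atts, "gaps"),
                    intAttr(atts, "alignmentLength")));
        }
    }

    // Returns 0 when the attribute is absent.
    private static int intAttr(Attributes atts, String name) {
        String value = atts.getValue(name);
        return value == null ? 0 : Integer.parseInt(value.trim());
    }

    // Stand-in driver using the JDK parser. With BioJava, the handler would
    // presumably be registered on the BLAST parser via setContentHandler(...)
    // before calling parse(...); see SeqSimilarityAdapter for the real wiring.
    static List<HspStats> parse(String xml) throws Exception {
        HspStatsHandler handler = new HspStatsHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.results;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Hit id='Prot0002'>"
                + "<HSPSummary identities='54' positives='79' gaps='3' alignmentLength='124'/>"
                + "</Hit>";
        for (HspStats s : parse(xml)) {
            System.out.println("Identities = " + s.identities + "/" + s.alignmentLength
                    + " (" + s.percent(s.identities) + "%)");
        }
    }
}
```

The point of the design is that the handler keeps only the fields you care 
about, so the result object can be as detailed as you need.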

If you're not sure what tag names to listen for in your ContentHandler, the 
easiest thing to do is to run the parser once with a handler that dumps every 
tag name, and see what you get.
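
A throwaway handler for that could look roughly like the sketch below. The 
class name TagDumper and the dump() helper are my own inventions, and the 
sample document in main() is only a stand-in with placeholder tag names; in 
practice you would hand the same handler to the BLAST parser instead of the 
JDK XML parser used here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Records each distinct element name, together with the names of its
// attributes, in the order first seen.
class TagDumper extends DefaultHandler {

    private final Set<String> seen = new LinkedHashSet<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        StringBuilder line = new StringBuilder(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            line.append(" @").append(atts.getQName(i));
        }
        seen.add(line.toString()); // the set silently drops repeats
    }

    List<String> tags() {
        return new ArrayList<>(seen);
    }

    // Stand-in driver: parses an XML string with the JDK parser.
    static List<String> dump(String xml) throws Exception {
        TagDumper dumper = new TagDumper();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), dumper);
        return dumper.tags();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder tag names, not the ones BioJava actually emits.
        String xml = "<Report><Hit id='Prot0002'><HSP/></Hit><Hit id='Prot0003'/></Report>";
        for (String tag : dump(xml)) {
            System.out.println(tag);
        }
    }
}
```

Once you have the list, pick out the tags and attributes that carry the 
lengths, identities, positives and gaps, and handle just those.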

cheers,
Richard


-----Original Message-----
From:   [EMAIL PROTECTED] on behalf of Y D Sun
Sent:   Sun 6/26/2005 5:42 PM
To:     biojava-l@biojava.org
Cc:     
Subject:        [Biojava-l] BLAST Parser for extracting all BLAST data?

Hi,

I want to extract all data from BLASTP results. In the following hit,
for example, I need to get the lengths of query and subject proteins,
the identities (including all data 54, 124 and 43%), the positives (all
data 79, 124 and 63%), and the gaps (3, 124 and 2%). Can the
BlastLikeSAXParser extract all of this information? I can't find the
methods in SeqSimilaritySearchHit and SeqSimilaritySearchSubHit APIs to
retrieve these data. Does Biojava provide any methods for this purpose?

Thanks,

George


BLASTP 2.2.5 [Nov-16-2002]

Query= Prot0001
         (138 letters)

Database: /work/nys1/fasta/protein/AE000782.pro.fasta
           2407 sequences; 662,866 total letters

Searching.....done

                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

Prot0002                                                           100   1e-23
Prot0003                                                            74   2e-15
Prot0004                                                            43   3e-06

>Prot0002
          Length = 138

 Score =  100 bits (250), Expect = 1e-23
 Identities = 54/124 (43%), Positives = 79/124 (63%), Gaps = 3/124 (2%)

Query: 18  NARTKFTDIAKTLNLTEAAIRKRIKKLEENQIIKRYSIDIDYKKLGYNMAIIGLDIDMDY 77
           NAR   T IAK LN+TEAA+RKRI  LE  + I  Y   I+YKK+G + ++ G+D+D D
Sbjct: 15  NARIPKTRIAKELNVTEAAVRKRIANLERREEILGYKAIINYKKVGLSASLTGVDVDPDK 74

Query: 78  FPKIIKELEKRKEFLHIYSSAGDHDIMVIAIYK---DLEEIYNYLKNLKGVKRVCPAIII 134
             K+++EL+  +    ++ + GDH IM   I K   +L EI+  +  ++GVKRVCP+II
Sbjct: 75  LWKVVEELKDLESVKSLWLTTGDHTIMAEIIAKSVQELSEIHQKIAEMEGVKRVCPSIIT 134

Query: 135 DQIK 138
           D +K
Sbjct: 135 DIVK 138

_______________________________________________
Biojava-l mailing list  -  Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l



