BioJava's BLAST framework parses files and fires events for every piece of information it finds. The SeqSimilarityAdapter class is an example of how to catch these events and construct basic BLAST result objects (SimpleSeqSimilarityHit); however, those objects are not comprehensive and do not record full details of every hit.
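To make the event model concrete, here is a minimal sketch of a SAX handler that only counts the element names it is shown. It uses nothing but the standard org.xml.sax classes, and the class name TagDumpHandler, the helper countTags and the sample XML are made up for illustration; the element names in the sample are placeholders, not the tag names BioJava actually emits. Assuming the BLAST parser behaves like a standard SAX XMLReader, the same handler could be registered on it with setContentHandler in place of the stock JDK parser used here.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records how often each element name is seen.
// DefaultHandler implements ContentHandler with empty methods, so only
// the callbacks of interest need to be overridden.
class TagDumpHandler extends DefaultHandler {
    // LinkedHashMap keeps the tags in the order they were first encountered.
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Parsers that are not namespace-aware may leave localName empty.
        String name = qName.isEmpty() ? localName : qName;
        counts.merge(name, 1, Integer::sum);
    }

    Map<String, Integer> getCounts() {
        return counts;
    }

    // Stand-in driver: feeds an XML string through the JDK's SAX parser.
    // With BioJava, the XMLReader would be the BLAST parser instead (assumption).
    static Map<String, Integer> countTags(String xml) {
        try {
            TagDumpHandler handler = new TagDumpHandler();
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.getCounts();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample input", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder element names, chosen only to exercise the handler.
        String xml = "<Report><Hit><Score>100</Score></Hit>"
                   + "<Hit><Score>74</Score></Hit></Report>";
        countTags(xml).forEach((tag, n) -> System.out.println(tag + " x" + n));
    }
}
```

Running a handler like this once over a real result file is a cheap way to discover which element names and attributes are worth handling before writing any result-building logic.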
If you want the kind of detail you mention below, you will have to write your own content handler for BLAST parsing and pass it to the BLASTLikeSAXParser when parsing a file. This event handler should implement the ContentHandler interface. Look at the source of SeqSimilarityAdapter for guidance. You will then receive events for every part of the file, from which you can construct your own custom BLAST result objects to describe them. If you're not sure what tag names to listen for in your ContentHandler, the easiest thing to do is just run it once and dump them all out to see what you get.

cheers,
Richard

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Y D Sun
Sent: Sun 6/26/2005 5:42 PM
To: biojava-l@biojava.org
Cc:
Subject: [Biojava-l] BLAST Parser for extracting all BLAST data?

Hi,

I want to extract all data from BLASTP results. In the following hit, for example, I need to get the lengths of the query and subject proteins, the identities (all of 54, 124 and 43%), the positives (79, 124 and 63%), and the gaps (3, 124 and 2%). Can the BLASTLikeSAXParser extract all of this information? I can't find methods in the SeqSimilaritySearchHit and SeqSimilaritySearchSubHit APIs to retrieve these data. Does BioJava provide any methods for this purpose?

Thanks,
George

BLASTP 2.2.5 [Nov-16-2002]

Query= Prot0001
         (138 letters)

Database: /work/nys1/fasta/protein/AE000782.pro.fasta
           2407 sequences; 662,866 total letters

Searching.....done

                                                   Score     E
Sequences producing significant alignments:        (bits)  Value

Prot0002                                             100   1e-23
Prot0003                                              74   2e-15
Prot0004                                              43   3e-06

>Prot0002
          Length = 138

 Score = 100 bits (250), Expect = 1e-23
 Identities = 54/124 (43%), Positives = 79/124 (63%), Gaps = 3/124 (2%)

Query: 18  NARTKFTDIAKTLNLTEAAIRKRIKKLEENQIIKRYSIDIDYKKLGYNMAIIGLDIDMDY 77
           NAR   T IAK LN+TEAA+RKRI  LE  + I  Y   I+YKK+G + ++ G+D+D D
Sbjct: 15  NARIPKTRIAKELNVTEAAVRKRIANLERREEILGYKAIINYKKVGLSASLTGVDVDPDK 74

Query: 78  FPKIIKELEKRKEFLHIYSSAGDHDIMVIAIYK---DLEEIYNYLKNLKGVKRVCPAIII 134
             K+++EL+  +    ++ + GDH IM   I K   +L EI+  +  ++GVKRVCP+II
Sbjct: 75  LWKVVEELKDLESVKSLWLTTGDHTIMAEIIAKSVQELSEIHQKIAEMEGVKRVCPSIIT 134

Query: 135 DQIK 138
           D +K
Sbjct: 135 DIVK 138

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l