Greetings,

I'm afraid I will not be answering the poster here, but the message caught my curiosity and prompted me to take a peek at the BLAST DTD and, subsequently, to post this commentary. My question is: how was the BLAST DTD designed, and under what standards? I find the choice of element names unfortunate. Compared with standard XML naming and DTD design, I would expect something like:

    <hsp_query from="576" to="229" frame="1"/>

rather than the following:

    <Hsp_query-from>576</Hsp_query-from>
    <Hsp_query-to>229</Hsp_query-to>
    <Hsp_query-frame>1</Hsp_query-frame>

The two primary differences are in capitalization and in the choice of attributes rather than separate elements for each datum in this excerpt. As a consequence, the "expected" form is more succinct. From the DTD I see that the latter naming and element/attribute choice is repeated many times.

I will admit that I have not worked with BLAST results in several years, as my focus has been on data management software (LIMS) and, more recently, analysis software. Still, as a professional in the greater bioinformatics community who works daily with XML, I do like to see good practices from the "pure" software development community incorporated.

Comments?

Stephen Bobick

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Jan Würthner
Sent: Tuesday, December 02, 2003 12:52 AM
To: [EMAIL PROTECTED]
Subject: [Biojava-l] SeqSimilaritySearchSubHit - Strand information

Hi folks,

I'm constructing SeqSimilaritySearchSubHit instances from XML-formatted NCBI BLAST results, and I'm getting increasingly confused by the query's and subject's from and to information on the one hand and the query's and subject's strand on the other. The NCBI returns, for example:

    <Hsp_query-from>576</Hsp_query-from>
    <Hsp_query-to>229</Hsp_query-to>
    <Hsp_query-frame>1</Hsp_query-frame>
    <Hsp_hit-from>12374053</Hsp_hit-from>
    <Hsp_hit-to>12374401</Hsp_hit-to>
    <Hsp_hit-frame> -1</Hsp_hit-frame>

I would have thought that the order of the from and to values (descending, in this query) already encodes the direction (POSITIVE/NEGATIVE). Why is there an additional "frame" value, and why is the query's frame value set to +1 and the subject's (= hit's) value set to -1? I would have expected the opposite assignment.
My question is: how should I set the SeqSimilaritySearchSubHit instance's query/subject values from these data? Having this answered would be a great help!

Thank you
Jan

--
Jan Würthner
Institute for Medical Microbiology
Building 22.21
Heinrich-Heine-University
Universitätsstraße 1
40225 Duesseldorf
Tel. +49 (0) 211 81 12461
URL: www.medmikro.uni-duesseldorf.de

_______________________________________________
Biojava-l mailing list - [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l
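
For anyone who wants to poke at the values discussed in this thread, here is a minimal sketch in Python (chosen only for brevity; it is not BioJava code). It reads the from/to/frame triple from both encodings, the element form NCBI emits and the attribute form suggested above, and then reports the coordinate order and the frame sign side by side. The wrapping <Hsp> element, the function names, and the dictionary keys are my own assumptions for illustration. The sketch only makes the apparent mismatch visible; it does not decide which value SeqSimilaritySearchSubHit should treat as the strand.

```python
import xml.etree.ElementTree as ET

# Element form, copied from the message above. The enclosing <Hsp> element is
# added here only so that the fragment parses as a single document.
NCBI_FORM = """
<Hsp>
  <Hsp_query-from>576</Hsp_query-from>
  <Hsp_query-to>229</Hsp_query-to>
  <Hsp_query-frame>1</Hsp_query-frame>
  <Hsp_hit-from>12374053</Hsp_hit-from>
  <Hsp_hit-to>12374401</Hsp_hit-to>
  <Hsp_hit-frame> -1</Hsp_hit-frame>
</Hsp>
"""

# Attribute form, as proposed in the reply (hypothetical, not part of the DTD).
ATTRIBUTE_FORM = '<hsp_query from="576" to="229" frame="1"/>'

FIELDS = ("from", "to", "frame")


def read_element_form(hsp, side):
    """Return (from, to, frame) for side 'query' or 'hit' from child elements."""
    # int() tolerates the leading blank in " -1".
    return tuple(int(hsp.findtext(f"Hsp_{side}-{field}")) for field in FIELDS)


def read_attribute_form(element):
    """Return (from, to, frame) from attributes of a single element."""
    return tuple(int(element.get(name)) for name in FIELDS)


def describe(start, end, frame):
    """Report coordinate order and frame sign separately, without merging them."""
    return {
        "low": min(start, end),
        "high": max(start, end),
        "coordinate_order": "ascending" if start <= end else "descending",
        "frame_sign": "+" if frame > 0 else "-" if frame < 0 else "0",
    }


hsp = ET.fromstring(NCBI_FORM)
query = read_element_form(hsp, "query")
hit = read_element_form(hsp, "hit")

for label, values in (("query", query), ("hit", hit)):
    print(label, describe(*values))
```

Both readers return the same triple for the query, which is the point about succinctness: the attribute form carries identical data in one element. Keeping the coordinate order and the frame sign as separate fields makes it easy to see where the two disagree before choosing a convention.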
