Interesting read. There are two sections worthy of comment:

> NCBI is not proposing a new data model, but is simply transliterating
> the data model we have used for the last decade into a different language for the
> convenience of our users. ASN.1 has a number of specific data types such as INTEGER
> or REAL numbers while XML has only strings, so our DTD automatically adds some
> ENTITY definitions at the top which maps these numbers to strings. This mapping only
> allows humans that read the DTD to see where numbers are expected; an XML validator
> will not care what is there.
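To illustrate the point about validators: a DTD mapping INTEGER to character data documents intent only, while an XML Schema type such as xs:integer is actually checked. The Java sketch below uses the standard javax.xml.validation API with a tiny hypothetical schema. The element name Hit_num and the one-line schema are for illustration only and are not NCBI's actual BLAST definitions.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SchemaTypeCheck {
    // Hypothetical one-element schema: the content of Hit_num must be an xs:integer.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='Hit_num' type='xs:integer'/>"
      + "</xs:schema>";

    // Returns true if the XML document validates against XSD, false on a validation error.
    static boolean isValid(String xml) throws Exception {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, validation errors are thrown as SAXExceptions.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<Hit_num>42</Hit_num>"));   // true
        System.out.println(isValid("<Hit_num>abc</Hit_num>")); // false
    }
}
```

A DTD declaration such as `<!ELEMENT Hit_num (#PCDATA)>` would accept both documents, which is the behavior the NCBI text describes. The schema rejects the second one because "abc" is not a lexically valid xs:integer.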
Using an XML Schema instead would allow a validator to enforce these data types.

> Summary:
> While the effect of Roles, Scope, and Alternate Forms results in extensive
> tags in the XML, it does accurately reflect the structure and use of the data. It allows
> XML programs to capture as little or as much of the full data structure as they wish.

I guess I fail to see the point of all this. How would a structure resulting from the suggestions I propose be "lossy" in any way?

Stephen Bobick

-----Original Message-----
From: Michael E. Smoot [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 02, 2003 2:37 PM
To: Bobick, Stephen
Cc: [EMAIL PROTECTED]
Subject: Re: BLAST DTD (was RE: [Biojava-l] SeqSimilaritySearchSubHit - Strand information)

This page explains how the DTDs were created:

http://www.ncbi.nlm.nih.gov/IEB/ToolBox/XML/ncbixml.txt

The short version is that the DTDs are transliterations of their ASN.1 data models.

Mike

_______________________________________________
Biojava-l mailing list - [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l
