Although we have had similar problems, I'd like to know what information you need that is 
lost when exporting to the mentioned file formats. For the most part you can recover what 
you need, especially if you REALLY mean you don't want to be held to Biojava/Java on either 
end of the process. I'd just hate to create YAF (Yet Another Format) instead of 
modifying/using one that already exists, and to create extra work trying to make it 
"not Biojava bound" yet "containing Biojava info".
-Robin

        -----Original Message----- 
        From: Schreiber, Mark [mailto:[EMAIL PROTECTED]] 
        Sent: Sun 5/5/2002 9:05 PM 
        To: [EMAIL PROTECTED] 
        Cc: 
        Subject: [Biojava-l] Biojava XML Binding (BJXB)
        
        

        Hi -
        
        I would like to propose/formalise a schema for binding Biojava objects,
        especially sequence objects, to XML. The current binding of Biojava objects
        to other formats such as GFF, GenBank, EMBL, Game and Agave is inadequate, as
        details are lost in the reading and writing of these objects. While it
        is useful for Biojava to read and write these formats, the only way to
        currently capture everything about a Biojava object is to serialize it as a
        binary stream. The advantage of serializing to an XML document is that
        the XML can be constructed and edited using a text editor or programmatic
        processes on a machine (possibly a legacy system) with no Biojava
        installation and no requirement for a JVM. Also, the XML can be transported
        via HTTP/SOAP. The DTD could also be used as a base for anyone who
        needs a richer schema that maps well to Biojava.
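
To illustrate the "no JVM" point, here is a minimal sketch in Python (standard
library only) that builds a sequence element shaped like the demo document
included in this message. The function name build_sequence is hypothetical; the
element and attribute names are taken from the demo, and nothing here is an
existing Biojava API.

```python
import xml.etree.ElementTree as ET


def build_sequence(name, symbols, alphabet, annotations):
    """Build a <sequence> element shaped like the proposed BJXB demo.

    Hypothetical helper: the class attribute values are copied from the
    demo document and are not checked against any Biojava installation.
    """
    seq = ET.Element("sequence", {"class": "org.biojava.bio.seq.impl.SimpleSequence"})
    ET.SubElement(seq, "id", {"name": name})
    sym = ET.SubElement(
        seq,
        "symbol_list",
        {"class": "org.biojava.bio.seq.SimpleSymbolList", "alphabet": alphabet},
    )
    sym.text = symbols
    ann = ET.SubElement(seq, "annotation", {"class": "org.biojava.bio.SimpleAnnotation"})
    for key, value in annotations.items():
        ET.SubElement(ann, "entry", {"key": key, "value": value})
    return seq


seq = build_sequence("fooase_est", "accggtatgacc", "DNA", {"seq_type": "EST"})
xml_text = ET.tostring(seq, encoding="unicode")
print(xml_text)
```

The same document could equally be typed by hand in a text editor; the sketch
only shows that producing it needs nothing beyond a generic XML library.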
        
        Why not use JAXB? Two reasons. First, JAXB requires Java at both ends of
        the serialization/deserialization procedure. Second, JAXB doesn't play well
        with many Biojava objects due to their use of factory methods, private and
        protected constructors and singleton Alphabets. Actually, this was all
        inspired by my inability to get JAXB to work with Biojava.
        
        I have included a demo XML file and a simple DTD. Obviously there is a
        lot of room for expansion of the DTD to include more Biojava concepts;
        however, I thought I would start with a typical use with a rather nasty
        feature structure. Currently there is no read or write ability, but StAX
        looks like an obvious choice; I suspect there might be a need for a lot
        of reflection code in the handlers! I am no StAX expert, so if someone
        feels particularly inspired in the next 24 hours to knock out a quick
        handler, that would be cool.
        
        Comments and Flames welcome.
        
        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE seq_db SYSTEM "bjxb.dtd">
        
        <seq_db class="org.biojava.bio.seq.db.HashSequenceDB">
          <sequence class="org.biojava.bio.seq.impl.SimpleSequence">
            <id name="fooase_est" urn="embl:UA000933"/>
            <symbol_list class="org.biojava.bio.seq.SimpleSymbolList"
                         alphabet="DNA">
              accggtatgaccagaggacccatatagggacaaaccaaaaaaaaagcccacagcgcgttgagacagg
              gggacacacccatatttaagaggacaccaaccccccccaaagagagagatnaaaaanaaana
            </symbol_list>
            <annotation class="org.biojava.bio.SimpleAnnotation">
              <entry key="organism" value="Homo Sapiens"/>
              <entry key="seq_type" value="EST"/>
              <entry key="date" value="19/11/2001"/>
            </annotation>
            <feature_holder>
              <feature class="org.biojava.bio.seq.genomic.TranslatedRegion"
                       source="auto translation"
                       type="predicted peptide">
                <annotation class="org.biojava.bio.Annotation.EmptyAnnotation"/>
                <location value="[7..28]"/>
                <sequence class="org.biojava.bio.seq.impl.SimpleSequence">
                  <id name="fooase"/>
                  <symbol_list class="org.biojava.bio.seq.SimpleSymbolList"
                               alphabet="PROTEIN">
                    MTRGPI*
                  </symbol_list>
                  <annotation class="org.biojava.bio.Annotation.EmptyAnnotation"/>
                </sequence>
                <feature class="org.biojava.bio.seq.impl.SimpleFeature"
                         source="experimental evidence"
                         type="SNP">
                  <annotation class="org.biojava.bio.SimpleAnnotation">
                    <entry key="SNP_type" value="g:c"/>
                  </annotation>
                  <location value="14"/>
                </feature>
              </feature>
              <feature class="org.biojava.bio.seq.impl.SimpleFeature"
                       source="experimental"
                       type="PolyA tail">
                <annotation class="org.biojava.bio.Annotation.EmptyAnnotation"/>
                <location value="[119..131]"/>
              </feature>
            </feature_holder>
          </sequence>
        </seq_db>
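
One way a reader for the demo above might look, sketched in Python with the
standard xml.sax module rather than Biojava's StAX handlers. The class name
BjxbHandler and the trimmed DEMO document are assumptions made for
illustration. The sketch only collects symbol lists and the nested feature
structure; it does not build Biojava objects or use reflection.

```python
import xml.sax

# Trimmed copy of the demo document (assumption: shortened for the example).
DEMO = """<seq_db class="org.biojava.bio.seq.db.HashSequenceDB">
  <sequence class="org.biojava.bio.seq.impl.SimpleSequence">
    <id name="fooase_est" urn="embl:UA000933"/>
    <symbol_list class="org.biojava.bio.seq.SimpleSymbolList" alphabet="DNA">
      accggtatgacc
      agaggaccc
    </symbol_list>
    <annotation class="org.biojava.bio.SimpleAnnotation"/>
    <feature_holder>
      <feature class="org.biojava.bio.seq.genomic.TranslatedRegion"
               source="auto translation" type="predicted peptide">
        <annotation class="org.biojava.bio.Annotation.EmptyAnnotation"/>
        <location value="[7..28]"/>
        <feature class="org.biojava.bio.seq.impl.SimpleFeature"
                 source="experimental evidence" type="SNP">
          <annotation class="org.biojava.bio.SimpleAnnotation"/>
          <location value="14"/>
        </feature>
      </feature>
    </feature_holder>
  </sequence>
</seq_db>"""


class BjxbHandler(xml.sax.ContentHandler):
    """Event-driven reader that records symbol lists and nested features."""

    def __init__(self):
        super().__init__()
        self.features = []   # one dict per feature, in document order
        self.symbols = []    # one whitespace-free string per symbol_list
        self._open = []      # stack of features currently being read
        self._chunks = None  # text pieces of the symbol_list being read

    def startElement(self, name, attrs):
        if name == "feature":
            record = {"depth": len(self._open), "type": attrs["type"], "location": None}
            self._open.append(record)
            self.features.append(record)
        elif name == "location" and self._open:
            # A location belongs to the innermost open feature.
            self._open[-1]["location"] = attrs["value"]
        elif name == "symbol_list":
            self._chunks = []

    def characters(self, content):
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "feature":
            self._open.pop()
        elif name == "symbol_list":
            # Drop the line breaks and indentation around the symbols.
            self.symbols.append("".join("".join(self._chunks).split()))
            self._chunks = None


handler = BjxbHandler()
xml.sax.parseString(DEMO.encode("utf-8"), handler)
for feature in handler.features:
    print("  " * feature["depth"] + feature["type"], feature["location"])
```

Keeping a stack of open features is what lets the SNP be recorded as a child of
the translated region without any lookahead.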
        
        <?xml version="1.0" encoding="UTF-8" ?>
        <!ELEMENT id EMPTY >
        <!ATTLIST id urn NMTOKEN #IMPLIED >
        <!ATTLIST id name NMTOKEN #REQUIRED >
        
        <!ELEMENT feature_holder ( feature* ) >
        
        <!ELEMENT annotation ( entry* ) >
        <!ATTLIST annotation class NMTOKEN #REQUIRED >
        
        <!ELEMENT sequence ( id, symbol_list, annotation, feature_holder? ) >
        <!ATTLIST sequence class NMTOKEN #REQUIRED >
        
        <!ELEMENT seq_db ( sequence+ ) >
        <!ATTLIST seq_db class NMTOKEN #REQUIRED >
        
        <!ELEMENT symbol_list ( #PCDATA ) >
        <!ATTLIST symbol_list class NMTOKEN #REQUIRED >
        <!ATTLIST symbol_list alphabet NMTOKEN #REQUIRED >
        
        <!ELEMENT location EMPTY >
        <!ATTLIST location value CDATA #REQUIRED >
        
        <!ELEMENT entry EMPTY >
        <!ATTLIST entry key NMTOKEN #REQUIRED >
        <!ATTLIST entry value CDATA #REQUIRED >
        
        <!ELEMENT feature ( annotation, location, sequence?, feature? ) >
        <!ATTLIST feature type CDATA #REQUIRED >
        <!ATTLIST feature source CDATA #REQUIRED >
        <!ATTLIST feature class NMTOKEN #REQUIRED >
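
The content models in the DTD above can be spot-checked without a validating
parser. The sketch below rewrites each model by hand as a regular expression
over the sequence of child tag names. The names CONTENT_MODELS and
check_children are hypothetical, and this is not a DTD validator: attributes,
attribute types and text content are ignored.

```python
import re
import xml.etree.ElementTree as ET

# Content models copied from the DTD, rewritten by hand as regexes over a
# comma-terminated string of child tag names (e.g. "annotation,location,").
CONTENT_MODELS = {
    "seq_db": r"(sequence,)+",
    "sequence": r"id,symbol_list,annotation,(feature_holder,)?",
    "feature_holder": r"(feature,)*",
    "feature": r"annotation,location,(sequence,)?(feature,)?",
    "annotation": r"(entry,)*",
    "symbol_list": r"",  # #PCDATA only, so no child elements
    "id": r"",           # EMPTY
    "location": r"",     # EMPTY
    "entry": r"",        # EMPTY
}


def check_children(element, problems=None):
    """Recursively collect elements whose child order breaks the model."""
    problems = [] if problems is None else problems
    model = CONTENT_MODELS.get(element.tag)
    children = "".join(child.tag + "," for child in element)
    if model is not None and re.fullmatch(model, children) is None:
        problems.append(f"<{element.tag}>: unexpected children [{children.rstrip(',')}]")
    for child in element:
        check_children(child, problems)
    return problems


GOOD = ('<feature class="c" source="s" type="t">'
        '<annotation class="a"/><location value="14"/></feature>')
BAD = ('<feature class="c" source="s" type="t">'
       '<location value="14"/><annotation class="a"/></feature>')

print(check_children(ET.fromstring(GOOD)))
print(check_children(ET.fromstring(BAD)))
```

Note that, as the DTD is written, a feature may hold at most one child feature
(feature?), so a feature with two sibling sub-features would be reported here.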
        
        
        Mark Schreiber
        Bioinformatics
        AgResearch Invermay
        PO Box 50034
        Mosgiel
        New Zealand
        
        PH:   +64 3 489 9175
        FAX:  +64 3 489 3739
        
        
        =======================================================================
        Attention: The information contained in this message and/or attachments
        from AgResearch Limited is intended only for the persons or entities
        to which it is addressed and may contain confidential and/or privileged
        material. Any review, retransmission, dissemination or other use of, or
        taking of any action in reliance upon, this information by persons or
        entities other than the intended recipients is prohibited by AgResearch
        Limited. If you have received this message in error, please notify the
        sender immediately.
        =======================================================================
        _______________________________________________
        Biojava-l mailing list  -  [EMAIL PROTECTED]
        http://biojava.org/mailman/listinfo/biojava-l
        
