Hi Ewan,

I know of Mart (and I like it), but it is not suited to automated sequence retrieval by gene_stable_id (a SOAP web service for the Export Data function would be nice). Anyway, I guess the Mart output would currently have the same faults.

Do you reckon the Ensembl bugs can be fixed in the short term?

Any ideas on the cause of the 3rd problem? I would probably have to change the parser source so that it prints the stack trace instead of the message "could not be parsed" when a parsing error occurs.
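As a rough illustration of the three cases quoted below, and of reporting the cause rather than a bare "could not be parsed", here is a minimal Python sketch. It is not BioJava's EMBL parser. The names parse_features, parse_location and FeatureParseError are made up for illustration, and treating the ".0:" prefix as ignorable is only an assumption about the export bug. It handles simple start..end ranges and single-line qualifiers, nothing more.

```python
import re

# A simple "start..end" range, optionally preceded by a "reference:" prefix.
_RANGE = re.compile(r"^(?:(?P<ref>[^:]*):)?(?P<start>\d+)\.\.(?P<end>\d+)$")


class FeatureParseError(ValueError):
    """Carries the line number and the reason for the failure."""


def parse_location(text, lineno):
    """Parse 'start..end', tolerating a leading reference such as '.0:'."""
    match = _RANGE.match(text)
    if match is None:
        raise FeatureParseError(f"line {lineno}: unsupported location {text!r}")
    ref = match.group("ref")
    # Assumption: a reference with an empty accession (".0") is the export
    # bug, so it is dropped and the range is treated as local.
    if ref is not None and ref.startswith("."):
        ref = None
    return {
        "ref": ref,
        "start": int(match.group("start")),
        "end": int(match.group("end")),
    }


def parse_features(lines):
    """Group FT lines into features with a key, a location and qualifiers."""
    features = []
    for lineno, raw in enumerate(lines, start=1):
        if not raw.startswith("FT"):
            continue
        body = raw[2:].strip()
        if not body:
            continue
        if body.startswith("/"):
            # Qualifier line: attach it to the most recent feature.
            if not features:
                raise FeatureParseError(
                    f"line {lineno}: qualifier before any feature key"
                )
            name, _, value = body[1:].partition("=")
            features[-1]["qualifiers"][name] = value.strip('"')
        else:
            fields = body.split(None, 1)
            # A key with nothing after it (the CDS case) gets no location.
            location = (
                parse_location(fields[1].strip(), lineno)
                if len(fields) > 1
                else None
            )
            features.append(
                {"key": fields[0], "location": location, "qualifiers": {}}
            )
    return features
```

Calling parse_features on the feature-table lines of an exported entry returns one dictionary per feature. A location the sketch does not understand raises FeatureParseError with the line number and the offending text, which is the kind of detail the generic message hides.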
Thx,
Stein.

PS: It is very annoying that my mails are always bounced because of a 'suspicious header'; am I doing something wrong?

Ewan Birney wrote:

> On Wed, 29 Jan 2003, Stein Aerts wrote:
>
>> Hi,
>>
>> When currently parsing an exported sequence of an Ensembl mouse gene
>> (using the Export Data function at www.ensembl.org) there appear to be
>> 3 problems:
>>
>> I tried to attach an example of an exported sequence of the Igf1 gene
>> but then the message was bounced because of a suspicious header...
>>
>> 1. Some of the exon locations start with .0:
>> I think this is a bug of the EMBL formatting at Ensembl?
>
> Yes, this is pretty certainly a fault our end, and I think I know
> where this is.
>
>> FT   exon            .0:44020..44364
>> FT                   /exon_id="ENSMUSE00000233709"
>> FT                   /start_phase=0
>> FT                   /end_phase=0
>>
>> 2. The first annotation of a CDS feature is written on the next line
>> after CDS. This is not found by the EMBL parser. I think that this is
>> also a bug at Ensembl?
>
> This is probably a line-length issue. I wonder what the right thing to
> do here is... Hmmm
>
>> FT   CDS
>> FT                   /gene="ENSMUSG00000020053"
>>
>> 3. Some of the lines cannot be parsed, for example the parser writes
>> to System.out:
>> "This line could not be parsed: exon 2001..2159"
>> This one I don't understand, I cannot see a problem for these
>> features?
>>
>> FT   exon            2001..2159
>> FT                   /exon_id="ENSMUSE00000248454"
>> FT                   /start_phase=0
>> FT                   /end_phase=0
>>
>> Thank you in advance!
>
> Stein - have you tried Mart inside Ensembl? For most people, this is a
> far easier way to get bulk downloads of stuff in a
> very-easy-to-parse format.
>
> http://www.ensembl.org/Homo_sapiens/martview
>
> Choose feature list and/or gene structure when you get to output.
>
> The Ensembl bugs should be fixed of course... ;)
>
>> Stein.
>>
>> --
>> Stein Aerts
>> BioI@SISTA
>> K.U.Leuven ESAT-SCD
>> Belgium
>> http://www.esat.kuleuven.ac.be/~dna/BioI
>>
>> _______________________________________________
>> Biojava-l mailing list - [EMAIL PROTECTED]
>> http://biojava.org/mailman/listinfo/biojava-l
>
> -----------------------------------------------------------------
> Ewan Birney.
> Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> <[EMAIL PROTECTED]>.
> -----------------------------------------------------------------