[Biojava-l] Ensembl gene parsing

Stein Aerts Wed, 29 Jan 2003 01:01:58 -0800

Hi,

I tried to attach an example of an exported sequence of the Igf1 gene, but the message was bounced because of a suspicious header.
When parsing an exported sequence of an Ensembl mouse gene (using the Export Data function at www.ensembl.org) with the current EMBL parser, there appear to be 3 problems:

1. Some of the exon locations start with ".0:".
I think this is a bug in the EMBL formatting at Ensembl?

FT exon .0:44020..44364
FT /exon_id="ENSMUSE00000233709"
FT /start_phase=0
FT /end_phase=0
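
A possible workaround for problem 1, as a minimal sketch only: pre-process the file and strip the prefix before handing the lines to the parser. This assumes the ".0:" prefix carries no information that is needed and that the location is a simple one on a feature-key line; locations nested inside join(...) or complement(...) are not handled. The class and method names are hypothetical and are not part of BioJava.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a BioJava class.
class EmblLocationFix {
    // Matches "FT <key> .<digits>:<rest>", where <key> does not start with '/'
    // (so qualifier lines such as FT /exon_id=... are left alone).
    private static final Pattern PREFIXED =
        Pattern.compile("^(FT\\s+[^/\\s]\\S*\\s+)\\.\\d+:(.*)$");

    // Returns the line without the ".<digits>:" prefix, or the line unchanged.
    static String stripPrefix(String line) {
        Matcher m = PREFIXED.matcher(line);
        return m.matches() ? m.group(1) + m.group(2) : line;
    }

    public static void main(String[] args) {
        System.out.println(stripPrefix("FT exon .0:44020..44364"));
        System.out.println(stripPrefix("FT /exon_id=\"ENSMUSE00000233709\""));
    }
}
```

Lines that do not match the pattern pass through untouched, so the filter can be applied to every line of the exported file.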

2. The first annotation of a CDS feature is written on the next line after CDS, and the EMBL parser does not find it.
I think that this is also a bug at Ensembl?

FT CDS
FT /gene="ENSMUSG00000020053"
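
To see how often problem 2 occurs in an export, something like the following could be used. It is a rough sketch that assumes the CDS line really carries only the feature key and nothing else, as in the excerpt above; it only reports such lines and does not try to repair them. The class and method names are hypothetical and are not part of BioJava.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not a BioJava class.
class BareFeatureKeyCheck {
    // "FT <key>" with nothing after the key; qualifier lines start with '/'
    // and are therefore excluded.
    private static final Pattern BARE_KEY =
        Pattern.compile("^FT\\s+[^/\\s]\\S*\\s*$");

    // Returns the 1-based numbers of lines that hold a feature key only.
    static List<Integer> findBareKeys(List<String> lines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (BARE_KEY.matcher(lines.get(i)).matches()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "FT CDS",
            "FT /gene=\"ENSMUSG00000020053\"",
            "FT exon 2001..2159");
        System.out.println(findBareKeys(lines));
    }
}
```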

3. Some of the lines cannot be parsed. For example, the parser writes to System.out: "This line could not be parsed: exon 2001..2159"
I don't understand this one; I cannot see a problem with these features.

FT exon 2001..2159
FT /exon_id="ENSMUSE00000248454"
FT /start_phase=0
FT /end_phase=0

Thank you in advance!

Stein.

--
Stein Aerts BioI@SISTA
K.U.Leuven ESAT-SCD Belgium
http://www.esat.kuleuven.ac.be/~dna/BioI

_______________________________________________
Biojava-l mailing list - [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l
