Hi,
When I parse an exported sequence of an Ensembl mouse gene with the EMBL
parser (exported using the Export Data function at www.ensembl.org), there
appear to be 3 problems, listed below.
I tried to attach an example of an exported sequence of the Igf1 gene,
but the message was bounced because of a suspicious header...
1. Some of the exon locations start with .0:
I think this is a bug in the EMBL formatting at Ensembl?
FT exon .0:44020..44364
FT /exon_id="ENSMUSE00000233709"
FT /start_phase=0
FT /end_phase=0
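A possible workaround for problem 1, as a minimal sketch only: strip the ".0:" prefix from feature key lines before handing the text to the parser. This assumes the prefix is a spurious empty sequence reference and that the coordinates are local to the exported record, which is a guess and not something Ensembl has confirmed. The class and method names are hypothetical and not part of BioJava, and locations nested inside join(...) or complement(...) are not handled.

```java
import java.util.regex.Pattern;

// Hypothetical pre-processing helper, not part of BioJava.
final class EmblLocationFix {
    // An FT line with a feature key followed by a location that starts with ".0:".
    private static final Pattern EMPTY_PREFIX =
        Pattern.compile("^(FT\\s+\\S+\\s+)\\.0:");

    // Returns the line with the ".0:" prefix removed; other lines are returned unchanged.
    static String stripEmptyPrefix(String line) {
        return EMPTY_PREFIX.matcher(line).replaceFirst("$1");
    }
}
```

Qualifier lines such as /exon_id are left untouched because they contain no whitespace-separated ".0:" token after the first field.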
2. The first annotation of a CDS feature is written on the line
after CDS. The EMBL parser does not find it.
I think that this is also a bug at Ensembl?
FT CDS
FT /gene="ENSMUSG00000020053"
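To see how many features are affected by problem 2, a small check along these lines could flag feature key lines that carry no location. It is a sketch under the assumption that the export uses the standard EMBL feature table layout, where the feature key starts in column 6 and qualifier or continuation lines have blanks there. The class name is hypothetical and the check only reports the lines, it does not repair them.

```java
// Hypothetical diagnostic helper, not part of BioJava.
final class EmblKeyLineCheck {
    // True if the line is a feature key line (key in column 6) with nothing after the key.
    static boolean isKeyLineWithoutLocation(String line) {
        if (!line.startsWith("FT   ") || line.length() < 6 || line.charAt(5) == ' ') {
            return false; // not an FT line, or a qualifier/continuation line
        }
        String rest = line.substring(5).trim();
        // A complete key line has at least two fields: the key and the location.
        return !rest.isEmpty() && rest.split("\\s+").length == 1;
    }
}
```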
3. Some of the lines cannot be parsed; for example, the parser writes to
System.out: "This line could not be parsed: exon 2001..2159"
I don't understand this one: I cannot see a problem with these features.
FT exon 2001..2159
FT /exon_id="ENSMUSE00000248454"
FT /start_phase=0
FT /end_phase=0
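Since nothing looks wrong with these lines by eye, one thing that could be checked is whether they contain characters that are invisible in an editor, such as tabs, carriage returns or non-ASCII spaces. The sketch below makes such characters visible. Whether this is the actual cause is an assumption, and the class name is hypothetical.

```java
// Hypothetical diagnostic helper: prints invisible characters as escapes.
final class VisibleWhitespace {
    static String reveal(String line) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\t') {
                out.append("\\t");
            } else if (c == '\r') {
                out.append("\\r");
            } else if (c < 0x20 || c > 0x7e) {
                // Any other control or non-ASCII character is shown as a \\uXXXX escape.
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c); // printable ASCII, including ordinary spaces, is kept as-is
            }
        }
        return out.toString();
    }
}
```

Running reveal on each line that the parser rejects and comparing the result with a line that parses correctly would show whether the two differ in anything other than the coordinates.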
Thank you in advance!
Stein.
--
Stein Aerts BioI@SISTA
K.U.Leuven ESAT-SCD Belgium
http://www.esat.kuleuven.ac.be/~dna/BioI