Hi Nermin,
To complement Guy's reply: you could also use the EMBLCDS database, which
contains all CDSs in EMBL-Bank (soon to be called ENA = European
Nucleotide Archive). It is available via the EBI's ftp server at
pub/databases/embl/cds. The identifiers in this database correspond to
the protein_id qualifier in the EMBL-Bank Feature Table, which maps each
CDS to its corresponding protein translation. These in turn can be
identified in UniProtKB. Please see the README.txt file at:
ftp.ebi.ac.uk/pub/databases/embl/cds/README.txt
for further details.
Further to the above, and depending on the proteome in question, you
could have a look at the integr8 directory on the ftp server as well:
ftp.ebi.ac.uk/pub/databases/integr8
There you will find the proteomes of more than 1600 organisms, mainly
bacteria and archaea, but also human, rat, mouse, etc.
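If you would rather work directly from your own feature table, the
extraction and translation steps can be sketched in a few lines of Python.
This is only a minimal sketch and not an EMBOSS program: it assumes simple
forward-strand start..end locations as in your example (no complement() or
join()), a file that contains only CDS features, and the standard genetic
code. The names parse_cds, translate and extract_proteins are hypothetical
helpers made up for this illustration.

```python
import re

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Assumption: only plain "start..end" locations are handled; lines with
# complement(...) or join(...) do not match and are silently skipped.
CDS_RE = re.compile(r"^FT\s+CDS\s+(\d+)\.\.(\d+)\s*$")
ID_RE = re.compile(r'^FT\s+/systematic_id="([^"]+)"')


def parse_cds(feature_table):
    """Return a list of dicts with 1-based inclusive start/end and an id."""
    features = []
    for line in feature_table.splitlines():
        cds = CDS_RE.match(line)
        if cds:
            features.append(
                {"start": int(cds.group(1)), "end": int(cds.group(2)), "id": None}
            )
            continue
        qualifier = ID_RE.match(line)
        if qualifier and features:
            # Attach the qualifier to the most recent CDS line.
            features[-1]["id"] = qualifier.group(1)
    return features


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def extract_proteins(genome, feature_table):
    """Return (name, protein) pairs for every CDS in the feature table."""
    proteins = []
    for feature in parse_cds(feature_table):
        # Feature table coordinates are 1-based and inclusive.
        cds_seq = genome[feature["start"] - 1:feature["end"]]
        name = feature["id"] or "{}..{}".format(feature["start"], feature["end"])
        proteins.append((name, translate(cds_seq)))
    return proteins
```

Here genome is the sequence as a plain string and feature_table is the text
of your CDS file. A CDS without a /systematic_id is named after its
coordinates. For bacterial genomes, note that alternative start codons
(e.g. GTG) are translated literally here rather than as Met.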
R:)
Nermin Celik wrote:
Hi,
I have the CDS section of a feature table and a genome of an organism.
Which EMBOSS program will allow me to extract the coding regions defined
in the CDS file from the genome and then translate them to protein
sequences?
Example of CDS file:
FT   CDS             166..231
FT                   /systematic_id="ROD00001"
FT   CDS             313..2775
FT                   /systematic_id="ROD00011"
FT   CDS             2778..3707
Thank you.
Nermin
_______________________________________________
EMBOSS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/emboss