Re: [EMBOSS] How to find protein sequences in a given genome using CDS information

Rodrigo Lopez Wed, 04 Feb 2009 01:33:40 -0800

Hi Nermin,

To complement Guy's reply: You could also use the EMBLCDS database. This one contains all CDSs in EMBL-Bank (soon to be called ENA = European Nucleotide Archive). This one is available via the EBI's ftp server at pub/databases/embl/cds. The identifiers in this database correspond to the protein_id feature in the EMBL-Bank Feature Table, which maps each CDS to the corresponding protein translation. These in turn can be identified in UniProtKB. Please see the README.txt file at:


ftp.ebi.ac.uk/pub/databases/embl/cds/README.txt

for further details.

Further to the above, and depending on the proteome in question, you could have a look at the integr8 directory on the ftp server as well:


ftp.ebi.ac.uk/pub/databases/integr8

In here you will find the proteomes of more than 1600 organisms, mainly bacteria and archaea, but also human, rat, mouse, etc.


R:)


Nermin Celik wrote:

Hi,

I have the CDS section of a feature table and a genome of an organism.
Which EMBOSS program will allow me to extract the coding regions defined
in the CDS file from the genome and then translate them to protein
sequences?

Example of CDS file:
FT   CDS             166..231
FT                   /systematic_id="ROD00001"
FT   CDS             313..2775
FT                   /systematic_id="ROD00011"
FT   CDS             2778..3707

Thank you.
Nermin
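
For illustration only, the extraction and translation asked about above can be sketched in plain Python. This is not an EMBOSS program and not the method suggested in the replies. It assumes that every CDS location is a simple forward-strand range of the form start..end (no complement() or join()), that coordinates are 1-based and inclusive as in the feature table, and that the standard genetic code applies. The function names are made up for this sketch.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Matches only simple locations such as "FT   CDS             166..231".
# Assumption: complement(...) and join(...) locations are skipped.
CDS_LINE = re.compile(r"^FT\s+CDS\s+(\d+)\.\.(\d+)\s*$")


def parse_cds_ranges(feature_text):
    """Return a list of (start, end) pairs, 1-based and inclusive."""
    ranges = []
    for line in feature_text.splitlines():
        match = CDS_LINE.match(line)
        if match:
            ranges.append((int(match.group(1)), int(match.group(2))))
    return ranges


def translate(dna):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def extract_proteins(genome, feature_text):
    """Cut each CDS range out of the genome and translate it."""
    return [
        translate(genome[start - 1:end])
        for start, end in parse_cds_ranges(feature_text)
    ]
```

Calling extract_proteins(genome, feature_text) with the genome as one string and the FT lines as another returns one protein string per simple CDS range, with stop codons shown as "*". Reverse-strand or spliced features would need extra handling that is not covered here.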

_______________________________________________
EMBOSS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/emboss

