On Monday, 26 March 2012, Francois Berenger wrote:
> Dear list,
> 
> If I take all the fasta files for proteins in the PDB,
> are the sequences complete?
> 
> I mean, do they have holes sometimes (missing amino acids)?

In theory the SEQRES records describe the sequence of the
entity that was crystallized, whether or not it is all visible
in the electron density or present in the deposited model.
So normally there should not be any "missing" internal
residues.  But if the expression construct was not the full
gene sequence, e.g. an N-terminal truncation, then the omitted
N- or C-terminal residues (or whole domains) will not be
listed.
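
To check a particular entry, one can compare the SEQRES sequence with
the residues that actually have coordinates.  The following is only a
minimal sketch in Python, assuming the legacy fixed-column PDB format;
the function names and the reduced residue table are made up for
illustration, and non-standard residues are simply reported as "X".

```python
# Standard amino acids only; anything else is mapped to "X" (an assumption).
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_sequences(pdb_text):
    """Return {chain_id: one-letter sequence} built from SEQRES records."""
    chains = {}
    for line in pdb_text.splitlines():
        if line.startswith("SEQRES"):
            chain_id = line[11]                 # column 12
            names = line[19:70].split()         # residue names, columns 20-70
            codes = [THREE_TO_ONE.get(n, "X") for n in names]
            chains.setdefault(chain_id, []).extend(codes)
    return {c: "".join(codes) for c, codes in chains.items()}


def modeled_residue_counts(pdb_text):
    """Return {chain_id: number of distinct residues with ATOM records}."""
    seen = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            chain_id = line[21]                 # column 22
            residue_key = (line[22:26], line[26:27])  # resSeq + insertion code
            seen.setdefault(chain_id, set()).add(residue_key)
    return {c: len(keys) for c, keys in seen.items()}
```

A chain whose SEQRES length exceeds its modeled residue count has
residues that were in the crystallized entity but not in the model.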

So goes the theory. There are always corner cases.
I remember having a dispute with the PDB long ago about
whether a peptide chain that was known to have undergone
loop cleavage was properly described with a single
chain identifier or with two chain identifiers.  And if the
cleavage involved excision of one or more residues, would
they appear in the SEQRES records anyhow?


> Sorry for the maybe stupid question but I know that sometimes
> the PDB files have missing residues, I am hoping that
> it is not the case with the FASTA files.

I was assuming that the FASTA files you refer to are just
conversions of the SEQRES records.  If not, then all bets are
off.  If the FASTA files are retrieved by gene ID from UniProt
or some other sequence database, then they will be complete in
one sense but may not perfectly match what was in the deposited
crystal structure due to cloning artifacts, strain variation,
allelic non-uniformity, etc.
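
One way to see such mismatches is to diff the SEQRES-derived sequence
against the database sequence.  This is a rough sketch using Python's
standard difflib rather than a proper sequence alignment; the function
name is hypothetical and both sequences are assumed to be plain
one-letter strings.

```python
from difflib import SequenceMatcher


def sequence_differences(construct_seq, reference_seq):
    """List (tag, construct_position, construct_part, reference_part)
    for every stretch where the two sequences differ."""
    # autojunk=False keeps difflib from discarding frequent residues
    # in sequences of 200 or more characters.
    matcher = SequenceMatcher(None, construct_seq, reference_seq, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            differences.append((tag, i1, construct_seq[i1:i2], reference_seq[j1:j2]))
    return differences
```

An "insert" at position 0 means the reference has leading residues that
the construct lacks, e.g. an N-terminal truncation; a "replace" marks a
point difference such as a strain variant.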

        Ethan

> Regards,
> Francois.
> 
