Unfortunately, nothing is ever that easy. If we were to follow Mark's rules, glycoproteins would end up losing their asparagines (and other residues, depending on the type of glycosylation) to numerous underdefined new residue types. And consider the confusion caused by the different versions of N-linked glycosylation that dominate in yeast vs. insects vs. mammals, which would lead to different residue names. It does not even end there: because glycosylation is heterogeneous, you would end up with different "modified residue names" for, say, 6' fucosylated vs. 3' and 6' fucosylated, arthropod-type N-linked glycosylated asparagine. All of this when we can barely model the basic sugars half-right, let alone identify them.

For covalent inhibitors, what Herman mentions (regular amino acid names with links) would also clearly be the more practical solution.

My gripe is, as Peter mentioned, having X's in protein sequences. Why can't we have MSE as M? Isn't this obvious? But as always, it does not end there. For example, pyroglutamic acid, a naturally occurring form of *both* Glu (E) and Gln (Q), ends up being labeled in the sequence as E (or it used to be), even when every sequence database has the residue as a Q. A sensible solution here would be to follow the depositor-submitted sequence, or the obvious encoded amino acid.
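
To make that concrete, here is a minimal sketch in Python of the lookup order I have in mind: a standard residue keeps its own letter, a modified residue takes the depositor's letter when one was submitted, then falls back to its encoded parent, and only then to X. The names `one_letter`, `sequence` and the one-entry `PARENT` table are made up for illustration; this is not how the PDB or any existing tool does it.

```python
# Standard three-letter to one-letter amino acid codes.
STANDARD = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

# Hypothetical, deliberately tiny table: modified residue -> encoded parent.
# Only the uncontroversial case from this thread is listed.
PARENT = {"MSE": "M"}  # selenomethionine is encoded as Met


def one_letter(resname, deposited=None):
    """Pick a sequence letter for one residue name.

    `deposited` is the letter from the depositor-submitted sequence at
    this position, if one is available (an assumption of this sketch).
    """
    name = resname.strip().upper()
    if name in STANDARD:
        return STANDARD[name]
    if deposited:
        # Ambiguous cases such as pyroglutamate (E or Q) are settled
        # by what the depositor says the gene encodes.
        return deposited
    return PARENT.get(name, "X")


def sequence(resnames, deposited_seq=None):
    """Build a one-letter sequence from a list of residue names."""
    letters = []
    for i, name in enumerate(resnames):
        dep = deposited_seq[i] if deposited_seq else None
        letters.append(one_letter(name, dep))
    return "".join(letters)


print(sequence(["PCA", "MSE", "GLY"], "QMG"))  # QMG
print(sequence(["PCA", "MSE", "GLY"]))         # XMG
```

Without a deposited sequence the pyroglutamate still comes out as X, which is the honest answer when the table cannot know whether it came from E or Q.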

Despite being a very ugly solution, the practical way does appear to be a combination of (1) very loosely following simple rules (such as a ten-atom rule), (2) following the established convention when there is one (such as for glycosylation), and (3) rational discussion between the depositors and annotators. The trick to making this work is being able to compromise and being rational, which is another reason why annotators should be trained biochemists and chemists.

Engin

P.S. Still recovering from when I was told by PDB staff that an HPUB structure could not yet be released, because the publication mentioned was just a Nature "Letter", not an "Article".

On 7/9/13 10:21 AM, [email protected] wrote:
Dear Marc (and BB),

I guess as usual, in real life the obvious is less obvious than it seems to be. 
I, and I guess many of my colleagues trying to find new drugs, have quite a few 
protein-inhibitor complexes where the inhibitor formed a covalent link with 
e.g. the active site serine. In these cases, I am perfectly happy with having 
the inhibitor being defined as a separate group, linked via a LINK record. For 
me, it does not make sense to treat these covalent inhibitors differently from 
noncovalent inhibitors.

In the end, I guess, it will boil down to some arbitrary choice, either imposed 
upon us by the pdb, or individually taken by the crystallographer who produced 
the crystal structure.

My 2 cts,
Herman
-----Original Message-----
From: CCP4 bulletin board [mailto:[email protected]] On Behalf Of Mark J 
van Raaij
Sent: Tuesday, 9 July 2013 16:23
To: [email protected]
Subject: Re: [ccp4bb] modified amino acids in the PDB

- really the only complicated case would be where a group is covalently linked 
to more than one amino acid, wouldn't it? Any case where only one covalent link 
with an amino acid is present could (should?) be treated as a special amino 
acid, i.e. like selenomethionine.
- groups without any covalent links to the protein are better kept separate I 
would think (but I guess this is stating the obvious).

Mark J van Raaij
Lab 20B
Dpto de Estructura de Macromoleculas
Centro Nacional de Biotecnologia - CSIC
c/Darwin 3
E-28049 Madrid, Spain
tel. (+34) 91 585 4616
http://www.cnb.csic.es/~mjvanraaij





On 9 Jul 2013, at 12:49, Frances C. Bernstein wrote:

In trying to formulate a suggested policy on het groups versus
modified side chains one needs to think about the various cases that
have arisen.

Perhaps the earliest one I can think of is a heme group.
One could view it as a very large decoration on a side chain but, as
everyone knows, one heme group makes four links to residues.  In the
early days of the PDB we decided that heme "obviously" had to be
represented as a separate group.

I would also point out that nobody would seriously suggest that
selenomethionine should be represented as a methionine with a missing
sulfur and a selenium het group bound to it.

Unfortunately all the cases that fall between selenomethionine and
heme are more difficult.  Perhaps the best that one can hope for is
that whichever representation is chosen for a particular case, it be
consistent across all entries.

                          Frances

P.S. One can also have similar discussions about the representation of
microheterogeneity and of sugar chains but we should leave those for
another day.

=====================================================
****                Bernstein + Sons
*   *       Information Systems Consultants
****    5 Brewster Lane, Bellport, NY 11713-2803
*   * ***
**** *            Frances C. Bernstein
  *   ***      [email protected]
***     *
  *   *** 1-631-286-1339    FAX: 1-631-286-1999
=====================================================

On Tue, 9 Jul 2013, MARTYN SYMMONS wrote:

Hi Clemens
    I guess the reason you say 'arbitrary' is because there is no
explanation of this rule decision?
   It would be nice if some rationalization was available alongside the values 
given.
So a sentence along the lines of 'we set the number owing to the
following considerations' ?
   However a further layer of variation is that the rule does not seem
to be consistently applied
  - just browsing CYS modifications:
    iodoacetamide treatment gives a CYS with only 4 additional atoms
but it is split off as  ACM.
    However some ligands much larger than 10 atoms have been kept
with the cysteine (for example CY7 in 2jiv and NPH in 1a18).
    My betting is that it depends on whether something has been seen
'going solo' as a non-covalent ligand previously so that it pops up
as an atomic structural match with a pre-defined three-letter code.
   This would explain for example the ACM case which you might expect
to occur in a modified Cys.  But it has also been observed as a
non-polymer ligand in its own right so goes on as a separate modification?
    However to be honest I am not sure I have ever seen the rationale
for this written down.
   'Non-polymer' heterogens can turn up either linked or not. Once
they are in the residues they have to make a call on which kind of
backbone they will feature in within the pdb.
   That is why there is  'D5M' for non-polymer deoxyAMP. Also known as
' DA' when it is 'DNA-linking' but so far not fessing up to life
under a third code as 'RNA-linking'....
Now is perhaps the time to ask for explanations of these nomenclature
features before they become hard-wired in the new pdb deposition
system (however there may be time - I refer you to my previous posting ;).
Cheers
     Martyn
Dr Martyn Symmons
Cambridge
_____________________________________________________________________
From: Michael Weyand <[email protected]>
To: [email protected]
Sent: Monday, 8 July 2013, 10:03
Subject: [ccp4bb] modified amino acids in the PDB

Dear colleagues,

We deposited protein structures with modified lysine side chains and
were surprised that the PDB treats the modification as an independent
molecule, with a "LINK" record indicating the covalent bond, instead
of defining a modified residue (that's what we had uploaded to the PDB).
Apparently, anything attached to an amino acid is considered an
independent molecule (and the lysine just called a regular lysine) if
it comprises more than 10 atoms (see below for the PDB guidelines).
I think that's kind of arbitrary and would give all modified residues
modified names as well, i.e. individual names for all modified lysines,
as is done for acetyl- or methyl-lysines, for example. I wonder
what other people's opinion is?!
Best regards
Clemens
---------------------------------------------------------------------
This is in accordance with the wwPDB annotation guidelines
(http://www.wwpdb.org/procedure.html#toc_2).
"*Modified amino acids and nucleotides* If an amino acid or
nucleotide is modified by a chemical group greater than 10 atoms, the
residue will be split into two groups: the amino acid/nucleotide
group and the modification. A link record will be generated between
the amino acid/nucleotide group and the modification. For modified
amino acids and nucleotides that were not split will follow standard atom 
nomenclature."
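
Read literally, the quoted guideline amounts to a single threshold test on the size of the modification. A minimal sketch in Python, assuming the count covers only the atoms added to the parent residue (the guideline text does not say whether hydrogens count); the function names are hypothetical and this is not the annotation software itself:

```python
# Threshold quoted in the guideline text; everything else here is illustrative.
MAX_MODIFICATION_ATOMS = 10


def should_split(n_modification_atoms, threshold=MAX_MODIFICATION_ATOMS):
    """Return True if the guideline, read literally, would split the
    modification off as a separate group joined by a LINK record."""
    # "greater than 10 atoms" is a strict inequality: exactly 10 stays attached.
    return n_modification_atoms > threshold


def representation(parent, n_modification_atoms):
    """Describe the outcome for a residue carrying a modification of this size."""
    if should_split(n_modification_atoms):
        return f"{parent} + separate group (LINK)"
    return f"modified {parent} (single residue)"


# 4 atoms is the size given in the thread for the iodoacetamide adduct on Cys.
print(representation("CYS", 4))    # modified CYS (single residue)
print(representation("LYS", 11))   # LYS + separate group (LINK)
```

By this test the 4-atom Cys adduct would stay a single modified residue, which is exactly the case reported above as being split off as ACM, so the threshold on its own does not describe what is actually done.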
