Re: [ccp4bb] Non-sequential residue numbering?

Herbert J. Bernstein Fri, 19 Sep 2008 12:03:28 -0700

I would suggest depositors take a look at the PDB Exchange
Dictionary and at the following definitions:


_atom_site.auth_seq_id
               An alternative identifier for _atom_site.label_seq_id that
               may be provided by an author in order to match the identification
               used in the publication that describes the structure.

               Note that this is not necessarily a number, that the values do
               not have to be positive, and that the value does not have to
               correspond to the value of _atom_site.label_seq_id. The value
               of _atom_site.label_seq_id is required to be a sequential list
               of positive integers.

               The author may assign values to _atom_site.auth_seq_id in any
               desired way. For instance, the values may be used to relate
               this structure to a numbering scheme in a homologous structure,
               including sequence gaps or insertion codes. Alternatively, a
               scheme may be used for a truncated polymer that maintains the
               numbering scheme of the full length polymer. In all cases, the
               scheme used here must match the scheme used in the publication
               that describes the structure.

_atom_site.label_seq_id
               This data item is a pointer to _entity_poly_seq.num in the
               ENTITY_POLY_SEQ category.

_entity_poly_seq.num
               The value of _entity_poly_seq.num must uniquely and sequentially
               identify a record in the ENTITY_POLY_SEQ list.

               Note that this item must be a number and that the sequence
               numbers must progress in increasing numerical order.
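
The relationship between the two identifiers can be illustrated with a
minimal Python sketch. It assumes that every residue of the polymer is
present in the coordinates, so that counting residues 1..N in file order
reproduces _entity_poly_seq.num, and it treats the author labels as opaque
strings. The function name and the example labels are hypothetical and are
not taken from the dictionary.

```python
def build_seq_id_table(auth_seq_ids):
    """Pair each author label with a sequential, positive label_seq_id.

    auth_seq_ids: author residue labels for one chain, in file order.
    Returns a list of (label_seq_id, auth_seq_id) tuples.
    """
    if len(set(auth_seq_ids)) != len(auth_seq_ids):
        raise ValueError("author residue labels must be unique within a chain")
    # label_seq_id is simply the 1-based position in the sequence.
    return list(enumerate(auth_seq_ids, start=1))


# Hypothetical labels: the tail of one protein, then the start of another,
# with one inserted residue ("2A") kept in the author's scheme.
auth_labels = ["762", "763", "1", "2", "2A", "3"]

table = build_seq_id_table(auth_labels)
auth_to_label = {auth: num for num, auth in table}

for num, auth in table:
    print(f"label_seq_id={num:<3} auth_seq_id={auth}")
```

Keeping both columns side by side is the point: the sequential number serves
the database, while the author label carries the context used in the paper.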

So, at the very least, the PDB's internal database and mmCIF and PDBML
files should be able to handle _both_ the simplified numbering the
annotator wishes to impose, and the more scientifically useful notation
an author might use to place their structure in context.  It should be
a "simple" matter of programming for the PDB to produce "PDB" entries done
either way.

One should also note that the entire system of insertion codes does not
make much sense without the broader contextual view of families of
structures.

Regards,
  Herbert


At 2:33 PM -0400 9/19/08, Frances C. Bernstein wrote:

I was at the PDB from 1974 - 1998 and closely involved with
processing entries 15 to ~9000.  We also designed the "PDB
format".  My replies were based on what was done for those 24
years and I cannot address what is currently being done at the PDB.

I do not know if the current PDB staff follows this bulletin
board and I can only suggest that you take this matter up
with the current PDB management, the community, and the PDB
advisory board.

                             Frances

=====================================================
****                Bernstein + Sons
*   *       Information Systems Consultants
****    5 Brewster Lane, Bellport, NY 11713-2803
*   * ***
**** *            Frances C. Bernstein
  *   ***      [EMAIL PROTECTED]
 ***     *
  *   *** 1-631-286-1339    FAX: 1-631-286-1999
=====================================================

On Fri, 19 Sep 2008, Linda Brinen wrote:
I'm actually pleased to read your response and interpretation of what is allowable and why, Frances. However, it's in pretty stark contrast to what I was told about 18 months ago when I struggled (and eventually lost) to preserve a numbering scheme that had a long standing historical and literature precedence when submitting a new structure to the PDB.
This was a two-domain protein; the first domain - according to historical numbering - had a number plus a letter code to indicate the domain; the second domain, which started again with the number 1 - had no letter code. We were told that that was not allowed. We wanted to preserve insertions and deletions as well, but were also strongly discouraged, if not flat out told we could not. While it's not usually prudent to quote offline e-mail exchanges, I'm going to snip pertinent pieces of the discussion (I'm leaving the original spelling errors and text bolding in place) with no indication of the annotator who wrote these guidelines to our group. Here's part of one of the many 'exchanges' that was had:
"I understand your point and that certain close research communities have certain habits and traditions but the PDB serves to the whole community of structural biology, bioinformatics, to many educators, students... In all these cases, the simplest possible numbering of sequences, ideally numbering identical to the numbering used by the UNP sequence database, is far the most useful because easiest to understand. I do not say this because it is in our manuals and help pages but because I have eight years of experience with annotation of all kinds of structures. I would therefore very much like to ask you to reconsider the way how you number your protein, your numbering schema is *interpretation* more than a mere labeling schema. Needles to say, no sequence numbering can satisfy this ambition...from my point of view, especially the jump from 96P back to 1 will cause a lot of confusion and misunderstanding....look at the problem from a standpoint of a general naturalist instead of an narrow protease community"
This left us with a mandated 'start from 1 and number sequentially' format that did exactly the opposite of what you, Frances, correctly mention as important in any numbering scheme: preserve relationships with other proteins. We've had to resort to providing 'translation tables' that identify what people were expecting to see as numbers for active site residues which now have new and nonsensical numbering. Is it the end of the world? Of course not. But neither is it necessarily the best scientific or logical presentation.
At the risk of inciting a rather....animated...dialogue on this topic, what has your experience been with this kind of thing (i.e., were we just unlucky??) and do current practices make sense and serve the community??
-Linda


Frances C. Bernstein wrote:
All entries list atoms starting at the N-terminus (or 5') so
connectivity goes in the order of the atoms in the file -
obviously with the possibility of unconnected portions
where the density is inadequate.

The entire philosophy of allowing numbering other than 1 - N
had to do with preserving relationships with other proteins.
The most common use relates to having an initial sequence 1 - N
and then a similar sequence from another species with insertions
and/or gaps.  People wanted to be able to talk about the active
site (which was preserved) using the same residue numbers.
Negative numbers came up with additions at the N-terminus.
Offhand, I don't recall why descending numbers were used but
I believe that there is at least one such entry.

                       Frances

On Fri, 19 Sep 2008, Ian Tickle wrote:
But what connectivity would be implied by descending numbers: the order
in the file or the order of the numbering?  I assume the former,
otherwise what would be the point of having descending numbering?  And I
wonder how many programs would baulk at it (or even at ascending
negative numbers?).

-- Ian
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Frances C. Bernstein
Sent: 19 September 2008 16:44
To: Todd Geders
Cc: [EMAIL PROTECTED]
Subject: Re: [ccp4bb] Non-sequential residue numbering?

As long as each residue within a chain has a unique identifier
(residue number plus insertion code), there is no restriction
on numbering.  The numbers can be in ascending or descending
order, non-sequential, and even negative.
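
The uniqueness rule can be checked mechanically. What follows is a minimal
Python sketch, assuming fixed-column PDB ATOM records (chain ID in column 22,
residue number in columns 23-26, insertion code in column 27) and assuming
that all atoms of a residue are listed contiguously. The function names and
the truncated demo records are hypothetical.

```python
from collections import Counter


def atom_line(serial, name, res_name, chain, res_seq, i_code=" "):
    """Hypothetical helper: build a truncated ATOM record for the demo."""
    return f"ATOM  {serial:5d} {name:<4} {res_name:>3} {chain}{res_seq:4d}{i_code}"


def residue_keys(pdb_lines):
    """Yield (chain, residue number, insertion code) once per residue run."""
    previous = None
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        key = (line[21], int(line[22:26]), line[26])
        if key != previous:
            yield key
            previous = key


def duplicated_residue_ids(pdb_lines):
    """Return identifiers that label more than one residue in a chain."""
    counts = Counter(residue_keys(pdb_lines))
    return [key for key, n in counts.items() if n > 1]


# Non-sequential, negative and insertion-coded numbers, all unique.
ok_lines = [
    atom_line(1, "N", "ALA", "A", 763),
    atom_line(2, "CA", "ALA", "A", 763),
    atom_line(3, "N", "LEU", "A", -1),
    atom_line(4, "N", "LEU", "A", 2),
    atom_line(5, "N", "MET", "A", 2, "A"),
]
# The same chain with residue -1 used a second time.
bad_lines = ok_lines + [atom_line(6, "N", "CYS", "A", -1)]

print(duplicated_residue_ids(ok_lines))
print(duplicated_residue_ids(bad_lines))
```

Note that the check says nothing about ordering: ascending, descending and
negative numbers all pass, and only a reused identifier is reported.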

                        Frances


On Fri, 19 Sep 2008, Todd Geders wrote:
Hello all,

I have a structure from a non-natural fusion of the truncated C-terminus
of one protein with the truncated N-terminus of another.  For the
deposition, we want to keep the numbering as found in the separate
proteins.  It looks something like this:

            1         12
            |          |
....HWVCKDIALLMCFFLEEMSEEP....
  |        |
754      763

At no point is there an overlap in numbering (i.e. the N-terminal
residue number is higher than the C-terminal residue number).
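
The no-overlap condition can be tested with a short Python sketch. It
assumes plain integer residue numbers listed in file order, ignores
insertion codes, and treats each ascending run as one numbering segment.
The function names and the example ranges (754-763 followed by 1-12, as in
the diagram above) are illustrative only.

```python
def ascending_runs(numbers):
    """Split residue numbers (file order) into strictly ascending runs."""
    runs = []
    for n in numbers:
        if runs and n > runs[-1][-1]:
            runs[-1].append(n)
        else:
            runs.append([n])
    return runs


def ranges_overlap(runs):
    """True if the numeric ranges covered by any two runs intersect."""
    spans = sorted((run[0], run[-1]) for run in runs)
    # With spans sorted by start, any overlap shows up between neighbours.
    return any(nxt[0] <= cur[1] for cur, nxt in zip(spans, spans[1:]))


# Fusion numbered as in the diagram: 754..763 followed by 1..12.
fusion = list(range(754, 764)) + list(range(1, 13))
runs = ascending_runs(fusion)

print([(run[0], run[-1]) for run in runs])
print(ranges_overlap(runs))

# A counter-example in which the second segment reuses numbers 5..7.
clash = list(range(5, 11)) + list(range(1, 8))
print(ranges_overlap(ascending_runs(clash)))
```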

Is this numbering scheme supported by the PDB standard?  Thus far, all
of the software seems to handle it (refmac, Coot, PyMOL, pdb_extract,
PDB precheck & validation, etc).

Can anyone see a reason to not deposit with this non-sequential residue
numbering?

~Todd
--
Linda S. Brinen
Adjunct Assistant Professor
Dept of Cellular & Molecular Pharmacology and
The Sandler Center for Basic Research in Parasitic Diseases
Phone: 415-514-3426 FAX: 415-502-8193
E-mail: [EMAIL PROTECTED]
QB3/Byers Hall 508C
1700 4th Street
University of California
San Francisco, CA 94158-2550
USPS:
UCSF MC 2550
Byers Hall Room 508
1700 4th Street
San Francisco, CA 94158



--
=====================================================
 Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                 +1-631-244-3035
                 [EMAIL PROTECTED]
=====================================================
