-----Original Message-----
From: Eleanor Dodson [mailto:[EMAIL PROTECTED]
Sent: Friday, August 11, 2006 12:13 PM
To: Ian Tickle
Cc: Miguel Ortiz Lombardia; [EMAIL PROTECTED]; CCP4bb
Subject: Re: [ccp4bb]: gap links
Obviously we all should accept using the mmCIF format for
coordinates.
That assigns
a residue NAME which can be 1 2 3 7 6 8 8A etc. and
a residue NUMBER which will be 1 2 3 4 5 6 etc. for sequential
residues.
This discussion demonstrates the inadequacy of the PDB 80-character record.
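
As a rough illustration of the name/number split described above, here is a
minimal Python sketch. The Residue class and number_residues function are
hypothetical names made up for this example, not part of any CCP4 or mmCIF
library. In mmCIF terms the ordinal loosely corresponds to label_seq_id and
the label to auth_seq_id plus the insertion code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Residue:
    """Hypothetical record keeping the two identifiers separate."""
    ordinal: int  # sequential position in the chain: 1, 2, 3, ...
    label: str    # arbitrary author-chosen name, e.g. "8A"; never parsed


def number_residues(labels):
    """Assign sequential ordinals to residue labels given in chain order."""
    return [Residue(ordinal=i, label=lab)
            for i, lab in enumerate(labels, start=1)]


chain = number_residues(["1", "2", "3", "7", "6", "8", "8A"])

# Connectivity is derived from the ordinal (i.e. the order), never the label.
neighbours = [(a.label, b.label) for a, b in zip(chain, chain[1:])]
```

With this split, software can link residue 4 to residue 5 without caring
that their labels happen to be "7" and "6".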
Eleanor
Ian Tickle wrote:
*** For details on how to be removed from this list visit the ***
*** CCP4 home page http://www.ccp4.ac.uk ***
I come down strongly on Bernhard's side here, and have to
disagree equally strongly with Miguel. This is an issue I've
tried to take up on the CootBB (sadly so far with limited
success!). There will always be conflicts between the
'natural' scientists (i.e. physicists, chemists, biologists
etc) and the computer scientists over what is feasible in
software, but it seems to me that a fundamental principle
should be that in the first instance the natural scientist
dictates his/her requirements to the computer scientist, not
the other way around! The natural scientist is the
'customer' and the computer scientist is the 'service
provider' and we all know that the customer is always right
(even when he's wrong!). Too many times the programmer
produces software 'features' (or bugs depending on how you
look at it!) that are convenient from the programming point
of view but are not what the scientist actually wants. Now
clearly there will be situations where the scientist is asking
for something that's just totally unfeasible in
software, and then there will have to be some negotiation,
but it still behoves the programmer to accommodate the
scientist's wishes as far as is practical.
It seems to me that 'biological' (i.e. essentially
arbitrary) residue numbering most definitely falls way short
of the class of unreasonable requests. The biologist
essentially wants the residue 'number' (actually a name if
you include the chain ID and insertion code) to be merely a
label, nothing more, obviously firstly to identify the
residue on the graphics, but also to relate it to the
corresponding residue in homologous structures. Therefore
the programmer must not infer anything concerning the
sequence (such as the residue connectivity) purely from the
labels! It seems to me completely crazy that the biologist
has to relabel his meaningfully labelled sequence just to
make life comfortable for the programmer - and to maintain
different sets of numbers for different purposes! If the
biologist really wants to label his/her contiguous sequence
'12345 -15X 5 6 -99W ...' then so be it (anything
becomes possible if the numbers are treated purely as
labels). It's the
programmer's job to accommodate that in software, it's not
his place to question the wisdom of the biologist.
In the majority of structures each unique chain identified
by the chain ID is contiguous, so that obviously has to be
the default presumption, regardless of the labelling. Since
we are assuming that the residue labels provide absolutely no
information concerning the connectivity, and given the
current limitations of the PDB format, I think the programmer
is entitled to require that the ordering of residues in the
file is the same as that in the sequence (otherwise you would
need an additional column to specify the ordinal numbers of
the residues). Then there has to be a way of telling the
software where the breaks in the sequence are. In most cases
this will be obvious (e.g. the C-N distance is 10 Ang). In
the few cases that the program is unable to infer a break
from the distance, the user clearly would be expected to
provide that information. In the RESTRAIN program I required
that each chain break is flagged by a TER record, though
strictly that is only used to flag
end-of-chain (AFAIK other software ignores the TER record).
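
The scheme in the paragraph above could be sketched roughly as follows in
Python. This is only an illustration under stated assumptions, not how
RESTRAIN or refmac5 actually work: the function name find_chain_breaks and
the 2.0 Ang cutoff are made up for the example (a normal peptide C-N bond is
about 1.33 Ang), a TER record is treated as an explicit break flag, and
only ATOM records in standard fixed PDB columns are read. Residues are taken
in file order and the chain ID, residue number and insertion code are joined
into an opaque label that is never compared numerically.

```python
import math

MAX_PEPTIDE_BOND = 2.0  # Angstrom; assumed cutoff, not taken from any program


def find_chain_breaks(pdb_lines, cutoff=MAX_PEPTIDE_BOND):
    """Return (label_before, label_after) pairs where the chain looks broken.

    Residue labels are treated purely as labels: connectivity comes from
    file order, the C-N distance, and any TER records.
    """
    residues = []      # [label, {atom_name: (x, y, z)}] in file order
    ter_after = set()  # indices of residues directly followed by TER
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "TER" and residues:
            ter_after.add(len(residues) - 1)
        elif record == "ATOM":
            # chain ID (col 22) + resSeq (cols 23-26) + iCode (col 27)
            label = line[21] + line[22:26].strip() + line[26].strip()
            atom = line[12:16].strip()
            xyz = (float(line[30:38]), float(line[38:46]),
                   float(line[46:54]))
            if not residues or residues[-1][0] != label:
                residues.append([label, {}])
            # Keep only the first position seen for each atom name.
            residues[-1][1].setdefault(atom, xyz)

    breaks = []
    for i in range(len(residues) - 1):
        label_a, atoms_a = residues[i]
        label_b, atoms_b = residues[i + 1]
        if i in ter_after:
            breaks.append((label_a, label_b))  # user-supplied break
            continue
        c, n = atoms_a.get("C"), atoms_b.get("N")
        if c is None or n is None:
            continue  # cannot infer; the user would have to say
        if math.dist(c, n) > cutoff:
            breaks.append((label_a, label_b))
    return breaks
```

Because the label is only ever compared for equality, a contiguous stretch
labelled "5", "-15X", "99" is handled the same way as "1", "2", "3".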
It seems to me that fixing this ongoing problem is not
beyond the bounds of what we can reasonably expect from the software.
Cheers
-- Ian
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of Miguel Ortiz Lombardia
Sent: Thursday, August 10, 2006 8:01 AM
To: [EMAIL PROTECTED]
Cc: 'CCP4bb'
Subject: Re: [ccp4bb]: gap links
refmac5 must be assuming that you number your protein according
to your protein sequence, which is continuous. In my opinion,
this is reasonable.
Uhhh... this assumption turns perilous quickly, because there
are post-translational mods and splicing (see Concanavalin),
and biologists sometimes prefer keeping the key residues in
related structures (trypsin, Fabs, etc.) at a certain residue
number. This causes sequence insertions (addressed correctly,
as you say) and gaps (not addressed correctly, my situation).
Sure, but after any modification whatsoever the sequence of the
final protein is, except for perhaps a few pathological cases,
continuous.
Now, I can understand, though not always agree, that biologists
(I am one) prefer to give a consistent number to a particular
residue in a family of proteins, but for a refinement program I
still think it is reasonable to consider the numbering as
continuous by default: this would be the most usual situation,
I would say.
In any case, knowing that you can fix the problem using TRANS
(perhaps even CIS if the thing is really bizarre) is very
useful, thanks!
Miguel
--
Miguel Ortiz Lombardía
Centro de Investigaciones Oncológicas
C/ Melchor Fernández Almagro, 3
28029 Madrid, Spain
Tel. +34 912 246 900
Fax. +34 912 246 976
email: [EMAIL PROTECTED]
www: http://www.ysbl.york.ac.uk/~mol/
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~
Je suis de la mauvaise herbe,
Braves gens, braves gens,
Je pousse en liberté
Dans les jardins mal fréquentés!
Georges Brassens