Re: [ccp4bb] what to do with disordered side chains

Dale Tronrud Sun, 03 Apr 2011 23:55:48 -0700

   The definition of _atom_site.occupancy is

 The fraction of the atom type present at this site.
  The sum of the occupancies of all the atom types at this site
  may not significantly exceed 1.0 unless it is a dummy site.


When an atom has an occupancy equal to zero that means that the
atom is NEVER present at that site - and that is not what you
intend to say.  Setting the occupancy to zero does not mean that
a full atom is located somewhere in this area.  Quite the opposite.

   (The reference to a dummy site is interesting and implies to
me that mmCIF already has the mechanism you wish for.)

   Having some experience with refining low occupancy atoms and
working with dummy marker atoms I'm quite confident that you can
never define a B factor cutoff that would work.  No matter what
value you choose you will find some atoms in density that refine
to values greater than the cutoff, or the limit you choose is so
high that you will find marker atoms that refine to less than the
limit.  A B factor cutoff cannot work - no matter the value you
choose you will always be plagued with false positives or false
negatives.

   If you really want to stuff this bit into one of these fields
you have to go all out.  Set the occupancy of a marker atom to -99.99.
This will unambiguously mark the atom as an imaginary one.  This
will, of course, break every program that reads PDB format files,
but that is what should happen in any case.  If you change the
definition of the columns in the file you must mandate that all
programs be upgraded to recognized the new definitions.  I don't
know how you can do that other than ensuring that the change will
cause programs to cough.  To try to slide it by with a magic value
that will be silently accepted by existing programs is to beg for
bugs and subtle side-effects.

   Good luck getting the maintainers of the mmCIF standard to accept
a magic value in either of these fields.

   How about this: We already have the keywords ATOM and HETATM
(and don't ask me why we have two).  How about we create a new
record in the PDB format, say IMGATM, that would have all the
fields of an ATOM record but would be recognized as whatever the
marker is for "dummy" atoms in the current mmCIF?  Existing programs
would completely ignore these atoms, as they should until they are
modified to do something reasonable with them.  Those of us who
have no use for them can either use a switch in the program to
ignore them or just grep them out of the file.  Someone could write
a program that would take a model with only ATOM and HETATM records
and fill out all the desired IMGATM records (Let's call that program
WASNIAHC, everyone would remember that!).

   This solution is unambiguous.  It can be represented in current
mmCIF, I think.  The PDB could run WASNIAHC themselves after deposition
but before acceptance by the depositor so people like me would not
have to deal with them during refinement but would be able to see
them before our precious works of art are unleashed on the world.

   Seems like a win-win solution to me.

Dale Tronrud


On 4/3/2011 9:17 PM, Jacob Keller wrote:

Well, what about getting the default settings on the major molecular
viewers to hide atoms with either occ=0 or b>cutoff ("novice mode?")?
While the b cutoff is still be tricky, I assume we could eventually
come to consensus on some reasonable cutoff (2 sigma from the mean?),
and then this approach would allow each free-spirited crystallographer
to keep his own preferred method of dealing with these troublesome
sidechains and nary a novice would be led astray....

JPK

On Sun, Apr 3, 2011 at 2:58 PM, Eric Bennett<er...@pobox.com>  wrote:

Most non-structural users are familiar with the sequence of the proteins they 
are studying, and most software does at least display residue identity if you 
select an atom in a residue, so usually it is not necessary to do any cross 
checking besides selecting an atom in the residue and seeing what its residue 
name is.  The chance of somebody misinterpreting a truncated Lys as Ala is, in 
my experience, much much lower than the chance they will trust the xyz 
coordinates of atoms with zero occupancy or high B factors.

What worries me the most is somebody designing a whole biological experiment around an 
over-interpretation of details that are implied by xyz coordinates of atoms, even if 
those atoms were not resolved in the maps.  When this sort of error occurs it is a level 
of pain and wasted effort that makes the "pain" associated with having to build 
back in missing side chains look completely trivial.

As long as the PDB file format is the way users get structural data, there is really no 
good way to communicate "atom exists with no reliable coordinates" to the user, 
given the diversity of software packages out there for reading PDB files and the 
historical lack of any standard way of dealing with this issue.  Even if the file format 
is hacked there is no way to force all the existing software out there to understand the 
hack.  A file format that isn't designed with this sort of feature from day one is not 
going to be fixable as a practical matter after so much legacy code has accumulated.

-Eric



On Apr 3, 2011, at 2:20 PM, Jacob Keller wrote:

To the delete-the-atom-nik's: do you propose deleting the whole
residue or just the side chain? I can understand deleting the whole
residue, but deleting only the side chain seems to me to be placing a
stumbling block also, and even possibly confusing for an experienced
crystallographer: the .pdb says "lys" but it looks like an ala? Which
is it? I could imagine a lot of frustration-hours arising from this
practice, with people cross-checking sequences, looking in the methods
sections for mutations...

JPK

Re: [ccp4bb] what to do with disordered side chains

Reply via email to