Some more comments:

First off, I think rmsd's are very poor descriptors, because they collapse very complex aspects of very complex structures into a single value. Furthermore, a couple of really bad outliers can give the wrong impression of an otherwise OK model. IMHO, a more discriminating way of looking at bond lengths and other geometric quantities is needed, e.g., one similar to how we look at Ramachandran statistics. We could say "99.5% of all bond lengths are in the most favorable range, 0.4% are in the generously allowed range, and 0.1% are disallowed". Of course, this will require a panel of experts to come up with definitions for "most favorable", "generously allowed", and "disallowed".
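
To make the idea concrete, here is a rough Python sketch of such a Ramachandran-style summary next to the plain rmsd. The function names and the cutoffs of 2 and 4 sigma are arbitrary placeholders chosen only for illustration; they are not proposed definitions, which would still have to come from that panel of experts.

```python
import math

# HYPOTHETICAL cutoffs, in units of the target standard deviation.
# These are placeholders for illustration, not agreed-upon definitions.
FAVORABLE_Z = 2.0
ALLOWED_Z = 4.0


def rmsd(observed, ideal):
    """Root-mean-square deviation of observed bond lengths from ideal values."""
    squared = [(o - i) ** 2 for o, i in zip(observed, ideal)]
    return math.sqrt(sum(squared) / len(squared))


def classify_bonds(observed, ideal, sigma):
    """Return the percentage of bonds falling into each category.

    Each bond is binned by |observed - ideal| / sigma, using the
    placeholder cutoffs defined above.
    """
    counts = {"most favorable": 0, "generously allowed": 0, "disallowed": 0}
    for obs, ref, sig in zip(observed, ideal, sigma):
        z = abs(obs - ref) / sig
        if z <= FAVORABLE_Z:
            counts["most favorable"] += 1
        elif z <= ALLOWED_Z:
            counts["generously allowed"] += 1
        else:
            counts["disallowed"] += 1
    total = len(observed)
    return {name: 100.0 * n / total for name, n in counts.items()}
```

Fed with, say, 99 bonds that are each off by 0.01 Å and a single bond that is off by 0.5 Å (made-up numbers), the rmsd comes out about five times larger than for the 99 good bonds alone, while the category summary shows directly that one bond is the problem and the rest are fine.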

That aside, what is the purpose of rmsd's?

I think one can indeed agree that the structure of an object does not change with the resolution at which one looks at it. From that, one can conclude that one should use the same target values for bond lengths, etc., at all resolutions. This assumes that geometric descriptors, like rmsd's, tell us about the physical plausibility of a given model. They are independent of the diffraction data and are thus a valid concept even in the limit of zero reflections. They are used to make sure that we have a plausible model despite the lousy quality of our diffraction data. We use other tests (such as R factors, map correlation, and so on) to tell us how well the model corresponds to the data, but that's a different issue. Thus, it is the well-founded expectations about physical reality that should trump the experimental data in all resolution ranges. Note that this does not preclude the discovery of new features, because we have ways of detecting where the model doesn't fit the data.

However, one can also use rmsd's to guide the refinement process itself. For example, at low resolution, we can't distinguish between bond lengths of 1.4 and 1.6 Å. Should we not therefore allow the bond lengths to vary much more than at high resolution? By this argument, target rmsd's should be higher at low resolution than at high resolution, not the other way around. They should go parallel with the coordinate error. The argument is that using tight restraints (i.e., low rmsd's) at low resolution gives the impression of a precision that is way too high compared to the information content of the data. This assumes using rmsd's as error models, which we have to match to experimental errors.

So, given all these arguments, I think there are in fact different purposes and different ways of using and even defining rmsd's.

All that aside, I don't quite understand what the obsession with bond lengths is all about. If we analyze "strange" features in structures, they are usually due to deviations in angles arising from the unique packing, not so much in bond lengths. Thus, I would even go so far as to suggest generally restraining bond lengths to a very narrow range and putting more emphasis on torsional refinement of semi-rigid bodies. But that's a different can of worms.

Best - MM

--------------------------------------------------------------------------------
Mischa Machius, PhD
Associate Professor
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.; ND10.214A
Dallas, TX 75390-8816; U.S.A.
Tel: +1 214 645 6381
Fax: +1 214 645 6353

