There is a collection of posts (unfortunately with a number of spam messages) at
http://wwpdb-remediation.rutgers.edu/mail-archive/ with various comments.

Although I'm not familiar with the internal workings of this remediation
program, it does seem that the PDB format is now largely auto-generated
from the internally used mmCIF. Unfortunately, in my experience (having
looked at a few dozen random entries among the new PDB files), this means
that some of the new PDB files for old entries look very different from
what you/we deposited several years ago. The format seems better
(internally consistent), but the content has sometimes suffered.

But I guess there is always room for friction when one side is mainly
interested in data formats, storage and databases, and the other mainly
in the crystallographic content. Finding a good compromise between those
two groups of experts is non-trivial.

At least the new databases will always have a link to the original
version of the PDB file. It still means, though, that I can no longer
search for the author name MUELLER (the German U-umlaut properly
transliterated into ASCII), since the PDB files now contain MULLER
(because PubMed isn't able to properly translate non-ASCII names ...).
Similarly, an analysis of the programs used for structure solution will
show a very different distribution, since that information has been
significantly changed.

Anyway, have a look at your favourite PDB file with the attached script:

  ./pdb23.sh 1abc

It is sometimes quite interesting. I haven't checked the mmCIF files;
maybe they are much better (as a 'hint' from the database people to the
crystallographers to stop using the PDB format and switch to mmCIF?).

Cheers

Clemens

On Sat, Jul 21, 2007 at 12:05:35PM -0700, Ethan A Merritt wrote:
> On Saturday 21 July 2007 11:12, Joe Krahn wrote:
> > we all use in our daily research. They don't even want to keep the PDB
> > format at all. Its primary purpose now is for structural biologists.
>
> That is inevitable.
> The PDB format is simply not capable of representing
> the complexities of current crystallographic models, and will only become
> more obsolete as the state of the art progresses. Because it is so
> widespread, it will remain a legacy format for import/export into programs
> that are not up to the current crystallographic state of the art. Yes,
> that means it will largely be used by non-crystallographers to import
> and view structures.
>
> Thus I think the writing is on the wall: the PDB format as a primary
> working medium in crystallography is on its deathbed. Of course it may
> linger there for a long while yet, and may be poked at from time to time
> in order to stave off its final expiration.
>
> Having said that, I don't understand the motivation for changing this
> legacy format into something that the legacy programs will not recognize.
> That indeed seems self-defeating.
>
> Ethan Merritt
>
> > The new PDB format (version 3) has a lot of very useful improvements,
> > and an update is long overdue. However, I am irate that RCSB chose NOT
> > to use the ACA meeting to discuss the changes. Instead, the format is
> > being put into production at the same time as the ACA meeting. This
> > essentially states that opinions expressed at the ACA do not count.
> > There was a lot of conflict over their last attempt at an update.
> > Instead of working to better involve the structural biologist
> > community, I feel that they are intentionally discounting our
> > interests because working with the user community is too much effort.
> >
> > Unfortunately, structural biologists generally do not want to spend
> > time arguing about file formats, while computer scientists can carry on
> > for weeks over minor details. This change is going to affect all of us.
> > If you have concerns about the new format that have not been addressed,
> > it is important to take action now.
> > The PDB format is not just their personal database format (that's what
> > mmCIF is for), but the format that we all use in our daily research.
> > They don't even want to keep the PDB format at all. Its primary purpose
> > now is for structural biologists. It is essential that we be part of
> > the decision-making process.
> >
> > I just sent the following letter to the wwPDB, which is where comments
> > about the new format are supposed to go. If you will be at the ACA
> > meeting, I encourage you to complain loudly.
> >
> > Joe Krahn
> >
> > -----------------------------------------------------------------------
> > To: [EMAIL PROTECTED]
> > Subject: The new PDB format is WRONG.
> >
> > It seems obvious to me that the RCSB and wwPDB worked on the new format
> > to address database users' needs, but have intentionally ignored the
> > rest of the user community. RCSB manages mmCIF for database purposes,
> > and has declared a lack of interest in even keeping the PDB format.
> > Obviously, the PDB format primarily serves structural biologists
> > working with individual structures, not database users.
> >
> > Most of the updates are quite positive and beneficial, but I think that
> > some changes are detrimental. My only serious complaint is that RCSB,
> > and now wwPDB, seem to be ignoring the interests of much of the
> > scientific community they are supposed to be serving. All that I ask
> > for is appropriate inclusion of the whole user community. This is a big
> > change that will affect thousands of people. We should ensure that it
> > is the best possible format update before we all have to expend a huge
> > effort to deal with it.
> >
> > I have seen many comments about the format by well-known
> > crystallographers ignored. One example is the use of SegID. Most
> > structural biologists have favored it for years, but RCSB has continued
> > to deny us, on the grounds that it is not "well defined".
> > It would be better to give it a proper definition, and allow it to be
> > used to group together non-covalently associated components, such as
> > the waters belonging to a specific protein molecule. This is important
> > because the use of ChainID for non-polymers has been banned, which also
> > goes against the wishes of most users.
> >
> > The latest change to the atom alignment rules is also detrimental. RCSB
> > has totally broken the element alignment rules, on the baseless grounds
> > that they were too hard to follow. The new change complicates this rule
> > even further, and essentially follows an earlier attempt at IUPAC
> > hydrogen names that the community strongly rejected. At this point, the
> > best solution is probably to make atom names completely left-justified.
> > Again, my main concern is not that my idea be followed, but that the
> > user community gets a fair chance to participate in the final decision.
> >
> > Another problem is that the original meaning of HET groups continues to
> > be corrupted. ATOM records are for commonly occurring residues from a
> > list of standard residues. Water is obviously common, and should not
> > have been converted to a HET group. HET groups have NO relationship to
> > polymeric state. With water as a HET group, a proper PDB file for a
> > modeller with bulk solvent would require CONECT entries for every single
> > water. It is also important to emphasize that the HETNAM is the actual
> > unique ID, not the 3-letter code. The current hack is to treat
> > everything as an ATOM, which has a pre-determined connectivity. This
> > cannot continue forever, and we are already stuck with meaningless
> > 3-letter codes instead of useful 3-letter abbreviations. The unique
> > 3-letter code should be kept for now, but there should be an emphasis
> > on beginning to use the full HETNAM, so that the inevitable switch to
> > non-unique 3-letter codes will not have a big impact.
> >
> > Thank you,
> > Joe Krahn
>
> --
> Ethan A Merritt
> Biomolecular Structure Center
> University of Washington, Seattle 98195-7742

-- 
***************************************************************
* Clemens Vonrhein, Ph.D.     vonrhein AT GlobalPhasing DOT com
*
*  Global Phasing Ltd.
*  Sheraton House, Castle Park
*  Cambridge CB3 0AX, UK
*--------------------------------------------------------------
* BUSTER Development Group      (http://www.globalphasing.com)
***************************************************************
