As a novice looking at mmCIF from a developer's point of view, for reflection data the complication is not so much tokenising (parsing) as knowing which items to write or to expect to read. For example, as far as I can see, an observed intensity may be encoded in a reflection loop (merged or unmerged) as any one of the following, and there seem to be similar choices for other items:

  _refln.intensity_meas
  _refln.F_squared_meas
  _refln.pdbx_I_plus, _refln.pdbx_I_minus
  _diffrn_refln.counts_net
  _diffrn_refln.intensity_net

If I'm writing a file, which should I use, and if I'm reading one, which ones should I expect? And is there a distinction between merged and unmerged data?

confused (easily)
Phil

On 17 Sep 2013, at 15:30, Peter Keller <pkel...@globalphasing.com> wrote:

> Dear all,
>
> At Global Phasing, we have seen that there are still issues with the way that
> different applications deal with mmCIF-format data, and this continues to
> cause problems for users. I believe that part of the reason for this is that
> the underlying syntax (the STAR format) is not universally understood, and
> that a common and complete understanding of the full STAR syntax amongst
> programmers who deal with the format will help with some of the existing
> problems.
>
> I wrote some code for low-level handling of the STAR format a while ago that
> I have been meaning to release for over a year. Garry Battle's announcement
> on 23 August about the mmCIF/PDBx workshop at the EBI has prompted me into
> action: I have written a short article that discusses some examples of the
> issues that we have encountered, and made my code available for download. The
> references in the article are given primarily as web links: more conventional
> citations can usually be found in the pages that I link to. This code has not
> been used in any released products, but it has had some internal use at
> Global Phasing. There is an MX bias in the article's discussion, but the
> issues are not restricted to MX.
>
> As I explain in the article, the handling of the input data is based on an
> enormous regular expression that matches STAR data, with only a little logic
> in the code itself. The regular expression should be usable with a variety of
> other languages, not only in Java (which I have used in this case).
> The code, or the regular expression on its own, may be freely used in other
> projects: see the included licencing for details, but basically you should:
> (i) give credit for using it, and (ii) if you choose to modify the regular
> expression, state that you have done so in that credit.
>
> The article, which contains links to a tar file containing the code, and the
> documentation, is here:
>
> <http://www.globalphasing.com/startools/>
>
> Hoping that others will find this useful and/or help to resolve or clarify
> outstanding questions,
>
> Peter.
>
> --
> Peter Keller                                     Tel.: +44 (0)1223 353033
> Global Phasing Ltd.,                             Fax.: +44 (0)1223 366889
> Sheraton House,
> Castle Park,
> Cambridge CB3 0AX
> United Kingdom
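
On the reading side of the question at the top of this message, one cautious approach is to assume nothing about which intensity item a file uses and to check the loop header against the candidate names instead. What follows is only a minimal Python sketch under stated assumptions. The helper names loop_header_tags and find_intensity_tags are hypothetical and belong to no mmCIF library. The candidate names are the ones listed above, written in the dotted mmCIF form. The header scan is a deliberate simplification (one tag per line, no comments inside the header) and is not the regular-expression tokeniser described in Peter's article.

```python
# Candidate tags for an observed intensity, taken from the list in the message.
# Treating exactly these names as "the" intensity items is an assumption of
# this sketch, not a statement about what the dictionary requires.
CANDIDATE_INTENSITY_TAGS = [
    "_refln.intensity_meas",
    "_refln.F_squared_meas",
    "_refln.pdbx_I_plus",
    "_refln.pdbx_I_minus",
    "_diffrn_refln.counts_net",
    "_diffrn_refln.intensity_net",
]


def loop_header_tags(lines):
    """Collect the tag names that follow the first 'loop_' keyword.

    Simplification: one tag per line and no comments inside the header.
    """
    tags = []
    in_header = False
    for raw in lines:
        line = raw.strip()
        if line.lower() == "loop_":
            in_header = True
            tags = []
        elif in_header and line.startswith("_"):
            tags.append(line.split()[0])
        elif in_header and line:
            break  # first data row: the header is complete
    return tags


def find_intensity_tags(tags):
    """Return the candidate intensity tags present, in order of appearance.

    CIF data names are case-insensitive, so the comparison is done in lower case.
    """
    known = {name.lower() for name in CANDIDATE_INTENSITY_TAGS}
    return [tag for tag in tags if tag.lower() in known]


sample = [
    "data_example",
    "loop_",
    "_refln.index_h",
    "_refln.index_k",
    "_refln.index_l",
    "_refln.pdbx_I_plus",
    "_refln.pdbx_I_minus",
    "1 0 0 10.0 11.0",
]
print(find_intensity_tags(loop_header_tags(sample)))
```

A reader written this way can report which encoding it found, or that it found none, rather than silently depending on a single item name. It does not address the merged versus unmerged question, which still needs an answer from the dictionary.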