Re: [ccp4bb] Code to handle the syntax of (mm)CIF data correctly.

Phil Evans Wed, 18 Sep 2013 05:38:46 -0700

As a novice looking at mmCIF from a developer's point of view, the 
complication for reflection data is not so much tokenising (parsing) as 
deciding which items to write or to expect to read. For example, as far as I 
can see, an observed intensity may be encoded in a reflection loop (merged or 
unmerged) as any one of the following, and there seem to be similar choices 
for other items:

_refln_intensity_meas
_refln.F_squared_meas
_refln.pdbx_I_plus, _refln.pdbx_I_minus

_diffrn_refln.counts_net
_diffrn_refln.intensity_net

If I'm writing a file, which should I use, and if I'm reading one, which ones 
should I expect? And is there a distinction between merged and unmerged data?
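
To make the reading side of the question concrete, here is a minimal sketch 
of how a reader might report which of the tags listed above actually occur in 
each loop of a file. It does not answer which tag is the right one. The tag 
list is copied verbatim from the list above and is not an authoritative 
mapping; the names INTENSITY_TAGS, loop_tags and intensity_tags_present are 
hypothetical, and the loop scan is deliberately naive: it assumes one tag per 
line and no quoted or multi-line values, which real STAR data does not 
guarantee.

```python
# Candidate tags copied verbatim from the list in the message above.
# Assumption: this is an illustrative list, not an authoritative mapping.
INTENSITY_TAGS = [
    "_refln_intensity_meas",
    "_refln.F_squared_meas",
    "_refln.pdbx_I_plus",
    "_refln.pdbx_I_minus",
    "_diffrn_refln.counts_net",
    "_diffrn_refln.intensity_net",
]


def loop_tags(text):
    """Return one list of tag names per 'loop_' header found in text.

    Naive on purpose: assumes one tag per line after 'loop_' and treats
    the first non-blank line that is not a tag as the end of the header.
    """
    loops, current = [], None
    for line in text.splitlines():
        word = line.strip()
        if word.lower() == "loop_":
            current = []              # start collecting a new loop header
            loops.append(current)
        elif current is not None and word.startswith("_"):
            current.append(word.split()[0])
        elif word:
            current = None            # data row or other content ends the header
    return loops


def intensity_tags_present(text):
    """For each loop, list the candidate intensity tags it declares.

    Tag names are compared case-insensitively and reported in the
    spelling used in INTENSITY_TAGS.
    """
    wanted = {tag.lower(): tag for tag in INTENSITY_TAGS}
    return [
        [wanted[t.lower()] for t in tags if t.lower() in wanted]
        for tags in loop_tags(text)
    ]


EXAMPLE = """\
data_example
loop_
_refln.index_h
_refln.index_k
_refln.index_l
_refln.pdbx_I_plus
_refln.pdbx_I_minus
1 0 0 105.2 98.7
"""

print(intensity_tags_present(EXAMPLE))
# prints [['_refln.pdbx_I_plus', '_refln.pdbx_I_minus']]
```

A reader built this way still has to decide what to do when a loop declares 
none of the candidates, or more than one, which is the ambiguity raised above.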

confused (easily)
Phil



On 17 Sep 2013, at 15:30, Peter Keller <pkel...@globalphasing.com> wrote:

> Dear all,
> 
> At Global Phasing, we have seen that there are still issues with the way that 
> different applications deal with mmCIF-format data, and this continues to 
> cause problems for users. I believe that part of the reason for this is that 
> the underlying syntax (the STAR format) is not universally understood, and 
> that a common and complete understanding of the full STAR syntax amongst 
> programmers who deal with the format will help with some of the existing 
> problems.
> 
> I wrote some code for low-level handling of the STAR format a while ago that 
> I have been meaning to release for over a year. Garry Battle's announcement 
> on 23 August about the mmCIF/PDBx workshop at the EBI has prompted me into 
> action: I have written a short article that discusses some examples of the 
> issues that we have encountered, and made my code available for download. The 
> references in the article are given primarily as web links: more conventional 
> citations can usually be found in the pages that I link to. This code has not 
> been used in any released products, but it has had some internal use at 
> Global Phasing. There is an MX bias in the article's discussion, but the 
> issues are not restricted to MX.
> 
> As I explain in the article, the handling of the input data is based on an 
> enormous regular expression that matches STAR data, with only a little logic 
> in the code itself. The regular expression should be usable with a variety of 
> other languages, not only in Java (which I have used in this case). The code, 
> or the regular expression on its own, may be freely used in other projects: 
> see the included licensing for details, but basically you should: (i) give 
> credit for using it, and (ii) if you choose to modify the regular expression, 
> state that you have done so in that credit.
> 
> The article, which contains links to a tar file containing the code, and the 
> documentation, is here:
> 
>   <http://www.globalphasing.com/startools/>
> 
> Hoping that others will find this useful and/or help to resolve or clarify 
> outstanding questions,
> 
> Peter.
> 
> -- 
> Peter Keller                                     Tel.: +44 (0)1223 353033
> Global Phasing Ltd.,                             Fax.: +44 (0)1223 366889
> Sheraton House,
> Castle Park,
> Cambridge CB3 0AX
> United Kingdom
