Dear All,

The purpose of the statistics in the Scalepack output is to help the experimenter assess the data. The question is: what purpose does the R-merge statistic serve, and how useful is it, when its value exceeds 100%?
When Scalepack was originally written 20 years ago, I made a decision to output the value 0.000 for R-merge values above 100%. A resolution shell with such an R-merge may, depending on circumstances, contain perfectly fine data for structure refinement or data that are completely useless. In general, as in the case that started this discussion, high multiplicity will result in data close to the resolution limit having such a high R-merge value.

The best way to assess the resolution limit of the collected diffraction data is to look at the R and R-free factors from refinement. However, one has to make a preliminary judgement at an earlier stage about which data to forward to subsequent calculations. The 0.000 R-merge value is simply a pointer to the experimenter that one should pay attention to criteria other than the R-merge statistic. To keep the parsing of Scalepack output simple, I did not want to print N/A or some other non-numerical string.

I have always considered R-merge a useful statistic only for shells with strong reflections, which effectively means low-resolution data. For these data, high values of R-merge (e.g. 10%) indicate the presence of systematic errors or effects. Otherwise, R-merge is a rather poor proxy for the relevance of data.

Other indicators that are much more useful for defining the resolution limit are:
- I/sig(I), if goodness-of-fit (chi^2) is close to 1 in this resolution shell; if not, one should only adjust the error scale factor, not the estimate of systematic error (Scalepack keyword: error systematic);
- CC1/2 (or CC*), which is the next best criterion;
- other criteria can also be used, e.g. Rpim.

The current version of the HKL suite prints out all these statistics.

Quite frequently, when a program, particularly a widely used one, seems to fail, it is an indication that there are issues with the data. This has been the case in another recent thread related to problems with indexing/processing of data.
Something needs to be changed in such cases: it could be the input to the program or, in the case of the R-merge statistic, one should pay attention to other criteria rather than consider it a program failure.

Best regards,
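
P.S. For anyone who would like to see the effect numerically, here is a minimal sketch in Python. It is not Scalepack's code. It is a toy simulation of one weak, high-multiplicity shell under assumed parameters (2000 unique reflections, multiplicity 40, exponentially distributed true intensities with mean 1, Gaussian measurement error with sigma 5), and the names `simulate_shell` and `shell_statistics` are hypothetical. It uses the textbook definitions R-merge = sum|I_i - <I>| / sum I_i and Rpim with the extra factor sqrt(1/(n-1)) per reflection, and estimates CC1/2 by correlating the means of two halves of the observations.

```python
import math
import random


def simulate_shell(n_unique=2000, multiplicity=40, mean_i=1.0, sigma=5.0, seed=0):
    """Return one list of repeated observations per unique reflection.

    All parameter values are illustrative assumptions, not real data.
    """
    rng = random.Random(seed)
    shell = []
    for _ in range(n_unique):
        # True intensity drawn from an exponential distribution with mean mean_i.
        true_i = rng.expovariate(1.0 / mean_i)
        # Each observation is the true intensity plus Gaussian noise.
        shell.append([rng.gauss(true_i, sigma) for _ in range(multiplicity)])
    return shell


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def shell_statistics(shell, sigma):
    """Compute R-merge, Rpim, mean merged I/sigma and CC1/2 for one shell."""
    num_merge = num_pim = denom = 0.0
    i_over_sig, half_a, half_b = [], [], []
    for obs in shell:
        n = len(obs)
        mean = sum(obs) / n
        dev = sum(abs(i - mean) for i in obs)
        num_merge += dev
        num_pim += dev / math.sqrt(n - 1)  # precision-indicating weight
        denom += sum(obs)
        # The error of the merged intensity shrinks as sigma / sqrt(n).
        i_over_sig.append(mean / (sigma / math.sqrt(n)))
        # Means of the first and second half of the observations, for CC1/2.
        half_a.append(sum(obs[: n // 2]) / (n // 2))
        half_b.append(sum(obs[n // 2:]) / (n - n // 2))
    return {
        "r_merge": num_merge / denom,
        "r_pim": num_pim / denom,
        "mean_i_over_sigma": sum(i_over_sig) / len(i_over_sig),
        "cc_half": pearson(half_a, half_b),
    }


stats = shell_statistics(simulate_shell(), sigma=5.0)
for name, value in stats.items():
    print(f"{name:>18s} = {value:.3f}")
```

With these assumed parameters R-merge should come out well above 1 (that is, above 100%), while Rpim is much smaller and the merged I/sigma and CC1/2 still indicate measurable signal, which is the situation described above.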
