Dear Ken,

I have had time at last to study and think a bit about your detailed proposal. Thank you for preparing and presenting it. I appreciate it's frustrating for you that this issue is going slowly. Speaking for myself and from David's comments too, I believe this is because it is a large and complicated proposal; when you're busy (as we all are), it's hard to create a large enough chunk of time to address something requiring lengthy thought. Things might go faster if we dealt with it a piece at a time.

I formed my opinions before reading David's, and I find (without surprise) that many of them are the same. Like David, I'm grateful for your link to the [GUM](https://urldefense.us/v3/__https://www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.pdf__;!!G2kpM7uM-TzIFchu!l91lQuAvrnyEcw2i_0gvpgd5pQQIeNCsYJe1oKd9V9FkUVhZ86NZkE9Yi8y9bUGNaxFBWlC0-e4$ ).

I too agree with your approach of using ancillary variables to contain measures of uncertainty. The CF standard (section 3.4) doesn't say what dimensions ancillary variables should have. Since they're intended to provide metadata about individual values of a data variable, they would normally have all the same dimensions as the data variable. However, I don't think it would be problematic to allow dimensions over which the uncertainty doesn't vary to be dropped. You could drop all the dimensions to provide a scalar uncertainty, as in your examples.

I don't think that standard names are the right way to describe the uncertainties, for two reasons. First, the standard name should still identify the geophysical quantity for which it is an uncertainty, e.g. `air_temperature`. Second, each standard name requires particular canonical units, whereas the uncertainties have the same units as the data.

David mentioned that your proposal requires ancillary variables themselves to have ancillary variables. I didn't notice an instance of that in the examples. Is there one?

The earlier long and detailed [discussion](https://urldefense.us/v3/__http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2013/006106.html__;!!G2kpM7uM-TzIFchu!l91lQuAvrnyEcw2i_0gvpgd5pQQIeNCsYJe1oKd9V9FkUVhZ86NZkE9Yi8y9bUGNaxFBh5lZ0XQ$ ) of 2013, which David referenced, is certainly very relevant to your proposal, regarding the distinction between `cell_methods` and standard name modifiers. Two of the four standard name modifiers (`number_of_observations` and `status_flag`) are now deprecated, in favour of using them as standard names instead. That is fine because they don't have units.
The other two (`detection_minimum` and `standard_error`) are uncertainty measures, and hence relate to your proposal particularly. In order not to complicate the standard and software, it is one of the CF principles that we don't introduce a new way to do something we can already do, even if the new way is agreed to be better. Even so, I would be happy if your proposal provided an alternative and better framework for these measures!

Since ancillary variables are like data variables, I think we could allow them to have `cell_methods`. As in the discussion of 2013, I believe that `cell_methods` would be a good place to identify the variable as a measure of uncertainty. This would mean expanding the idea of what `cell_methods` is for. At the moment its role is to describe how the data represents statistical variation of the geophysical quantity within the cells. It seems to me that this can encompass uncertainty as well if we regard that as being variation over different realisations of the cells.

If the uncertainty comes from repeated measurement of a quantity with the same spatiotemporal coordinates, you could actually add a dimension which runs over the individual measurements. This is exactly like an ensemble of model runs, e.g. `float air_temperature(time,lat,lon,realization)`, where `realization` is the sample dimension. Then if you calculated the standard deviation of the sample in each spatiotemporal cell, it would have `cell_methods="realization: standard_deviation"`. The collapsed realization dimension, now of size 1, could be dropped, because `realization` is also a standard name, and hence the `cell_methods` implies that a standard deviation was computed over the entire set of realizations, about which no information is retained (Section 7.3.4).

Most of your examples of uncertainty are mathematically described as standard deviations.
I think they are actually standard errors in the statistical sense: "The standard error (SE) of a statistic is the standard deviation of its sampling distribution or an estimate of that standard deviation" (Wikipedia). I note that the GUM doesn't use that term, and probably "experimental standard deviation" is the same concept, isn't it? I think it's confusing to call it a standard deviation, however, because it is not the SD of the sample; it is the sample SD divided by sqrt(N). I would prefer `standard_error` as a new cell method. That would be consistent with the standard name modifier that has the same meaning, and it would allow us to use the existing `standard_error_multiplier` attribute, as David mentioned, instead of a standardised comment in cell methods, as you suggest.

All the above leads me to suggest a syntax such as `cell_methods="uncertainty: standard_error"` for an uncertainty that is mathematically treated as an SD, like most of your examples. In this syntax, `standard_error` would be a new cell method, and `uncertainty` would be a new special keyword, rather like `realization` in meaning, as above, but not requiring the idea of a collapsed dimension.

You would also like to be able to provide intervals when they are not symmetrical. That could be done by adding a size-one dimension for probability or percentile, with bounds to specify the interval, e.g. `air_temperature(time,lat,lon,probability)`, where `probability` is a size-one coordinate or scalar coordinate variable. This could be identified with a syntax such as `cell_methods="probability: expanded_uncertainty"`. I think that's the term the GUM uses, isn't it? It could also be called e.g. `uncertainty_bounds`. The GUM deprecates "confidence interval". An interval which contains all conceivable values is one which spans probability 0.0 to 1.0.

So far this is all about describing the mathematical nature of the uncertainty. You also want to describe what it represents.
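
To make the distinction above between a sample standard deviation and a standard error concrete, here is a minimal sketch in Python using only the standard library. The temperature values, the two-cell layout and the variable names are invented for illustration and are not taken from the proposal. The sketch collapses a `realization` dimension by computing the sample SD in each cell, then divides by sqrt(N) to get the standard error of the per-cell mean.

```python
import math
import statistics

# Hypothetical air_temperature values in kelvin, indexed [realization][cell].
# Four realizations (repeated measurements or ensemble members) of two cells.
air_temperature = [
    [281.2, 275.0],
    [280.8, 275.6],
    [281.4, 274.8],
    [281.0, 275.4],
]

n = len(air_temperature)  # number of realizations, N

# Regroup by cell so each tuple holds one cell's values across realizations.
per_cell = list(zip(*air_temperature))

# Collapse the realization dimension: sample SD (N - 1 denominator) per cell.
standard_deviation = [statistics.stdev(values) for values in per_cell]

# Standard error of the per-cell mean: the sample SD divided by sqrt(N).
standard_error = [sd / math.sqrt(n) for sd in standard_deviation]

for cell, (sd, se) in enumerate(zip(standard_deviation, standard_error)):
    print(f"cell {cell}: SD = {sd:.3f} K, SE = {se:.3f} K")
```

Under the syntax suggested in this message, the first result would correspond to `cell_methods="realization: standard_deviation"` and the second to the proposed `cell_methods="uncertainty: standard_error"`. The latter is a suggestion under discussion, not existing CF syntax.
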
In your proposal, what the uncertainty represents is conveyed by the standard name, which David and I both think wouldn't work. Could you do this with standardised comments in the cell methods? For instance, you could add `(statistical)` and `(subjective)` for the GUM's Type A and B. The GUM says, "a Type A standard uncertainty is obtained from a probability density function (C.2.5) derived from an observed frequency distribution, while a Type B standard uncertainty is obtained from an assumed probability density function based on the degree of belief that an event will occur, often called subjective probability. Both approaches employ recognized interpretations of probability."

I think that if the uncertainty is unqualified it should be assumed to be the "combined" or total uncertainty. That is consistent with the convention in CF standard names that an unqualified name means everything is included.

I think that's enough for now! I wonder what you think.

Best wishes

Jonathan

--
Reply to this email directly or view it on GitHub: https://urldefense.us/v3/__https://github.com/cf-convention/cf-conventions/issues/320*issuecomment-901307274__;Iw!!G2kpM7uM-TzIFchu!l91lQuAvrnyEcw2i_0gvpgd5pQQIeNCsYJe1oKd9V9FkUVhZ86NZkE9Yi8y9bUGNaxFBLpDcKx8$
