Dear Martin

> In broad usage, I have the impression that a "histogram" can be expressed as
> either a count or a percentage, so we should be explicit in the convention if
> we want a narrower definition here. A narrower definition is probably needed,
> as there would otherwise be no way of distinguishing between the two.

I agree with that, but the idea is that a standard name of histogram would be
for a count, while probability would be for a fraction. The latter could be
0-1 or 0-100%; these are dimensionally equivalent but have different units.
We could clarify that in the guidelines.

> There are two further CMIP variables, both of which are bi-variate
> distributions, with bins of spectral bands and cloud top height ranges, which
> I'd like to bring into the discussion, but it might be useful to transfer the
> conclusions of the exchange so far into a ticket first. I think the two
> additional variables could be covered by a simple extension to
> "probability_density_function_of_X_and_Y" ... though you might want to insert
> "joint_" at the beginning of the term.

OK, that's interesting. I agree that it would fit.

Best wishes

Jonathan

> Dear Martin and Alejandro (following off-list discussions)
>
> > The CF definitions say ''"histogram_of_X[_over_Z]" means histogram (i.e.
> > number of counts for each range of X) of variations (over Z) of X.'
>
> Yes, that's in the guidelines for construction of standard names, and there
> are only two of them at present, as you say. The simplest case is when you
> have some quantity Q depending on only one dimension, Q(Z). Then the histogram
> H(Q) is the number of values of Q which fall into each interval of Q,
> considering variation over Z. In general there could be more than one
> dimension retained, and more than one removed. If the original field was
> Q(P,Y,Z,T), we might construct a histogram H(Q,Z,T), for instance, containing
> the frequencies of values of Q falling into joint intervals of Q, Z and T, for
> variation over P and Y. Following the guideline above, we would call this a
> histogram of Q over P and Y, I think.
>
> It is not necessary to indicate in the standard name the dimensions which
> the histogram depends on (Z and T in my example) because the coordinate
> variables (of Z and T) make that clear.
> Martin suggests that by this argument
> we could also omit Q from the standard name, and just call it a histogram
> (or frequency distribution) rather than a histogram of Q, where Q is air
> temperature, precipitation amount, backscattering ratio, etc. I think there
> are two reasons why we include Q in the standard name.
>
> * I think a histogram of air temperature is not the same geophysical quantity
> as a histogram of precipitation amount, for instance, so they should be
> distinguished by standard name.
>
> * Although histograms are pure numbers, and so are probabilities, probability
> densities are not. Histograms, probability distributions and probability
> density functions are all related ways of expressing the same information.
> In the guidelines, we foresee that we might need names for all of them (though
> so far we have only histograms) and it would make sense to give them
> consistent names. The probability density function of air temperature has
> units of K-1, and of precipitation amount kg-1 m2, for instance. Because they
> have different canonical units, they must have different standard names, so Q
> needs to be included in the standard name.
>
> Cell methods describe how the values represent variation within the cells.
> The transformation from the values of a quantity to a histogram of the
> quantity makes the original quantity into a dimension. This seems a more
> radical transformation than computing a mean or a standard deviation, which
> doesn't change the dimensions of the variable, but just reduces their size
> (to unity if completely collapsed). A frequency distribution of Q is
> regarded as a different geophysical quantity from Q itself, so we have not
> used cell methods to describe the relationship. Of course, this is a bit
> arbitrary (like everything else in the CF convention!).
>
> I agree with Martin that we could omit the "over" part of the standard name
> for histograms, probabilities and probability densities.
> It is useful to retain the collapsed dimensions as size-1 dimensions, so that
> their original range can be recorded. They could be assigned cell_method of
> "sum", the default for extensive quantities, because the histogram applies to
> their entire range. The same applies to the variable which has been
> histogrammed and is now a dimension; the histogram is a sum for each of its
> cells.
>
> For example, in the 1D case, suppose the original field is air_temperature
> as a function of time only. Then the histogram variable is
>   float hair(tair);
>     hair:standard_name="histogram_of_air_temperature";
>     hair:units="1";
>     hair:cell_methods="time: sum tair: sum";
>     hair:coordinates="time";
>   float time; // scalar coordinate variable with bounds
>   float tair(tair);
>     tair:units="K";
>
> As a multidimensional example, suppose the original field is
>   float tair(time,altitude,latitude,longitude);
>     tair:units="K";
>     tair:standard_name="air_temperature";
>     tair:cell_methods="altitude: mean area: mean time: mean";
> from which we might construct
>   float pair(tair,time,altitude);
>     pair:standard_name="probability_density_function_of_air_temperature";
>     pair:units="K-1";
>     pair:cell_methods="altitude: mean time: mean area: sum tair: mean";
>     pair:coordinates="latitude longitude"; // to record the ranges
> Here, I suggest that the cell_method for area is "sum", because the PDF
> applies to the whole area, which is an extensive quantity. For air temperature
> it makes more sense to interpret a PDF as a mean within cells, since a PDF is
> an intensive quantity - you can interpolate it, for example - but not a point
> quantity if it's calculated from a histogram with finite bin-widths.
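
As a rough illustration of how a histogram, a probability and a probability
density relate to one another, here is a small sketch in Python rather than
CDL. It is only an example under stated assumptions: the helper `histogram`,
the sample temperatures and the bin bounds are all invented for the purpose
and are not part of CF or of any CMIP variable.

```python
from bisect import bisect_right


def histogram(values, edges):
    """Count the values falling in each bin [edges[i], edges[i+1]).

    Hypothetical helper for this example; values outside the outermost
    bounds are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v < edges[-1]:
            counts[bisect_right(edges, v) - 1] += 1
    return counts


# Made-up air temperatures in K, varying over time only (the 1D case).
tair = [271.2, 273.9, 274.5, 276.1, 279.8, 281.0, 283.4, 284.9]
# Made-up bin bounds in K; the last bin is deliberately wider.
edges = [270.0, 275.0, 280.0, 290.0]

# Histogram: a count per bin, a pure number (units "1").
counts = histogram(tair, edges)

# Probability: the fraction of values per bin, also a pure number.
total = sum(counts)
probability = [c / total for c in counts]

# Probability density: fraction per unit of air temperature, units K-1.
density = [p / (hi - lo) for p, lo, hi in zip(probability, edges, edges[1:])]

print(counts)
print(probability)
print(density)
```

With these made-up values the counts are [3, 2, 3]. The fractions sum to 1,
and so does the density once each value is multiplied by its bin width, which
is why only the density carries units of K-1.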
>
> Best wishes
>
> Jonathan
>
> _______________________________________________
> CF-metadata mailing list
> [email protected]
> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata

----- End forwarded message -----
