Dear Martin,

Thanks for your detailed explanation. I'd like to add a bit more information.
These variables are not joint distributions; they are 1D distributions for
different ranges of Z. The question is, does "histogram_of_X[_over_Z]" mean
that the Z coordinate has to be completely collapsed? It is not clear to me
that the current definition implies that. If Z is not completely collapsed, you
can end up with a function of the form frequency(lat,lon,X,Z2), where the
coordinate Z is only partially collapsed into bins described by Z2. I'm using
Z2 here to show explicitly when the Z coordinate represents bins. This would
look like a joint histogram, but it is not one. I think that your proposal of
dropping "_over_Z" from the standard name works for a joint distribution, but
not for a collection of 1D distributions along Z, unless there is a way of
distinguishing between the two cases with the use of attributes.
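
To make the two cases concrete, here is a rough sketch in Python/NumPy. The
array sizes, the bin edges, the random data and the helper name
hist_over_last_axis are all made up for illustration; this is not how the
CMIP variables are actually produced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical source field X(lat, lon, Z): 2 x 3 horizontal points, 40 levels.
nlat, nlon, nz = 2, 3, 40
x = rng.uniform(0.0, 10.0, size=(nlat, nlon, nz))
x_edges = np.linspace(0.0, 10.0, 6)  # 5 bins of X (assumed edges)


def hist_over_last_axis(a, edges):
    """Return counts per X bin, collapsing the last axis of `a`."""
    flat = a.reshape(-1, a.shape[-1])
    counts = np.stack([np.histogram(row, bins=edges)[0] for row in flat])
    return counts.reshape(a.shape[:-1] + (len(edges) - 1,))


# Case 1: Z completely collapsed -> frequency(lat, lon, X).
full = hist_over_last_axis(x, x_edges)

# Case 2: Z partially collapsed into 4 coarse bins (Z2) of 10 levels each
# -> frequency(lat, lon, X, Z2), i.e. one 1D histogram of X per Z2 bin.
nz2 = 4
x_by_z2 = x.reshape(nlat, nlon, nz2, nz // nz2)
partial = hist_over_last_axis(x_by_z2, x_edges)  # axes: (lat, lon, Z2, X)
partial = np.moveaxis(partial, -1, 2)            # axes: (lat, lon, X, Z2)

print(full.shape)     # (lat, lon, X)
print(partial.shape)  # (lat, lon, X, Z2)
```

In this sketch, summing the second array over Z2 gives back the first one,
which is why I see it as a collection of 1D histograms of X rather than a
joint distribution of two measured quantities.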

Another detail is that these histograms provide relative frequencies (values
between 0 and 1) rather than absolute counts. Is that inconsistent with the
current definition of histogram in CF?
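
For clarity, this is the kind of normalisation I mean, as a small Python/NumPy
sketch. The counts are invented, and normalising each Z2 bin separately is an
assumption made for this example only.

```python
import numpy as np

# Hypothetical counts(X, Z2): 3 bins of X, 2 bins of Z2 (second one empty).
counts = np.array([[2, 0],
                   [5, 0],
                   [3, 0]])

# Number of samples in each Z2 bin.
totals = counts.sum(axis=0, keepdims=True)

# Relative frequency per Z2 bin; empty Z2 bins are left at zero
# instead of dividing by zero.
rel = np.divide(counts, totals, out=np.zeros(counts.shape), where=totals > 0)

print(rel)
```

Each non-empty Z2 column of rel then sums to 1, whereas the CF definition
quoted below speaks of a number of counts.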

Regards,

Alejandro

> -----Original Message-----
> From: martin.juc...@stfc.ac.uk [mailto:martin.juc...@stfc.ac.uk]
> Sent: 12 October 2016 19:05
> To: cf-metadata@cgd.ucar.edu
> Cc: Bodas-Salcedo, Alejandro
> Subject: Usage of histogram_of_X_over_Z
> 
> Hello,
> 
> There are two standard names of the form histogram_of_..... in the CF Standard
> Name list (at version 36):
> histogram_of_backscattering_ratio_over_height_above_reference_ellipsoid and
> histogram_of_equivalent_reflectivity_factor_over_height_above_reference_ellipsoid.
> Both of these were used in CMIP5 and are set to be used in CMIP6, but the usage
> does not appear to match the standard name descriptions.
> 
> The possible confusion is over the role of different coordinates. The CF
> definitions say '"histogram_of_X[_over_Z]" means histogram (i.e. number of
> counts for each range of X) of variations (over Z) of X.' This implies to me
> that you start with a function of Z and possibly other coordinates and end up
> with a function of X and the other coordinates. E.g. if the source data is
> X(lat,lon,Z), then the histogram data will be of the form frequency(lat,lon,X).
> 
> In the two CMIP5/CMIP6 draft variables (cfadLidarsr532, cfadDbze94) using these
> standard names, the "Z" coordinate which is included in the standard name
> ("height_above_reference_ellipsoid") is one of the coordinates of the histogram
> data variable. Both these variables appear to be joint distributions (frequency
> of X and Y values) over sub-grid variability as a function of latitude,
> longitude and time.
> 
> I've been reviewing these existing definitions in some detail because there are
> some new distribution variables in the request and I'd like to make sure that
> we have a consistent approach.
> 
> If we need to describe a variable which carries a joint distribution of X and
> Y, then the variable will have to use X and Y as coordinates, so perhaps we can
> simplify the process by leaving them out of the standard name. Similarly, the
> "over_Z" part of the name would be better expressed as a cell_methods
> construct. This line of reasoning suggests using a new standard name such as
> "frequency_distribution" (units "1"). The only difficulty is that the frequency
> distribution might be a function of the quantities X and Y (scattering ratio
> and cloud top height for cfadLidarsr532) and also of latitude, longitude and
> time. There should be some way of distinguishing the different roles of these 5
> coordinates: it is the distribution of X and Y as a function of latitude,
> longitude and time. I think this could be done conveniently by introducing a
> single new attribute, e.g. "bin_coords: X Y".
> 
> "frequency_distribution" could be used for single or joint distributions.
> 
> My questions to the list are:
> (1) am I missing something in my interpretation of the existing
> histogram_of_... names?
> (2) if not, is the adoption of a "frequency_distribution" standard name an
> appropriate way forward?
> 
> regards,
> Martin
_______________________________________________
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
