On 2014-09-09 Jonathan Gregory <[email protected]> commented:
>> You are right regarding the calculation - we are using a statistical model
>> of the relationship between monthly rainfall and return period that was
>> developed many years ago by a colleague from an analysis of 60 years
>> of historical data.
[...]
> Right. So it is reasonable to describe it as a conversion of precipitation
> amount to probability, I think.

> It would be useful to know if anyone else reading this has a view on my
> suggestion of precipitation_amount_converted_to_cumulative_probability.


Yes, I have a slightly more general view of this. I think it is not so useful 
to try to include the whole or part of the data processing that produced the 
values _as part of the variable name itself_. For many practical applications 
it is most relevant to know what the data _is_, not where it came from or the 
details of how it came about.

Thus, taking this particular case as an example, I would prefer something more 
direct, like one of these:

precipitation_amount_cumulative_probability
precipitation_cumulative_probability
cumulative_precipitation_probability


The reason I think this is a better idea is that we can easily imagine 
alternative approaches for arriving at the same desired quantity. E.g. there 
might exist one process that does something like this:

1. Measure precipitation.
2. Run statistics of the measurements and come up with a "probability" estimate.

However, alternatives may exist, such as this:

1. Make use of collected measurements (or results from other models) describing 
a set of properties other than precipitation.
2. Apply some statistical approach to this, which may then predict a 
"probability" directly.


My point about all of this, though, is that for the next model downstream of 
the above, all that matters may be that it gets the probabilities as input to 
its calculations. Therefore introducing the specifics about whence the data 
came contributes nothing more than semantic distraction (or added complexity) 
to that model.

Please don't get me wrong on this though, as I'm not suggesting that the 
additional information about the processing is irrelevant as such! In fact I 
think it could be highly relevant to know this in many contexts. However, there 
are already other mechanisms in NetCDF/CF to convey this, which seem even 
better suited for that kind of info. I'm thinking in particular of the 
"comment", "history" and "source" attributes.
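
To make that split concrete, here is a rough sketch in Python. It is only an 
illustration under several assumptions: the variable is modelled as a plain 
dict rather than written through a real NetCDF library, the standard name is 
one of the candidates proposed above (it is not in the table), the units value 
is my guess, and the attribute texts are invented.

```python
# Hypothetical metadata for the variable, modelled as a plain dict.
# The standard_name is only a candidate from this thread, not an accepted name.
probability_var = {
    "standard_name": "precipitation_amount_cumulative_probability",
    "units": "1",  # assumed: a dimensionless probability
    # Processing details go into free text; this wording is invented.
    "comment": (
        "Derived from monthly rainfall with a statistical model "
        "fitted to 60 years of historical data."
    ),
}

# File-level provenance attributes (contents invented for illustration).
global_attrs = {
    "source": "statistical rainfall / return-period model",
    "history": "precipitation amount converted to cumulative probability",
}


def is_probability_input(var_attrs):
    """Downstream check: look only at WHAT the data is, not HOW it was made."""
    name = var_attrs.get("standard_name", "")
    return name.endswith("cumulative_probability")


print(is_probability_input(probability_var))
```

The downstream check never looks at "comment", "history" or "source", so it 
would accept the same quantity regardless of which process produced it, while 
a human reader can still find the processing details in those attributes.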

I did a quick scan through v27 of the standard name table, and I could only 
locate a couple of names that hint at which processing has been applied to 
them: those are "eastward_transformed_eulerian_mean_air_velocity" and 
"northward_transformed_eulerian_mean_air_velocity" and their aliases.

Of other typical semantic fragments found in the names, I found:

- Those that hint at the actual 'units' to be expected: "mole", "moles_of", 
"fraction", "fraction_of".

- Those that indicate causality: "due_to".

- Those that indicate a "medium" or environment: "in_air", "in_sea_water", ...

There are also some that we could classify as border-line cases: those with 
"product_of" and "derivative_of". (It can still be argued that these terms are 
only about WHAT, not HOW.)
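
The kind of scan I mean can be sketched roughly as below. This is not the 
script I actually ran, just a minimal illustration: the grouping labels are my 
own, and the sample names other than those quoted in this thread are typed 
from memory and not checked against v27.

```python
# Fragment groups as listed in this message; the labels are my own shorthand.
FRAGMENT_KINDS = {
    "units": ["mole", "moles_of", "fraction", "fraction_of"],
    "causality": ["due_to"],
    "medium": ["in_air", "in_sea_water"],
    "borderline": ["product_of", "derivative_of"],
    "processing": ["transformed_eulerian_mean"],
}


def classify(name):
    """Return the sorted list of fragment kinds found in a standard name."""
    # Pad with underscores so fragments only match on whole-word boundaries.
    padded = f"_{name}_"
    kinds = set()
    for kind, fragments in FRAGMENT_KINDS.items():
        if any(f"_{fragment}_" in padded for fragment in fragments):
            kinds.add(kind)
    return sorted(kinds)


# Illustrative sample only, not the full table.
sample_names = [
    "eastward_transformed_eulerian_mean_air_velocity",
    "mole_fraction_of_ozone_in_air",
    "tendency_of_air_temperature_due_to_advection",
    "precipitation_amount_converted_to_cumulative_probability",
]

for name in sample_names:
    print(name, classify(name))
```

With these groups, the suggested "converted_to" name matches none of the 
existing kinds of fragment, which is the point I am getting at below.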


To sum up, I therefore think that adopting a term like "converted_to" into the 
standard CF nomenclature would represent a significant change of the principles 
followed so far.
I would appreciate it if others could contribute their views on these matters, 
though!

-- 
Regards,

-+-Ben-+-
_______________________________________________
CF-metadata mailing list
[email protected]
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
