Re: [CF-metadata] Return periods

Hollis, Dan Mon, 08 Sep 2014 09:43:15 -0700

Hi Jonathan,

You are right regarding the calculation - we are using a statistical model of 
the relationship between monthly rainfall and return period that was developed 
many years ago by a colleague from an analysis of 60 years of historical data. 
The model uses values of the coefficients of variation and skewness to describe 
the distribution of monthly rainfall (assumed to be log-normal). To capture how 
the shape of the distribution varies with location we have pre-calculated 
values of these coefficients available at each point on a 5 km grid.
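The thread does not spell the model out. Purely as an illustration of how a log-normal CDF can be built from a mean, a coefficient of variation and a skewness, here is a sketch of a three-parameter (shifted) log-normal fitted by the method of moments. The per-point mean, the function names and the choice of a shifted log-normal are all assumptions; this is not the colleague's model.

```python
import math


def lognormal3_params(mean, cv, skew):
    """Hypothetical helper: method-of-moments fit of a three-parameter
    log-normal, ln(X - tau) ~ Normal(mu, sigma**2), to a mean, coefficient
    of variation and positive skewness."""
    if skew <= 0:
        raise ValueError("this sketch only handles positive skewness")
    # Skewness of the shifted log-normal is (w + 2) * sqrt(w - 1) with
    # w = exp(sigma**2). Solving that cubic with Cardano's formula gives
    # w = a + 1/a - 1, where a is the cube root below.
    a = (1 + skew**2 / 2 + skew * math.sqrt(1 + skew**2 / 4)) ** (1 / 3)
    w = a + 1 / a - 1
    sigma = math.sqrt(math.log(w))
    # Standard deviation = scale * sqrt(w - 1), where scale = exp(mu + sigma**2 / 2).
    scale = cv * mean / math.sqrt(w - 1)
    mu = math.log(scale) - sigma**2 / 2
    tau = mean - scale  # lower bound (shift) of the distribution
    return tau, mu, sigma


def monthly_rainfall_cdf(x, mean, cv, skew):
    """F(x): probability that the monthly total is less than x (mm)."""
    tau, mu, sigma = lognormal3_params(mean, cv, skew)
    if x <= tau:
        return 0.0
    z = (math.log(x - tau) - mu) / sigma
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))
```

Evaluating monthly_rainfall_cdf(100.0, mean=80.0, cv=0.5, skew=1.2) at every grid point, with that point's own coefficients, is the kind of per-point lookup the later messages discuss.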


My original suggestion of using an auxiliary coordinate variable was based on 
an analogy with how we plan to deal with the spatial coordinates. Our data are 
stored in the British National Grid projection, so we are using 
'projection_x_coordinate' and 'projection_y_coordinate' as our primary 
coordinate variables, i.e. the setup is going to be something like this:

x(x)
y(y)

lat(y,x)
lon(y,x)
precip(y,x)
precip:coordinates = "lat lon"

My thought was to extend this as follows:

prob(y,x)
precip:coordinates = "lat lon prob"

I suppose I was thinking of prob(y,x) as defining the position of each 
precipitation amount on an auxiliary 'probability axis'.

If a new standard name is required then I'm happy to take your advice on a 
suitable choice.

What is still not clear to me is how to maintain an explicit link between the 
two fields without storing some of the information twice. Is it simply a case 
of storing the two variables in the same NetCDF file (so that they share 
coordinates), i.e.

x(x)
y(y)

lat(y,x)
lon(y,x)
precip(y,x)
precip:coordinates = "lat lon"
precip:standard_name = "precipitation_amount"
prob(y,x)
prob:coordinates = "lat lon"
prob:standard_name = "precipitation_amount_converted_to_cumulative_probability"

So far we have been planning on storing each variable in a separate file, so 
this possibility didn't occur to me until just now. Is this what you had in 
mind all along?

Regards,

Dan 



-----Original Message-----
From: CF-metadata [mailto:[email protected]] On Behalf Of 
Jonathan Gregory
Sent: 08 September 2014 15:52
To: [email protected]
Subject: Re: [CF-metadata] Return periods

Dear Dan

I see - thanks for the explanation. The F(x) has been calculated not just from
the month concerned, but from a longer period, which you haven't mentioned,
I presume. You are using F(x) as a lookup function, in effect, to convert
precipitation amount to probability. As we said before, I think that F(x)
should generally have a standard name of
cumulative_distribution_function_of_precipitation_amount, but you would expect
that to have a coordinate (i.e. an independent variable) of
x=precipitation_amount, which your field will not. You
earlier suggested that x could be an auxiliary coordinate variable, but I don't
think that would be appropriate, because aux coords do not store independent
variables. They depend on the independent variables stored in the coords.

If we don't want to (slightly mis)use
cumulative_distribution_function_of_precipitation_amount for your purpose, I
would suggest that we could have a standard_name of something like
precipitation_amount_converted_to_cumulative_probability for your
F(x(lon,lat)). I suggest this to capture the
idea that it's really another way of stating x(lon,lat), to which you have
applied F as a known function. I haven't used the phrase expressed_as, which
occurs in other standard names, because this conversion involves data, not
just well-known constants one can look up.

What do you and others think?

Best wishes

Jonathan

----- Forwarded message from "Hollis, Dan" <[email protected]> -----

> From: "Hollis, Dan" <[email protected]>
> To: "Gregory, Jonathan" <[email protected]>,
>       "[email protected]" <[email protected]>
> Subject: RE: [CF-metadata] Return periods
> Date: Mon, 8 Sep 2014 13:47:49 +0000
> 
> Hi Jonathan,
> 
> We have two 2D fields that we would like to store - the first contains 
> precipitation amount, x, for a specific month (e.g. Aug 2014), and the second 
> contains the return period (or probability) of the precipitation amount in 
> the first field i.e. F(x).
> 
> The relationship between these two fields is at the grid point level i.e. the 
> value at any given grid point in the second field is the value of F(x) 
> corresponding to the value of x at the same location in the first field. Note 
> that the relationship between the two quantities varies with location i.e. 
> the return period of 100 mm in the Scottish mountains is very different to 
> the return period of 100 mm in southeast England.
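To make the grid-point relationship concrete, here is a sketch of converting the amount field into the probability field point by point, each point carrying its own distribution parameters. For brevity it assumes a two-parameter log-normal described by a hypothetical per-point mean and coefficient of variation (simpler than the model described in the thread, which also uses skewness); the function names and numbers are illustrative only.

```python
import math


def lognormal_cdf(x, mean, cv):
    """F(x) for a two-parameter log-normal with the given mean and
    coefficient of variation (an illustrative simplification)."""
    if x <= 0:
        return 0.0
    sigma2 = math.log(1 + cv**2)        # variance of ln(X)
    mu = math.log(mean) - sigma2 / 2    # mean of ln(X)
    z = (math.log(x) - mu) / math.sqrt(sigma2)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def convert_field(precip, mean, cv):
    """Apply each grid point's own F to the amount at that point, so the
    result has exactly the same (y, x) shape as the input field."""
    return [
        [lognormal_cdf(p, m, c) for p, m, c in zip(p_row, m_row, c_row)]
        for p_row, m_row, c_row in zip(precip, mean, cv)
    ]


# Hypothetical 1 x 2 grid: the same 100 mm at a wet point and at a dry point.
precip = [[100.0, 100.0]]
mean = [[200.0, 50.0]]
cv = [[0.4, 0.6]]
prob = convert_field(precip, mean, cv)
```

Here prob[0][0] comes out well below 0.5 and prob[0][1] well above it, the same contrast as the Scottish mountains versus southeast England example.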
> 
> As x is a floating point quantity the number of unique values in the first 
> field will essentially equal the number of grid points (i.e. approximately 
> 10000 for the UK). If the second field were to use x as a coordinate variable 
> then the size of this coordinate would be 10000, and for any given grid point 
> in space there would be a value of F(x) for only one value of this coordinate 
> i.e. the 3D field F(x,lat,lon) would be very sparse.
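The sparsity argument can be checked on a toy grid. This sketch (hypothetical helper names, plain nested lists rather than a netCDF library) places each point's (x, F(x)) pair in a dense cube indexed by the sorted unique amounts and reports what fraction of the cells hold data.

```python
def sparse_cdf_cube(precip, prob):
    """Place each point's F(x) in a cube F[k][j][i], where k indexes the
    sorted unique values of x; cells with no value are None."""
    ny, nx = len(precip), len(precip[0])
    xs = sorted({v for row in precip for v in row})
    index = {v: k for k, v in enumerate(xs)}
    cube = [[[None] * nx for _ in range(ny)] for _ in xs]
    for j in range(ny):
        for i in range(nx):
            cube[index[precip[j][i]]][j][i] = prob[j][i]
    return xs, cube


def fill_fraction(cube):
    """Fraction of cells in the cube that actually hold a value."""
    cells = [v for plane in cube for row in plane for v in row]
    return sum(v is not None for v in cells) / len(cells)


precip = [[12.3, 45.6, 78.9], [101.2, 33.3, 64.0]]
prob = [[0.05, 0.40, 0.70], [0.95, 0.20, 0.60]]
xs, cube = sparse_cdf_cube(precip, prob)
```

With every amount distinct, each (y, x) column of the cube has exactly one filled cell, so the fill fraction here is 1/6. For roughly 10000 UK grid points the cube would have about 10000 x 10000 = 10^8 cells holding only 10^4 values.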
> 
> Looking at it another way, if we were intending to store fields of F(x) for a 
> small number of fixed values of x (e.g. 10 mm, 20 mm, 30 mm, 40 mm etc) then 
> I can see that having x as a coordinate variable would make sense. However 
> what we actually want to do is store F(x) for a single value of x, but where 
> the value of x is different for each location.
> 
> Does that make any sense? I think I have it clear in my own mind but I find 
> it quite hard to describe.
> 
> Dan
> 
> 
> 
> -----Original Message-----
> From: CF-metadata [mailto:[email protected]] On Behalf Of 
> Jonathan Gregory
> Sent: 04 September 2014 17:40
> To: [email protected]
> Subject: Re: [CF-metadata] Return periods
> 
> Dear Dan
> 
> I don't think I have understood this. What is the field of precipitation
> amount? The F(x) would actually be a 3D data variable, I think, F(x,lat,lon),
> which gives the probability that precipitation is less than x at (lat,lon).
> For this field, x is a 1D coord variable, not a field.
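A sketch of the layout described here, F(x, lat, lon) with x as a 1D coordinate of fixed thresholds. To stay self-contained it estimates F empirically, as the fraction of a hypothetical sample of historical monthly totals at each point falling below each threshold; that is a swapped-in technique for illustration, not the fitted model discussed elsewhere in the thread.

```python
def empirical_cdf_cube(samples, thresholds):
    """samples[j][i] is a list of monthly totals (mm) at grid point (j, i).
    Returns F[k][j][i], the fraction of those totals below thresholds[k],
    i.e. a 3D field whose first axis is the 1D threshold coordinate."""
    return [
        [[sum(v < t for v in point) / len(point) for point in row] for row in samples]
        for t in thresholds
    ]


thresholds = [10.0, 20.0, 30.0, 40.0]  # mm: the 1D coordinate variable
samples = [[[5.0, 15.0, 25.0, 35.0], [22.0, 28.0, 41.0, 55.0]]]  # one row, two points
F = empirical_cdf_cube(samples, thresholds)
```

Unlike the per-point case, every cell of this cube has a value, which is why a fixed set of thresholds suits a coordinate variable.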
> 
> Best wishes
> 
> Jonathan
> 
> 
> > Yes, that is what I had in mind. What slightly concerns me is that I would 
> > effectively end up storing the precipitation amount twice:
> > 
> > - once as a data variable in its own right
> > 
> > - once as an auxiliary coordinate, with F(x) as the data variable
> > 
> > Duplication, especially within the same data archive, seems like something 
> > to be avoided if possible, hence my idea to have F(x) as the auxiliary 
> > coordinate variable and precipitation amount as the data variable and not 
> > store F(x) as a separate data variable. Do you think that it would be 
> > preferable/acceptable to store the precipitation values twice?
> > 
> > Regards,
> > 
> > Dan
> > 
> > 
> > -----Original Message-----
> > From: CF-metadata [mailto:[email protected]] On Behalf Of 
> > Jonathan Gregory
> > Sent: 04 September 2014 14:14
> > To: [email protected]
> > Subject: [CF-metadata] Return periods
> > 
> > Dear Dan
> > 
> > I agree with you that it would be better to store F(x) than to use your sign
> > convention for return periods. However it would be fine to split the return
> > periods into the two tails in different data variables and give them distinct
> > standard names. We have some standard names for such things, e.g.
> >   spell_length_of_days_with_lwe_thickness_of_precipitation_amount_above_threshold
> > and you could propose suitable ones.
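On the data side, the alternative of one data variable per tail could look like this: a sketch, with hypothetical variable names, that splits a grid of signed return periods into two grids of positive return periods, using NaN as a stand-in for the missing value where the other tail applies.

```python
import math


def split_tails(signed_n):
    """Split a grid of signed return periods (years) into two grids of
    positive return periods: positive N -> wet tail, negative N -> dry tail.
    NaN marks points where the other tail applies (a stand-in for a
    netCDF _FillValue)."""
    wet = [[n if n > 0 else math.nan for n in row] for row in signed_n]
    dry = [[-n if n < 0 else math.nan for n in row] for row in signed_n]
    return wet, dry


wet_return_period, dry_return_period = split_tails([[10.0, -5.0], [2.5, -20.0]])
```

Each output grid could then be written as its own data variable with its own (proposed) standard name.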
> > 
> > If you store F(x), I think it would be a data variable, not a coordinate or
> > ancillary variable, and it should have a standard name. I believe the guidance
> > you quote is about probability distribution functions rather than cumulative
> > (probability) distribution functions. Following a similar approach, however,
> > we could have a standard name such as
> >   cumulative_distribution_function_of_precipitation_amount
> > for F(x), where x is precipitation_amount, which would be a coordinate. Is
> > that what you have in mind?
> > 
> > Cheers
> > 
> > Jonathan
> > 
> > 
> > ----- Forwarded message from "Hollis, Dan" <[email protected]> -----
> > 
> > > Dear all,
> > > 
> > > Here is another question related to migrating our UK climate grids to 
> > > NetCDF.
> > > 
> > > As well as grids of the monthly rainfall total (in mm) we also generate 
> > > grids of the estimated return period of the rainfall total (in years). 
> > > Currently these two quantities are stored in separate files (with only 
> > > the file name and location to tell us they are related). I've been trying 
> > > to think how to store the return period information using CF-NetCDF and 
> > > would be grateful for advice.
> > > 
> > > Some further details:
> > > 
> > > Our existing grids contain the return period in years i.e. if the return 
> > > period for a particular grid point is N years then this means that we 
> > > estimate that the rainfall total for that grid point will be exceeded on 
> > > average once every N years. This is equivalent to saying that each year 
> > > there is a probability of 1/N of exceeding that rainfall amount i.e. the 
> > > cumulative distribution function, F(x) = 1 - 1/N. For example, if N = 10 
> > > then F(x) = 0.9. Additionally, as we are also interested in droughts, we 
> > > have adopted our own convention of using negative values to refer to the 
> > > left (dry) tail of the rainfall distribution. For example N = -10 is used 
> > > to mean that F(x) = 0.1 i.e. we estimate that rainfall amounts *less* 
> > > than the observed value will occur once every 10 years on average.
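The conversion from the signed return periods to F(x) is simple enough to sketch directly. The function name is hypothetical, and the check that |N| >= 1 is an added assumption (smaller magnitudes would give F outside [0, 1]).

```python
def return_period_to_cdf(n):
    """Convert a signed return period N (years) to F(x), following the
    convention above: N > 0 refers to the wet tail, F = 1 - 1/N;
    N < 0 refers to the dry tail, F = 1/|N| = -1/N."""
    if abs(n) < 1:
        raise ValueError("return period magnitude must be at least 1 year")
    return 1 - 1 / n if n > 0 else -1 / n
```

For example, return_period_to_cdf(10) gives 0.9 and return_period_to_cdf(-10) gives 0.1 (up to floating-point rounding), matching the examples in the text.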
> > > 
> > > This use of positive and negative values to indicate return periods 
> > > relating to the right (wet) and left (dry) tails is convenient but 
> > > unconventional. My initial thought is that we should store F(x) itself 
> > > and only convert to return period for the purposes of presentation e.g. 
> > > creating maps.
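Going the other way, for presentation only, needs a rule for which tail to report. This sketch assumes the wet tail for F >= 0.5 and the dry tail for F < 0.5; that split point, and the function name, are assumptions rather than anything stated in the thread.

```python
import math


def cdf_to_return_period(f):
    """Convert F(x) back to a signed return period (years) for display:
    positive for the wet tail, negative for the dry tail."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("F(x) must lie in [0, 1]")
    if f >= 0.5:
        return math.inf if f == 1.0 else 1 / (1 - f)
    return -math.inf if f == 0.0 else -1 / f
```

With this split, the smallest magnitude shown on a map is 2 years, and F = 0.9 and F = 0.1 map back to roughly 10 and -10.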
> > > 
> > > So, how to store F(x)? The main problem is that the value to which the 
> > > return period relates (i.e. the rainfall amount) varies from one grid 
> > > point to another. Two possibilities occur to me, both of which involve 
> > > storing F(x) alongside the rainfall total:
> > > 
> > > - Store F(x) as an auxiliary coordinate
> > > 
> > > - Store F(x) as ancillary data
> > > 
> > > It's not clear to me whether one is better than the other, or even 
> > > whether either approach is valid.
> > > 
> > > The other question is what to call the F(x) values. The guidance for 
> > > ancillary data says to use standard name modifiers to indicate the 
> > > relationship, but there doesn't seem to be anything suitable for 
> > > describing F(x).
> > > 
> > > The other thing I've looked at is the guidance for constructing standard 
> > > names. I can't seem to locate this on the current CF web site so I've 
> > > referred to the archived copy available here:
> > > 
> > > https://web.archive.org/web/20130728212039/http://cf-pcmdi.llnl.gov/documents/cf-standard-names/guidelines
> > > 
> > > The section on transformations includes 
> > > 'probability_distribution_of_X[_over_Z]' in the list, however it's 
> > > unclear to me whether this is what I need, or even how I might use it in 
> > > other circumstances. The notes state:
> > > 
> > > "probability distribution (i.e. a number in the range 0.0-1.0 for each 
> > > range of X) of variations (over Z) of X. The data variable should have an 
> > > axis for X."
> > > 
> > > The reference to 'each range of X' is the bit I find confusing. Is the 
> > > idea to store F(X1), F(X2), F(X3) etc, or is it intended to be F(X2) - 
> > > F(X1), F(X3) - F(X2), F(X4) - F(X3) etc? The former doesn't quite fit the 
> > > description, but the latter has the problem that the number of ranges (= 
> > > the number of data values) will be one less than the number of X values. 
> > > I can't see any existing names that use this transformation to use as a 
> > > guide.
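The two readings contrasted above can be made concrete: the cumulative form stores F at each X, while the "each range of X" form stores differences between consecutive values. A minimal sketch with a hypothetical function name and made-up numbers; it does not settle which reading the guideline intends.

```python
def bin_probabilities(cdf_values):
    """Given F(X1), F(X2), ..., F(Xn) at increasing X, return the probability
    of falling in each range [X_k, X_k+1), i.e. F(X_k+1) - F(X_k).
    The result has one value fewer than the number of X values."""
    return [b - a for a, b in zip(cdf_values, cdf_values[1:])]


bins = bin_probabilities([0.1, 0.4, 0.8, 0.95])
```

Here four cumulative values yield three range probabilities, which is exactly the n versus n-1 mismatch raised above.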
> > > 
> > > If anyone can help that would be much appreciated.
> > > 
> > > Thanks,
> > > 
> > > Dan
> > > 
> > > 
> > > Dan Hollis   Climatologist
> > > Met Office   Hadley Centre   FitzRoy Road   Exeter   Devon   EX1 3PB   
> > > United Kingdom
> > > Tel: +44 (0)1392 886780   Fax: +44 (0)1392 885681
> > > E-mail: [email protected]   Website: http://www.metoffice.gov.uk
> > > For UK climate and past weather information, visit 
> > > http://www.metoffice.gov.uk/climate
> > > 
> > > 
> > 
> > > _______________________________________________
> > > CF-metadata mailing list
> > > [email protected]
> > > http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
> > 
> > 
> > ----- End forwarded message -----
> 
> ----- End forwarded message -----

----- End forwarded message -----