Re: [CF-metadata] original_ensemble_size

Karl Taylor Thu, 23 Jul 2015 17:42:58 -0700

Hi all,

This addresses the issue of how to associate an ensemble size with a variable. It also suggests an alternate way of proceeding that is more general and will allow us to record, for example, which models were included in a multi-model mean.


First to consider Jim's suggestion:

I agree with Jim that you might want to indicate which member (or members) of an ensemble were represented by the variable, so you might want to include a coordinate variable of "realization". You could then also define an *attribute* of that coordinate as "ensemble_size", which would record the size, but currently that approach is not standardized (but of course is permitted) by our conventions.
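
A minimal sketch of what Jim's suggestion might look like, written in the same loose CDL as the other examples in this message. The attribute name "ensemble_size" and the values (realization 3 of a 5-member ensemble) are assumptions for illustration only; CF does not currently define such an attribute:

```cdl
dimensions:
    lon=72
    lat=96

variables:
    float precip(lon,lat)
        precip: coordinates="realization"
    int realization
        realization: standard_name="realization"
        realization: ensemble_size=5    // hypothetical attribute, not defined by CF

data:
    realization = 3
```

Here "realization" is a scalar coordinate variable, and the ensemble size rides along as ordinary (non-standardized) metadata on it.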


Now Mark's suggestion:

Mark's alternative approach to make "ensemble_size" a coordinate variable (presumably in addition to possibly including "realization") would also relate it to the variable of interest, but this would be a bit unconventional since a variable would normally be considered to be a *function* of its (independent) coordinates. I don't think T(x,realization,ensemble_size) is a proper function, since T depends on x and realization, but should be independent of ensemble size in most cases.
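
For concreteness, Mark's approach might look roughly like the following. This is only a sketch under assumptions: the standard_name "ensemble_size" is hypothetical, and the values are made up for illustration:

```cdl
dimensions:
    lon=72
    lat=96

variables:
    float precip(lon,lat)
        precip: coordinates="realization ensemble_size"
    int realization
        realization: standard_name="realization"
    int ensemble_size
        ensemble_size: standard_name="ensemble_size"    // hypothetical standard name

data:
    realization = 3
    ensemble_size = 5
```

Read literally, this makes precip a function of (lon, lat, realization, ensemble_size), even though its values would not change if ensemble_size did.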


Jonathan's suggestion:

I think Jonathan suggested including ensemble_size in a cell_methods attribute. For example:


dimensions:
    lon=72
    lat=96
    e_size=5

variables:
    float precip(lon,lat)
        precip: cell_methods="realization: point (sample_size: e_size)"

where, because "realization" is a standard name, it does not need to be explicitly declared with a "coordinates" attribute. Jonathan originally used "dimension" rather than "sample_size", but I prefer "sample_size". If this approach were followed, then CF would need to be modified so that "sample_size" (along with "interval") was designated to be one of the options for providing "standardized" extra information in the cell_methods attribute. Note that the variable "pointed to" by original_domain would not necessarily be a coordinate variable; it need not be monotonic and it could be a character variable (i.e., a list).


Alternative "new approach"

An approach that is a slight variant on Jonathan's and would allow even more information to be provided concerning the ensemble is illustrated by the following example:


dimensions:
    lon=72
    lat=96
    members=5

variables:
    float precip(lon,lat)
        precip: cell_methods="member: point (sample_pool: members)"
    int member
        member: standard_name="realization"
    int members(members)
        members: standard_name="realization"

data:
    member = 3
    members = 1, 3, 5, 6, 10

This would tell you precip was from the realization labeled 3 of a 5-member ensemble (with labels 1, 3, 5, 6, and 10). If this approach were adopted, then CF would need to be modified so that "sample_pool" (along with "interval") was designated to be one of the options for providing "standardized" extra information in the cell_methods attribute.

Under Jonathan's approach and also the "new approach", there wouldn't be a need to define the standard_name "ensemble_size" because that would be provided by the dimension size (5 in the above).

Note that the new approach could also be used to record a multi-model ensemble mean (I'm not absolutely sure this example complies with the current convention, but I think it would if the option to designate the "original_domain" were added to CF):


dimensions:
    lon=72
    lat=96
    models=5
    max_len=10

variables:
    float precip(lon,lat)
        precip: cell_methods="realization: mean (sample_pool: models)"
    char models(models, max_len)

data:
    models = "CanESM2", "CESM1", "CNRM-CM5", "HadGEM2", "MIROC-ESM"

Note also that the flexibility of this new approach could be useful for dimensions other than realization when, for example, the sampling interval for a spatial mean is from scattered stations. If one were computing a spatial mean from 5 stations, for example, this could be recorded as follows:


dimensions:
    stations=5
    max_len=16

variables:
    float precmean
        precmean: cell_methods="area: mean (sample_pool: stations)"
    char stations(stations,max_len)
        stations: coordinates="lat lon"
    float lat(stations)
        lat: standard_name="latitude"
    float lon(stations)
        lon: standard_name="longitude"

data:
    stations = "Oakland", "San Francisco", "Livermore", "San Jose", "Palo Alto"
    lat = 37.62, 37.77, ...
    lon = -122.27, -122.42, ...

I would find it very nice to be able to specify the models contributing to a multi-model mean using the above approach. Anyone else think so? It would also satisfy Mark's use case of wanting to record the size of the ensemble.


Best regards,
Karl

