#152: Time mean over area fractions which vary with time
-----------------------------+------------------------------
  Reporter:  martin.juckes   |      Owner:  cf-conventions@…
      Type:  enhancement     |     Status:  new
  Priority:  medium          |  Milestone:
 Component:  cf-conventions  |    Version:
Resolution:                  |   Keywords:
-----------------------------+------------------------------
Description changed by martin.juckes:

Old description:

> Following a discussion on the mailing list, I'd like to propose adding a
> new example to the CF Convention document to illustrate the use of
> cell_methods to specify different mean quantities when using a mask which
> is time varying (e.g. sea_ice). The qualifier `where` has been introduced
> into the `cell_methods` to specify masked spatial operations, e.g. `area:
> mean where sea_ice` to represent a spatial mean over sea ice. The current
> convention does not explicitly comment on whether the `where` construct
> can be used with other dimensions. For the CMIP6 data request there is a
> requirement to specify the temporal mean of quantities averaged over sea
> ice, and the spatial extent of the sea ice is generally varying in time.
>
> The proposal is to make it clear that use of `where` for non-spatial
> dimensions is allowed by adding examples in section 7. It is also
> necessary to provide these examples to clarify the subtle differences
> implied by different formulations of the `cell_methods` statement.
>
> == Replace text of section 7.3.3 (first 3 paragraphs) ==
>
> By default, the statistical method indicated by cell_methods is assumed
> to have been evaluated over the entire horizontal area of the cell.
> Sometimes, however, it is useful to limit consideration to only a portion
> of a cell (e.g. a mean over the sea-ice area). The portion concerned is
> constant in time in some cases, but it could be time-varying. Grid cell
> “portions” that can be considered are only those permitted to be
> associated with the   `standard_name` of `area_type`. There are two
> options for indicating when a quantity represents a portion of a cell.
>
> The first method can be used for the common case that the cell_method
> applies to a single area-type. In this case, the cell_methods attribute
> may include a string of the form `name: method where type`. Here name
> could, for example, be area and type may be any of the strings permitted
> for a variable with a standard_name of area_type. As an example, if the
> method is `area: mean where sea_ice`, then the data would represent a
> mean over only the sea ice portion of the grid cell. When this first
> option is adopted, none of the variables in the netCDF file should be
> given a name identical to the string that names the `area_type`.  This
> restriction is imposed so that it will be clear that the metadata should
> not be interpreted following the second option (described in the next
> paragraph), which takes precedence.
>
> The second method for indicating that a statistic applies to only a
> portion of a cell is more general because it can reference multiple area-
> types.  This may be needed when a variable has a dimension that ranges
> across various area types.  In this case, the cell_methods entry is of
> the form `name: method where typevar`. Here `typevar` is a string-valued
> auxiliary coordinate variable or string-valued scalar coordinate variable
> (see Section 6.1, "Labels") with a `standard_name` of `area_type`. The
> variable `typevar` contains the name(s) of the selected portion(s) of the
> grid cell to which the method is applied. This method provides a
> convenient way to store output from land surface models, for example,
> since they deal with many area types within each surface gridbox (e.g.,
> vegetation, bare_ground, snow, etc.).
>
> == Caption of example 7.6
>
> If the method is `mean`, various ways of calculating the mean can be
> distinguished in the `cell_methods` attribute with a string of the form
> `mean where type1 [over type2]`. Here, type1 can be any of the
> possibilities allowed for typevar or type (as specified in the two
> paragraphs preceding above Example). The same options apply to type2,
> except it is not allowed to be the name of an auxiliary coordinate
> variable with a dimension greater than one (ignoring the dimension
> accommodating the maximum string length).
>
> A cell_methods attribute with a string of the form `area: mean where
> type1 over type2` indicates the mean is calculated by integrating over
> the type1 portion of the cell and dividing by the area of the type2
> portion.  When `over type2` is omitted, it is assumed to be the same as
> type1.
>
> == Clarification at start of section 7.3.3 (not needed if above is
> accepted) ==
>
> ''Add a clarification after this sentence in the first paragraph of 7.3.3
> "Sometimes, however, it is useful to limit consideration to only a
> portion of a cell (e.g. a mean over the sea-ice area)", to introduce the
> idea of time-varying area fractions:''
>
> The portion concerned is constant in time in many cases, but it could be
> time-varying.
>
> == New example for time-varying area fractions ==
>
> ''The following new example and explanatry text should be added in
> section 7.3.3:''
>
> Example 7.8: Time mean over area fractions which vary with time
>
> {{{
> float simple_mean(lat,lon):
>    simple_mean:cell_methods: area: mean where sea_ice time: mean
>
> float weighted_mean(lat,lon):
>    weighted_mean:cell_methods: area: time: mean where sea_ice
>
> float partial_mean(lat,lon):
>    partial_mean:cell_methods: area: mean where sea_ice over sea time:
> mean
> }}}
>
> When the area fraction is varying with time, there are several different
> ways in which a time mean can be formulated. Three of these are
> illustrated in this example. Suppose, for instance, we are averaging over
> three time steps and the data at one grid point is -10, -6, -2 with area
> fractions .75, .50, .25. The values of the simple_mean, weighted_mean and
> partial mean are, respectively, (-10 -6 -2)/3 = -6, (-10*.75 - 6*.5
> -2*.25)/(.75+.5+.25) = -7.33 , and (-10*.75 - 6*.5 -2*.25)/3 = -3.667.
> The partial mean provides the contribution to the mean over the entire
> grid from a specified area type. The simple mean is weighting each time
> period equally, while the weighted mean provides equal weighting to each
> unit area of `sea_ice`.
>
> In example 7.8, `time` could be replaced by any other coordinate over
> which an average is taken, such as an ensemble index.

New description:

 Following a discussion on the mailing list, I'd like to propose adding a
 new example to the CF Convention document to illustrate the use of
 cell_methods to specify different mean quantities when using a mask which
 is time varying (e.g. sea_ice). The qualifier `where` has been introduced
 into the `cell_methods` to specify masked spatial operations, e.g. `area:
 mean where sea_ice` to represent a spatial mean over sea ice. The current
 convention does not explicitly comment on whether the `where` construct
 can be used with other dimensions. For the CMIP6 data request there is a
 requirement to specify the temporal mean of quantities averaged over sea
 ice, and the spatial extent of the sea ice is generally varying in time.

 The proposal is to make it clear that use of `where` for non-spatial
 dimensions is allowed by adding examples in section 7. It is also
 necessary to provide these examples to clarify the subtle differences
 implied by different formulations of the `cell_methods` statement.


 == 1. Replace text of section 7.3.3 (first 3 paragraphs) ==

 By default, the statistical method indicated by cell_methods is assumed to
 have been evaluated over the entire horizontal area of the cell.
 Sometimes, however, it is useful to limit consideration to only a portion
 of a cell (e.g. a mean over the sea-ice area). The portion concerned is
 constant in time in some cases, but it could be time-varying. Grid cell
 “portions” that can be considered are only those permitted to be
 associated with the `standard_name` of `area_type`. There are two
 options for indicating when a quantity represents a portion of a cell.

 The first method can be used for the common case that the cell_method
 applies to a single area-type. In this case, the cell_methods attribute
 may include a string of the form `name: method where type`. Here name
 could, for example, be area and type may be any of the strings permitted
 for a variable with a standard_name of area_type. As an example, if the
 method is `area: mean where sea_ice`, then the data would represent a mean
 over only the sea ice portion of the grid cell. When this first option is
 adopted, none of the variables in the netCDF file should be given a name
 identical to the string that names the `area_type`.  This restriction is
 imposed so that it will be clear that the metadata should not be
 interpreted following the second option (described in the next paragraph),
 which takes precedence.

 The second method for indicating that a statistic applies to only a
 portion of a cell is more general because it can reference multiple area-
 types.  This may be needed when a variable has a dimension that ranges
 across various area types.  In this case, the cell_methods entry is of the
 form `name: method where typevar`. Here `typevar` is a string-valued
 auxiliary coordinate variable or string-valued scalar coordinate variable
 (see Section 6.1, "Labels") with a `standard_name` of `area_type`. The
 variable `typevar` contains the name(s) of the selected portion(s) of the
 grid cell to which the method is applied. This method provides a
 convenient way to store output from land surface models, for example,
 since they deal with many area types within each surface gridbox (e.g.,
 vegetation, bare_ground, snow, etc.).

 == 2. Caption of example 7.6 ==

 If the method is `mean`, various ways of calculating the mean can be
 distinguished in the `cell_methods` attribute with a string of the form
 `mean where type1 [over type2]`. Here, type1 can be any of the
 possibilities allowed for typevar or type (as specified in the two
 paragraphs preceding the Example above). The same options apply to type2,
 except it is not allowed to be the name of an auxiliary coordinate
 variable with a dimension greater than one (ignoring the dimension
 accommodating the maximum string length).

 A cell_methods attribute with a string of the form `area: mean where type1
 over type2` indicates the mean is calculated by integrating over the type1
 portion of the cell and dividing by the area of the type2 portion.  When
 `over type2` is omitted, it is assumed to be the same as type1.
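
 The rule above can be sketched in a few lines of Python. This is only an illustration for a single grid cell: the function name `area_mean_where_over` and the sample areas and flux value are hypothetical, not part of the proposed convention text.

```python
def area_mean_where_over(value_on_type1, area_type1, area_type2=None):
    """Mean 'where type1 [over type2]' for one grid cell.

    value_on_type1 -- the quantity averaged over the type1 portion only
    area_type1     -- area of the type1 portion of the cell
    area_type2     -- area of the type2 portion; defaults to type1
    """
    if area_type2 is None:
        # "over type2" omitted: type2 is taken to be the same as type1.
        area_type2 = area_type1
    if area_type2 == 0:
        raise ValueError("type2 covers no area; the mean is undefined")
    # Integrate over the type1 portion, then divide by the type2 area.
    return value_on_type1 * area_type1 / area_type2


# Hypothetical cell: 25 of 100 area units are sea ice.
cell_area = 100.0
sea_ice_area = 25.0
flux_on_ice = -8.0

# area: mean where sea_ice
mean_where_ice = area_mean_where_over(flux_on_ice, sea_ice_area)
# area: mean where sea_ice over all_area_types
mean_over_cell = area_mean_where_over(flux_on_ice, sea_ice_area, cell_area)
```

 With these numbers the first form returns the value on the ice itself, while the second scales it by the ice fraction of the cell.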

 == 3. Additional text for masks which vary over additional dimensions (e.g.
 time), in the form proposed by Karl ==

 When the “where” construct is used, and when “area” is not the only
 “dimension” to which it applies, the interpretation more generally is that
 a “weighted” mean is reported. Specifically, the quantity of interest is
 integrated over the additional dimension(s) with weights proportional to
 the fraction of “type1” area_type that exists, and then this is divided by
 the integral over the same dimension(s) of the fraction of “type2”
 area_type that exists.   [Note that certain variables might be undefined
 if the fraction of the area_type considered is 0; for example the
 temperature of sea ice is not defined if there is no sea ice.  In this
 case, a time-mean value can still be computed for cells containing some
 sea ice during at least a portion of the averaging interval because no
 matter what the value assumed for temperature when sea ice is missing,
 those values are given zero weight in computing the time-mean.]
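
 A minimal sketch of this weighted mean, assuming equally long time steps and using NaN to mark samples where the quantity is undefined. The function name `weighted_mean_where` and the sample values are hypothetical.

```python
import math


def weighted_mean_where(values, frac_type1, frac_type2=None):
    """Weighted mean over an extra dimension (e.g. time) for one grid cell.

    values     -- quantity on the type1 portion at each step (NaN if undefined)
    frac_type1 -- fraction of the cell covered by type1 at each step
    frac_type2 -- fraction covered by type2 at each step; defaults to type1
    """
    if frac_type2 is None:
        frac_type2 = frac_type1
    numerator = 0.0
    for value, weight in zip(values, frac_type1):
        # Samples with zero type1 fraction get zero weight, so an
        # undefined value there never enters the sum.
        if weight > 0:
            numerator += weight * value
    denominator = sum(frac_type2)
    if denominator == 0:
        return math.nan  # type2 never present: the mean is undefined
    return numerator / denominator


# Sea ice temperature is undefined at the middle step (no ice present).
temperatures = [-10.0, math.nan, -2.0]
ice_fraction = [0.75, 0.0, 0.25]
result = weighted_mean_where(temperatures, ice_fraction)
```

 Here the undefined middle sample is skipped and the result is -8.0, the mean of the two remaining samples weighted by their ice fractions.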

 Note that “all_area_types” is one of the valid strings permitted for a
 variable with the standard_name area_type, so a cell_methods string of
 the form “area: mean where type1 over all_area_types” indicates the mean
 is calculated by integrating over the type1 portion of the grid cell and
 dividing by the entire area of the grid cell.

 The following three examples illustrate cases when one might want to use
 “where” or “where … over” in defining the cell_methods:

 1. Suppose that in a grid cell the fractional sea ice varies over time,
 but there is interest in the time-mean surface temperature of the sea ice.
 The time samples, each representing a spatially averaged sea ice
 temperature, can be summed and then divided by the number of samples to
 obtain an unweighted mean where sea ice exists. This would be indicated
 with:
 cell_methods = "area: mean where sea_ice time: mean"

 2. Suppose there is interest in recording the mean fractional area covered
 by sea ice and the mean sea ice thickness in such a way that their product
 would equal the time-mean volume of sea ice in each grid cell. In this
 case the sea ice area would be reported as an unweighted time-mean, while
 the mean sea ice thickness would be calculated with time samples weighted
 by the fractional area of sea ice. Thus, for sea ice thickness:
 cell_methods = "area: time: mean where sea_ice"

 3. Suppose the time-mean contributions to total heat flux from different
 portions of a grid cell (e.g., ice-free and ice-covered) are of interest,
 and there are reasons to report these in such a way that the total heat
 flux is the sum of the individual contributions. Then the cell_methods
 attribute would be defined:
 cell_methods = "area: mean where sea_ice over all_area_types time: mean"
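
 The consistency properties claimed in cases 2 and 3 can be checked numerically. The sketch below uses made-up samples at three equally long time steps and assumes the cell holds only two area types, so the ice-free fraction is one minus the ice fraction. All variable names are hypothetical.

```python
import math

# Hypothetical samples for one grid cell.
ice_fraction = [0.5, 0.25, 0.75]        # fraction of the cell covered by ice
ice_thickness = [2.0, 4.0, 2.0]         # thickness averaged over the ice only
flux_on_ice = [-20.0, -10.0, -30.0]     # heat flux over the ice-covered part
flux_on_open = [-100.0, -80.0, -120.0]  # heat flux over the ice-free part

n = len(ice_fraction)

# Case 2: unweighted time-mean area fraction and area-weighted thickness.
mean_fraction = sum(ice_fraction) / n
mean_thickness = (
    sum(a * h for a, h in zip(ice_fraction, ice_thickness)) / sum(ice_fraction)
)
# Time-mean ice volume per unit cell area, computed directly.
mean_volume = sum(a * h for a, h in zip(ice_fraction, ice_thickness)) / n

# Case 3: "where ... over all_area_types" contributions from each portion.
ice_part = sum(a * f for a, f in zip(ice_fraction, flux_on_ice)) / n
open_part = sum((1 - a) * f for a, f in zip(ice_fraction, flux_on_open)) / n
# Time-mean flux over the whole cell, computed directly.
total_flux = sum(
    a * fi + (1 - a) * fo
    for a, fi, fo in zip(ice_fraction, flux_on_ice, flux_on_open)
) / n

print(math.isclose(mean_fraction * mean_thickness, mean_volume))
print(math.isclose(ice_part + open_part, total_flux))
```

 Both checks print True: the product of the two case 2 means recovers the mean volume, and the two case 3 contributions sum to the total flux.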

 In some cases a variable referencing a specific area_type will actually be
 defined even in the absence of that area_type (i.e., over the entire grid
 cell).  Consider the surface_snow_thickness, which could sensibly be
 considered to be 0 in the absence of snow.  In this case one might in some
 instances want to report “area: time: mean where snow” (giving a measure
 of the typical snow depth when snow exists) and in other instances “area:
 time: mean where snow over all_area_types” (which in this case would be
 identical to “area: time: mean”) or “area: time: mean where snow over
 land”.


 == 4. Clarification at start of section 7.3.3 (not needed if above is
 accepted) ==

 ''Add a clarification after this sentence in the first paragraph of 7.3.3
 "Sometimes, however, it is useful to limit consideration to only a portion
 of a cell (e.g. a mean over the sea-ice area)", to introduce the idea of
 time-varying area fractions:''

 The portion concerned is constant in time in many cases, but it could be
 time-varying.

 == 5. New example for time-varying area fractions ==

 ''The following new example and explanatory text should be added in section
 7.3.3:''

 Example 7.8: Time mean over area fractions which vary with time

 {{{
 float simple_mean(lat,lon):
    simple_mean:cell_methods: area: mean where sea_ice time: mean

 float weighted_mean(lat,lon):
    weighted_mean:cell_methods: area: time: mean where sea_ice

 float partial_mean(lat,lon):
    partial_mean:cell_methods: area: mean where sea_ice over sea time: mean
 }}}

 When the area fraction is varying with time, there are several different
 ways in which a time mean can be formulated. Three of these are
 illustrated in this example. Suppose, for instance, we are averaging over
 three time steps and the data at one grid point are -10, -6, -2 with area
 fractions .75, .50, .25. The values of simple_mean, weighted_mean and
 partial_mean are, respectively, (-10 - 6 - 2)/3 = -6, (-10*.75 - 6*.5
 - 2*.25)/(.75 + .5 + .25) = -7.33, and (-10*.75 - 6*.5 - 2*.25)/3 = -3.667.
 The partial mean gives the contribution of the specified area type to the
 mean over the entire grid cell. The simple mean weights each time period
 equally, while the weighted mean gives equal weight to each unit area of
 `sea_ice`.
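
 The arithmetic in this example can be reproduced with a short Python sketch. It assumes equally long time steps and, for the partial mean, a type2 fraction of 1 at every step, which is what division by the number of steps implies. The variable names are illustrative only.

```python
values = [-10.0, -6.0, -2.0]     # data at one grid point, three time steps
fractions = [0.75, 0.50, 0.25]   # sea_ice area fraction at each time step

n = len(values)
# Sum of the samples weighted by the sea_ice fraction.
weighted_sum = sum(v * f for v, f in zip(values, fractions))

# area: mean where sea_ice time: mean  (each time step weighted equally)
simple_mean = sum(values) / n
# area: time: mean where sea_ice  (each unit area of sea_ice weighted equally)
weighted_mean = weighted_sum / sum(fractions)
# area: mean where sea_ice over sea time: mean  (type2 fraction taken as 1)
partial_mean = weighted_sum / n

print(simple_mean, round(weighted_mean, 2), round(partial_mean, 3))
```

 Run as written, this prints -6.0, -7.33 and -3.667, matching the values quoted in the text.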

 In example 7.8, `time` could be replaced by any other coordinate over
 which an average is taken, such as an ensemble index.

--
Ticket URL: <http://cf-trac.llnl.gov/trac/ticket/152#comment:18>
CF Metadata <http://cf-convention.github.io/>