Dear Ros

Thanks a lot for working on this. I think you have correctly identified the parts which are stated as requirements and recommendations, but you have to put yourself in the shoes of the CF-checker, and consider what you can actually *do* to make the checks. The CF-checker, for instance, does not know what we mean by element and instance dimensions in the first statement:

"In the multidimensional array representations, data variables have both an instance dimension and an element dimension. The dimensions may be given in any order. If there is a need for either the instance or an element dimension to be the netCDF unlimited dimension (so that more features or more elements can be appended), then that dimension must be the outer dimension of the data variable i.e. the leading dimension in CDL."

To check this statement, I think we first have to refer to the table in 9.1, which implies a lot of checks on dimensions and coordinates. For instance, if the featureType att says it's a timeseries, the table says that the data is logically 1D (i), and there are mandatory coord or aux coord vars with the logical structure x(i) y(i) t(i,o). There are five possibilities for storing a collection of timeseries features. These are alternative sets of requirements for the CF-checker, and one of these sets must be satisfied:

* Single timeseries: The data variable is 1D (the element dimension). It has a coord var or a 1D aux coord var of time. It has two scalar coord vars of horizontal position.

* Orthogonal multidimensional representation: The data variable is 2D. One of its dimensions (the element dimension) has a coordinate variable or a 1D auxiliary coordinate variable of time. The other one (the instance dimension) has two coordinate or 1D aux coord variables of horizontal position.

* Incomplete multidimensional rep: The data variable is 2D. It has a 2D aux coord var of time with the same dimensions as itself. One of the dimensions has two coordinate or 1D aux coord variables of horizontal position.

* Contiguous ragged array rep: The data variable is 1D. It has a 1D aux coord var of time with this dimension. There is a variable in the file (the count variable) with a sample_dimension att that names the dimension of the data variable. The data variable has two 1D aux coord vars of horizontal position, whose dimension is the dimension of the count variable.

* Indexed ragged array rep: The data variable is 1D. It has a 1D aux coord var of time with this dimension. There is a variable in the file (the index variable) which has the same dimension as the data variable and an instance_dimension att. The data variable has two 1D aux coord vars of horizontal position, whose dimension is the one named by the instance_dimension att of the index variable.

This is a bottom-up approach, but that's what the checker has to do, isn't it? When you find that one of these cases matches what you have, it allows you formally to identify the instance and element dimensions, and the count and index variables if relevant. Do you see what I mean?

It would be necessary to work through the other cases in a similar way, but it would take a lot of space to write them all down in the conformance rules in the way I have done above. Perhaps there would be a way to summarise the principles on which the checker would work from the table.

We did not say exactly what constitutes a horizontal coordinate. I propose that if two horizontal coord or aux coord vars are required, they should be longitude and latitude, or grid_longitude and grid_latitude, or projection_x_coordinate and projection_y_coordinate, or *any* pair of coordinates if one of them has axis='X' and the other axis='Y'. It is allowed to provide more than one of these pairs, but it is not allowed (for instance) to supply only longitude and grid_latitude, which don't form a pair. (It is always OK to supply coordinates in addition to those which are mandatory. These can just be ignored by the checker.)

Once these identifications have been made, other checks can be applied:

H.2 It is recommended that there be a variable with cf_role of timeseries_id. If there is such a variable, it must have the instance dimension. All the values of this variable must be different.
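
A minimal Python sketch of the horizontal-pair rule proposed above may help make it concrete. It is only an illustration, not part of the CF-checker: the function name has_horizontal_pair and the representation of each coordinate variable as a dict of its attributes are assumptions made for this example.

```python
# Hypothetical representation: each coordinate or aux coord variable is a
# dict of its attributes, e.g. {"standard_name": "longitude"} or {"axis": "X"}.
NAMED_PAIRS = [
    ("longitude", "latitude"),
    ("grid_longitude", "grid_latitude"),
    ("projection_x_coordinate", "projection_y_coordinate"),
]


def has_horizontal_pair(coords):
    """Return True if coords contains at least one complete horizontal pair."""
    names = {c.get("standard_name") for c in coords}
    axes = {c.get("axis") for c in coords}
    # One of the three named standard_name pairs must be present in full.
    if any(x in names and y in names for x, y in NAMED_PAIRS):
        return True
    # Otherwise any two coordinates qualify if one has axis='X'
    # and another has axis='Y'.
    return "X" in axes and "Y" in axes
```

With this sketch, longitude plus grid_latitude alone is rejected, while adding latitude to that set makes it acceptable, since extra coordinates are simply ignored.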
(These H.2 rules are for timeseries. Corresponding but different rules would apply for profiles and trajectories. Appendix H suggests others; for example, from H.5, it is recommended that there should be station variables with standard_name attributes "platform_name", "surface_altitude" and "platform_id" when applicable.)

9.3 A count variable or an index variable must be integer.

9.3 Negative values (except missing data) are not allowed in a count variable. The sum of the non-missing values must not exceed the dimension of the data variable.

9.3 Negative values (except missing data) are not allowed in an index variable. None of the values may be greater than or equal to the dimension of the data variable (because they must be valid indices). All of the non-missing values must be different.

You say that the featureType att is required. This is a tricky one. We don't know it's required unless we know we have a discrete sampling geometry that requires it, but we don't know for sure that it's a discrete sampling geometry unless there is a featureType. I suggest you look for the featureType first, and only apply the checks for sect 9 if it is present, except that you could give an error if there is no featureType and there is a count or index var in the file. That must indicate a ragged rep, and ragged reps require featureType.

9.6 Where any auxiliary coordinate variable contains a missing value, all other coordinate, auxiliary coordinate and data values corresponding to that element should also contain missing values.

9.6 Where the instance variable identified by cf_role contains a missing value indicator, all other instance variables should also contain missing values corresponding to that element.

Best wishes

Jonathan
