Dear all,

I've studied the text of proposed changes to Sect 8, as someone not at all involved in writing it or using these kinds of technique. (It's easier to read the files in [Daniel's repo](https://github.com/erget/cf-conventions/blob/lossy-compression-through-coordinate-sampling/ch08.adoc) than [the pull request](https://github.com/cf-convention/cf-conventions/pull/326/files?short_path=ebcafde#diff-ebcafde998cd56873e594e76a10c8541235a4fed3d4664a9c7733805bff39a4c) in order to see the diagrams in place.) I think it all makes sense. It's well-designed and consistent with the rest of CF. Thanks for working it out so thoughtfully and carefully. The diagrams are very good as well.

I have not yet reviewed Appendix J or the conformance document. I'm going to be on leave next week, so I thought I'd contribute just this part before going.

Best wishes

Jonathan

There is one point where I have a suggestion for changing the content of the proposal, although probably you've already discussed this possibility. If I understand correctly, you must always have both the `tie_point_dimensions` and `tie_point_indices` attributes of the interpolation variable, and they must refer to the same tie point dimensions. Therefore I think a simpler design, easier for both the data-writer and the data-reader to use, would combine these two attributes into one attribute, whose contents would be "*interpolation_dimension*`:` *tie_point_interpolation_dimension* *tie_point_index_variable* [*interpolation_zone_dimension*] [*interpolation_dimension*`:` ...]".

Also, I have some suggestions for naming:

* If you adopt my suggestion for a single attribute to replace `tie_point_dimensions` and `tie_point_indices`, an obvious name for it would be `tie_points`. You've used that name for the attribute of the data variable. However, I would suggest that the attribute of the data variable could equally well be called `interpolation`, since it names the interpolation variable, and signals that interpolation is to be used.
* Your terminology has "tie point interpolation dimension" and "interpolation dimension", but the former is not a special case of the latter. That could be confusing, in the same way that (unfortunately) in CF terminology an auxiliary coordinate variable is not a special kind of coordinate variable. I suggest you rename "tie point interpolation dimension" as e.g. "tie point reduced dimension" to avoid this misunderstanding.
* A similar possible confusion is that a tie point index variable is not a special kind of tie point variable. To avoid this confusion and add clarity, I suggest you could rename "tie point variable" as "tie point coordinate variable".
* The terms "interpolation zone" and "interpolation area" are unhelpful because it's not obvious from the words which one is bigger, so it's hard to remember. If you stick with "zone" for the small one, for area it would be better to use something which is more obviously much bigger, such as "province" or "realm"! Or perhaps you could use "division" or "department", since the defining characteristic is the discontinuity.

In the first paragraph of Sect 8 we distinguish three methods of reduction of dataset size. I would suggest minor clarifications:

> There are three methods for reducing dataset size: packing, lossless compression, and lossy compression. By packing we mean altering the data in a way that reduces its precision **(but has no other effect on accuracy)**. By lossless compression we mean techniques that store the data more efficiently and result in no **loss of precision or accuracy**. By lossy compression we mean techniques that store the data more efficiently **and retain its precision** but result in some loss in accuracy.

Then I think we could start a new paragraph with "Lossless compression only works in certain circumstances ...". By the way, isn't it the case that HDF supports per-variable gzipping? That wasn't available in the old netCDF data format for which this section was first written, so it's not mentioned, but perhaps it should be now.

There are a few points where I found the text of Sect 8.3 possibly unclear or difficult to follow:

* "This form of compression may also be used on a domain variable with the same effect." I think this is an unclear addition. If I understand you correctly, instead of this final sentence you could begin the paragraph with "For some applications the coordinates of a data variable or a domain variable can require considerably more storage than the data in its domain."
* Tie Point Dimensions Attribute. If you adopt my suggestion above, this subsection would change its name to "Tie points attribute". It would be good to begin the section by saying what the attribute is for. As it stands, it plunges straight into details. The second sentence in particular, about interpolation zones, bewildered me - I didn't know what it was talking about.
* I follow this sentence: "For instance, interpolation dimension dimension1 could be mapped to two different tie point interpolation dimensions with dimension1: tp_dimension1 dimension1: tp_dimension2." But I don't understand the next sentence: "This is necessary when different tie point variables for a particular interpolation dimension do not contain the same number of tie points, and therefore define different numbers of interpolation zones, as is the case in Multiple interpolation variables with interpolation parameter attributes." The situation described does not occur in the example quoted, I think. I wonder if it should say, "This occurs when data variables that share an interpolation dimension and interpolation variable have different tie points for that dimension."
* Instead of "A tie point variable must span at most one of the tie point interpolation dimensions associated with a given interpolation dimension." I would add a sentence to the first para of "Interpolation and non-interpolation dimension", which I would rewrite as follows:

  > For each interpolation variable identified in the tie_points attribute, all the associated tie point variables must share the same set of one or more dimensions. Each of the dimensions of a tie point variable must be either a dimension of the data variable, or a dimension which is to be interpolated to a dimension of the data variable. A tie point variable must not have more than one dimension corresponding to any given dimension of the data variable, and may have fewer dimensions than the data variable. Dimensions of the tie point variable which are interpolated are called tie point reduced dimensions, and the corresponding data variable dimensions are called interpolation dimensions, while those for which no interpolation is required, being the same in the data variable and the tie point variable, are called non-interpolation dimensions. The size of a tie point reduced dimension must be less than or equal to the size of the corresponding interpolation dimension.

* In one place, you say "For each interpolation dimension, the number of interpolation zones is equal to the number of tie points minus the number of interpolation areas," and in another place, "An interpolation zone must span at least two points of each of its corresponding interpolation dimensions." It seems to me that "at least" is wrong - it should be "exactly two".
* "The dimensions of an interpolation parameter variable must be a subset of zero or more **of** the ...".
* I suggest a rewriting of the part about the dimensions of the interpolation parameter variable, for clarity, if I've understood it correctly, as follows:

  > Where an interpolation zone dimension is provided, the variable provides a single value along that dimension for each interpolation zone, assumed to be defined at the centre of the interpolation zone.
  >
  > Where a tie point reduced dimension is provided, the variable provides a value for each tie point along that dimension. The value applies to the two interpolation zones on either side of the tie point, and is assumed to be defined at the interpolation zone boundary (figure 3).
  >
  > In both cases, the implementation of the interpolation method should assume that an interpolation parameter variable applies equally to all interpolation zones along any interpolation dimension which it does not span.

* For "The bounds of a tie point must be the same as the bounds of the corresponding target grid cells," I would suggest, "The bounds of a tie point must be the same as the bounds of the target grid cells whose coordinates are specified as the tie point."
* I don't understand this sentence: "In this case, though, the tie point index variables are the identifying target domain cells to which the bounds apply, rather than bounds values themselves." A tie point index variable could not possibly contain bounds values.
* In Example 8.5, you need only one (or maybe two) data variables since they're all the same in structure.

--
You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/cf-convention/cf-conventions/issues/327#issuecomment-859397744
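
As an illustration of the zone-counting point in the Sect 8.3 comments above, here is a minimal sketch for a single interpolation dimension. It assumes that each interpolation zone is bounded by two adjacent tie points of the same interpolation area, which is the reading that gives "number of zones = number of tie points minus number of areas". The function name `interpolation_zones`, the list-of-lists input and the index values are hypothetical and are not taken from the proposal.

```python
from typing import List, Tuple


def interpolation_zones(areas: List[List[int]]) -> List[Tuple[int, int]]:
    """Return one (start, end) tie point index pair per interpolation zone.

    `areas` is a hypothetical input: one list of tie point indices per
    interpolation area, in increasing order along a single dimension.
    """
    zones: List[Tuple[int, int]] = []
    for tie_points in areas:
        # Pair each tie point with its successor in the same area, so no
        # zone crosses the discontinuity between two areas.
        zones.extend(zip(tie_points[:-1], tie_points[1:]))
    return zones


# Illustrative values only: two areas with three tie points each.
areas = [[0, 4, 8], [9, 13, 17]]
zones = interpolation_zones(areas)
n_tie_points = sum(len(a) for a in areas)

print(zones)
# Check the counting rule: zones = tie points - areas.
print(len(zones) == n_tie_points - len(areas))
```

Under this assumption every zone spans exactly two tie points along the dimension, and an area with n tie points contributes n - 1 zones, so the total follows directly from the rule quoted from the proposal.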
