Hi David,

> I did indeed mean "instead of" rather than "in addition to". Allowing a 
> domain variable reference instead of the usual attributes would neither 
> disallow the usual attributes, nor change their meaning, so no backwards 
> incompatibility. This is similar to the `grid_mapping` extension that was 
> introduced at CF-1.7. In this case the old single grid mapping case was still 
> supported in the new version, but a new syntax was created for multiple grid 
> mappings. This new syntax is not understood by software built on CF-1.6.

OK. It was confusing because no one had previously mentioned using references to 
domain variables **instead of** the usual attributes, so I thought you were 
replying to my **in addition to** question.

> We shouldn't allow a domain variable instead of the usual domain definition 
> because _a)_ there was no use case for it and _b)_ because it would require 
> all software to be rewritten to support this different method.

Agreed.

> Even though allowing this would make it easier, in limited circumstances, to 
> see "by eye" if two data variables shared a domain, I don't think that is a 
> use case on its own. These limited circumstances only arise when informally 
> comparing multiple data variables with domain references _within the same 
> file_ (as opposed to the same dataset). Library software would not generally 
> benefit from this as it has to store the constituent parts of the domain 
> (cell measure, grid mappings, coordinates, etc.) regardless of how it was 
> encoded. If a stronger use case were to present itself in the future I would 
> welcome this being reviewed, but suggest that for now we do not allow this.

> With regard to the pre-existing redundancy issue, data variables are 
> essentially independent entities. Therefore there is no redundancy if, say, 
> two data variables have the same `coordinates` attribute value.

I see what you mean, but independence achieved through denormalization 
introduces redundancy as soon as two entities have elements in common, and 
therefore makes the data prone to inconsistencies. Even if each data variable 
has its own instance of the domain (i.e. its own set of `coordinates`, 
`grid_mapping`, etc. attributes), when two or more data variables share the 
same domain (for example, multiple parameters measured by the same instrument), 
these instances of the domain are redundant. I don't see how it could be 
otherwise.

> We have to trust dataset providers to produce the datasets that they intend, 
> and that is made easier by not allowing the same information to be encoded 
> twice for each data variable. If this were allowed, and the two methods were 
> inconsistent, we have no way of knowing which is correct.

The idea is not to identify which definition is correct, but to detect when two 
definitions of the same domain are incompatible, or when one of them is less 
complete than it could be. The goal is to give data producers a way to detect 
errors (multiple definitions of a single domain that are incompatible with each 
other) and consistency issues (two variables share the same domain, but one 
provides only a minimal definition while the other has a detailed description), 
and therefore a means to improve the overall quality of the files they generate 
before these files are distributed to end users.
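As a rough illustration of the kind of check I have in mind, here is a minimal 
Python sketch. It is not part of the proposal or of any existing CF tool: 
`check_shared_domains` and `normalise` are hypothetical names, variables are 
represented as plain dicts rather than read with a netCDF library, and it 
assumes, purely for illustration, that data variables with identical dimensions 
are meant to share a domain (which is not true in general).

```python
from collections import defaultdict
from itertools import combinations

# Attributes assumed here to describe a data variable's domain.
DOMAIN_ATTRS = ("coordinates", "grid_mapping", "cell_measures")


def normalise(attr, value):
    """Make attribute values comparable regardless of token order or spacing."""
    if attr == "coordinates":
        # The order of auxiliary coordinate names is not significant.
        return frozenset(value.split())
    # For the other attributes, only collapse runs of whitespace.
    return " ".join(value.split())


def check_shared_domains(variables):
    """Report conflicts between data variables assumed to share a domain.

    `variables` maps a variable name to a dict holding a "dimensions" tuple
    and any domain attributes. ASSUMPTION (hypothetical heuristic): variables
    with identical dimensions are treated as sharing one domain.
    """
    groups = defaultdict(list)
    for name, attrs in variables.items():
        groups[tuple(attrs["dimensions"])].append(name)

    issues = []
    for names in groups.values():
        for a, b in combinations(sorted(names), 2):
            for attr in DOMAIN_ATTRS:
                va = variables[a].get(attr)
                vb = variables[b].get(attr)
                if va is None and vb is None:
                    continue
                if va is None or vb is None:
                    # One definition is less complete than the other.
                    missing = a if va is None else b
                    issues.append(("incomplete", attr, a, b, missing))
                elif normalise(attr, va) != normalise(attr, vb):
                    # Both define the attribute, but differently.
                    issues.append(("incompatible", attr, a, b, None))
    return issues


# Hypothetical example: two parameters measured on the same grid.
variables = {
    "temperature": {"dimensions": ("time", "y", "x"),
                    "coordinates": "lat lon", "grid_mapping": "crs",
                    "cell_measures": "area: cell_area"},
    "salinity": {"dimensions": ("time", "y", "x"),
                 "coordinates": "lon lat", "grid_mapping": "crs_2"},
}
for issue in check_shared_domains(variables):
    print(issue)
```

Here the reordered `coordinates` values are accepted, the differing 
`grid_mapping` values are reported as incompatible, and the missing 
`cell_measures` on `salinity` is reported as an incomplete definition.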

But again, it was just a suggestion for a small improvement to the proposal; it 
is absolutely not a blocking point for us.

Cheers,

Sylvain
-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/cf-convention/cf-conventions/issues/301#issuecomment-720601089

This list forwards relevant notifications from Github.  It is distinct from 
cf-metad...@cgd.ucar.edu, although if you do nothing, a subscription to the 
UCAR list will result in a subscription to this list.
To unsubscribe from this list only, send a message to 
cf-metadata-unsubscribe-requ...@listserv.llnl.gov.
