Tools and CF compliance Was: CF Conventions 1.2 (Jon Blower)
Hi,
concerning compliance and the CF compliance checker: in my view there are
two very different ways in which this can or should be interpreted:
(1) Check whether (or to what extent) a given file adheres to the convention.
This assumes that the file does in theory adhere to the convention, and it can be
quite useful for detecting errors.
(2) Support the development of scripts, CMOR tables, etc. which are meant to generate
CF-compliant files. Here it cannot be expected a priori that the file adheres
to CF at all (even the "Conventions" attribute may be missing), so the tool should
give the developer hints as to what changes he or she should make. While of
course most CF attributes are "optional", individuals and projects should
nevertheless strive to implement a good part of them. Thus a good "compliance
test" would go beyond criticizing what is there but wrong, and notify the user of
what is not there but should perhaps be added.
At present the CF checker operates under principle (1), but in order to
promote the adoption of CF I would suggest considering some way of moving towards variant (2).
I believe that most of the testing necessary for this is embedded in the
CF checker anyway, so it would probably be mostly a matter of some program
logic and the generation of verbose messages. Perhaps this could be realized on the
web interface with a simple check box:
[ ] suggest further improvements to the file's attribute structure
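To illustrate, variant (2) might amount to something like the following sketch. The attribute lists and function name here are hypothetical, and it assumes the file's attributes have already been read into plain dicts (e.g. via a netCDF library):

```python
# Hypothetical sketch of "variant 2" checking: rather than only flagging
# attributes that are present but wrong, also report recommended
# attributes that are absent altogether.

RECOMMENDED_GLOBAL = ["Conventions", "title", "institution", "source", "history"]
RECOMMENDED_VARIABLE = ["units", "long_name", "standard_name"]

def suggest_improvements(global_attrs, variables):
    """Return hints about attributes that are missing but recommended.

    global_attrs: dict of global attribute names to values.
    variables: dict mapping variable names to their attribute dicts.
    """
    hints = []
    for name in RECOMMENDED_GLOBAL:
        if name not in global_attrs:
            hints.append(f"global attribute '{name}' is missing; consider adding it")
    for var, attrs in variables.items():
        for name in RECOMMENDED_VARIABLE:
            if name not in attrs:
                hints.append(f"variable '{var}': attribute '{name}' is missing")
    return hints
```

A checker could print these hints only when the check box above is ticked, leaving the strict error-checking behaviour of variant (1) unchanged.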
Best regards,
Martin
< Dr. Martin G. Schultz, ICG-2, Forschungszentrum Jülich >
< D-52425 Jülich, Germany >
< ph: +49 (0)2461 61 2831, fax: +49 (0)2461 61 8131 >
< email: [EMAIL PROTECTED] >
< web: http://www.fz-juelich.de/icg/icg-2/m_schultz >
----------------------------------------------------------------------
Message: 1
Date: Thu, 08 May 2008 11:53:44 +0100
From: Philip Bentley <[EMAIL PROTECTED]>
Subject: Re: [CF-metadata] CF Conventions 1.2
To: [email protected]
Hi Jonathan, Ethan,
> Dear Ethan
>
> I think that the current rules are a good compromise between the needs
> of people who write and analyse data, and the needs of developers of
> analysis and other software. The former group of people would like CF
> to be modified fairly rapidly, when they are about to start producing
> data from a project, and they want that data to have proper metadata.
> As you will have seen from previous discussions, our discussions are
> sometimes too slow as it is. Hence we decided the rules so that changes could
> be made, but marked as provisional.
Indeed. I think the 4-plus years between CF 1.0 and CF 1.1 - according to the
date stamps on the documents - says it all. Perhaps the recent flurry of CF
proposal activity in part reflects a general desire to 'play catch-up'.
>
> For provisional changes to become permanent depends on at least two
> applications supporting them. That requires some development effort to
> be invested. CF doesn't have staff resources of its own to commit to
> it. I think the most likely applications to make changes first are the
> cf-checker and libcf. It will be interesting to see how long it takes
> for the changes so far agreed to be implemented in these or other
> applications.
>
> I fear that if we followed this approach:
>
> > 2) Don't add changes to the upcoming version of the specification
> > document "until at least two applications have successfully
> > interpreted the test data".
>
> development of CF would effectively be halted altogether. It would be
> impossible for writers of data to agree changes to the CF standard on
> a short enough timescale. Consequently they would bypass CF, and write
> and analyse data with their own metadata conventions, and the
> usefulness of CF in providing a common standard would be undermined.
I agree 100% with this. If, as a community, we set the barrier to progress [of
the CF conventions] too high then people will necessarily devise local,
incompatible solutions - not out of willfulness, but simply to meet project
deadlines.
>
> Applications don't have to keep entirely up to date, do they? I think
> the value of the Conventions attribute should be that it is easy to be
> clear about what conventions are being implemented in data and metadata.
>
> I agree about the test data. We should construct a file which contains
> some test data for the changes of CF 1.2. (The changes of CF 1.1 did
> not introduce any new attribute.) We'll need a place to deposit such files.
> As moderator of that ticket, I'll discuss it with Phil and Velimir.
I can produce some simple test files for the changes at CF 1.2. But the
question of what constitutes application conformance is, I suggest, not easily
defined. For instance, I could create a noddy netcdf file with two new grid
mapping attributes, as follows:
float temperature(t, z, lat, lon) ;
    temperature:grid_mapping = "crs" ;
char crs ;
    crs:grid_mapping_name = "latitude_longitude" ;
    crs:semi_major_axis = 92389234. ;  // new at CF 1.2
    crs:semi_minor_axis = 78682347. ;  // new at CF 1.2
And I could read this file today using, say, ncdump and ncview. Which clearly
doesn't tell us much. Yet a proposer of a given CF change cannot force the
hands of software developers to produce compliant software within a particular
time frame, if at all. In some (many?) circumstances I think we have to take it
as an act of faith that a particular update to the CF convention will be
advantageous. Plus I believe that the robustness of the CF peer review and
challenge mechanisms is sufficient to ensure that those updates will be
advantageous.
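To make the question concrete, one rough way a checker could probe a file like the one above, beyond merely opening it: a sketch over plain attribute dicts (the function name is made up, and it assumes the attributes have already been read out of the file):

```python
# Hypothetical sketch: check whether a data variable's grid mapping
# variable carries the two attributes new at CF 1.2. Attributes are
# passed as plain dicts, as a netCDF library might deliver them.

NEW_AT_CF_1_2 = ("semi_major_axis", "semi_minor_axis")

def check_new_grid_mapping_attrs(data_var_attrs, variables):
    """Return (crs_name, missing): the name of the referenced grid
    mapping variable and any new CF 1.2 attributes it lacks."""
    crs_name = data_var_attrs.get("grid_mapping")
    if crs_name is None or crs_name not in variables:
        return crs_name, list(NEW_AT_CF_1_2)
    missing = [a for a in NEW_AT_CF_1_2 if a not in variables[crs_name]]
    return crs_name, missing
```

A tool that actually *interprets* the attributes would of course have to do more than detect their presence, which is exactly the hard part of defining conformance.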
Regards,
Phil
------------------------------
Message: 2
Date: Thu, 8 May 2008 12:04:14 +0100
From: Jonathan Gregory <[EMAIL PROTECTED]>
Subject: [CF-metadata] CF Conventions 1.2
To: [email protected]
Dear Phil
> Perhaps the recent
> flurry of CF proposal activity in part reflects a general desire to
> 'play catch-up'.
Yes, I think that is the case. It certainly is the case for the two proposals I
have made, on the axis and cell_methods attributes. These were discussed on the
email list and then held in abeyance for a long time, because we had no way to
adopt them formally until we agreed the new rules.
> I can produce some simple test files for the changes at CF 1.2. But
> the question of what constitutes application conformance is, I
> suggest, not easily defined. For instance, I could create a noddy
> netcdf file with two new grid mapping attributes, as follows:
Yes, I think such a file would be useful, because it does at least provide
input data that the cf-checker can check for conformance, and other
applications could likewise check that they can read in and interpret, if they
are interested in these features. I agree with you that what "compliance"
actually means for an application is ill-defined. This is an issue which has
come up before, of course. Since most of CF is optional, in one sense (but not
a very useful sense) an application is compliant even if it ignores all that
optional metadata. On the other hand I am sure no application currently exists
which interprets all the metadata. But I don't think that means the metadata is
not useful. It can still be read by humans, it describes the data properly, and
we only add features when people have a need for them (usually people who
intend to produce data).
Best wishes
Jonathan
------------------------------
Message: 3
Date: Thu, 8 May 2008 13:23:02 +0100
From: "Jon Blower" <[EMAIL PROTECTED]>
Subject: [CF-metadata] Tools and CF compliance Was: CF Conventions 1.2
To: "Philip Bentley" <[EMAIL PROTECTED]>
Cc: [email protected]
Hi Philip and list,
(I've started a new thread as this is probably a new topic for discussion.)
> And I could read this file today using, say, ncdump and ncview. Which
> clearly doesn't tell us much.
This is a really important point. It would be very difficult, in the general
case, to ascertain whether a certain piece of software actually interprets a
certain CF attribute correctly. Conversely it is perhaps unreasonable to
expect a piece of software to implement correctly every feature of a certain CF
version.
What a tool user really wants to know (I think) is, for a given NetCDF file,
which attributes in the file are correctly interpreted by the tool. I can't
think of a neat way to do this - perhaps tool developers could publish a list
of attributes that they claim to be able to interpret for each version of the
tool they produce? A given tool might then implement 100% of CF1.0 but 50% of
CF1.2 for example.
Then the CF community could maintain a list of tools that users could consult to
find out which tools might be best suited to their purpose.
An add-on to the CF compliance checker could be created that, having scanned a
file for CF attributes, produces a list that says "Tool X understands all of
the attributes in this file, but Tool Y only understands 7 out of 9".
All this requires effort of course, but I think it's useful to consider what we
really mean when we call for "CF compliance". How can we help users to judge
which tools they should use and how can we help data providers to ensure that
their data can be interpreted by a wide community?
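The "Tool X understands 7 out of 9" idea could be mechanized quite simply if tool developers did publish their supported-attribute lists. A sketch (names and data structures are hypothetical):

```python
# Sketch: given the CF attributes found in a file and each tool's
# published list of attributes it claims to interpret, report how many
# of the file's attributes each tool understands.

def capability_report(file_attrs, tool_capabilities):
    """Map tool name -> (understood, total) over the file's attributes.

    file_attrs: iterable of CF attribute names used in the file.
    tool_capabilities: dict of tool name -> iterable of supported names.
    """
    used = set(file_attrs)
    return {tool: (len(used & set(supported)), len(used))
            for tool, supported in tool_capabilities.items()}
```

The compliance-checker add-on mentioned above would essentially be this function, fed by the checker's scan of the file and by the community-maintained capability lists.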
Jon
On Thu, May 8, 2008 at 11:53 AM, Philip Bentley <[EMAIL PROTECTED]> wrote:
> [...]
--
--------------------------------------------------------------
Dr Jon Blower               Tel: +44 118 378 5213 (direct line)
Technical Director          Tel: +44 118 378 8741 (ESSC)
Reading e-Science Centre    Fax: +44 118 378 6413
ESSC                        Email: [EMAIL PROTECTED]
University of Reading
3 Earley Gate
Reading RG6 6AL, UK
--------------------------------------------------------------
------------------------------
Message: 4
Date: Thu, 08 May 2008 09:23:10 -0600
From: John Caron <[EMAIL PROTECTED]>
Subject: Re: [CF-metadata] CF Conventions 1.2
Cc: [email protected]
Jonathan Gregory wrote:
> [...]
As a tool developer, a real netcdf file that has the new feature(s) in it is
extremely useful. In fact I don't even try to implement a feature until I have a
real example of it. So on a practical level, requiring an example netcdf file
before the final acceptance of a feature seems to me to be reasonable. Proving
that software "correctly conforms" is difficult in the general case.
We have a repository at Unidata of sample CF files, but they don't document
which features they use. It would be very useful to start that documentation and
tie it back to CF section numbers or anchors.
I propose we start a repository of sample files, ideally on the CF site,
documented as to which CF features they use. It would be good if that
documentation were a wiki (or equivalent), so that the initial person can make a
start and others can then augment and comment on it.
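A manifest for such a repository could be as simple as a mapping from file name to the CF sections it exercises. The file names and section labels below are made up purely for illustration:

```python
# Hypothetical manifest for a sample-file repository: each entry maps a
# file to the CF document sections (or anchors) whose features it uses.

SAMPLE_MANIFEST = {
    "grid_mapping_sphere.nc": ["5.6 Grid Mappings", "Appendix F"],
    "timeseries_cell_methods.nc": ["7.3 Cell Methods"],
    "axis_attributes.nc": ["4 Coordinate Types"],
}

def files_using(manifest, section):
    """Return the sample files documented as exercising a CF section."""
    return sorted(name for name, sections in manifest.items()
                  if section in sections)
```

Kept on a wiki page (or as a small data file next to the samples), this is something one person could seed and others could extend feature by feature.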
------------------------------
Message: 5
Date: Thu, 08 May 2008 10:07:23 -0600
From: Ethan Davis <[EMAIL PROTECTED]>
Subject: Re: [CF-metadata] CF Conventions 1.2
To: John Caron <[EMAIL PROTECTED]>
Cc: [email protected]
John Caron wrote:
> [...]
I think it would be useful for each approved change to have a document
detailing the changes to be made to the CF spec. This document could be
referenced by the test data documents. As it stands, we have the trac
ticket discussions, which can be very voluminous and which, even after approval,
aren't always clear about what exact changes to the specification are to be made.
Perhaps the closing comment on the trac ticket should be a detailed
change request. Though I think a separate (wiki?) document would be more
useful (both during the trac ticket discussion and afterwards for
documentation purposes).
Ethan
--
Ethan R. Davis Telephone: (303) 497-8155
Software Engineer Fax: (303) 497-8690
UCAR Unidata Program Center E-mail: [EMAIL PROTECTED]
P.O. Box 3000
Boulder, CO 80307-3000 http://www.unidata.ucar.edu/
---------------------------------------------------------------------------
------------------------------
_______________________________________________
CF-metadata mailing list
[email protected]
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
End of CF-metadata Digest, Vol 62, Issue 2
******************************************