Dear Bob and all,
I have not had time to follow this thread in detail, but a remark (in
the most recent email) that seemed unnecessarily sarcastic caught my
eye, and impelled me to look into what might have led to this dip in our
normally courteous, respectful discourse. As Chair of the CF Governance
Panel, I feel obliged to remind everyone that the success (and fun!) of
our endeavor depends on enthusiastic engagement, which is only
discouraged if what ought to be earnest debate and substantive argument
is disrupted (however briefly and unintentionally) by remarks that could
be interpreted as being even slightly derogatory. There are no "supreme
authorities" here (although some are much more knowledgeable than
others). We progress by consensus, and only respectful contributions to
the discussion can be tolerated.
Enough said about that.
Addressing only the part of the "pre-proposal" that suggests there is a
need to explicitly distinguish strings from characters (not the part of
the proposal that deals with the flavor of the 7 or 8 bit representation
of characters):
1) Note that the H.4 example being discussed was slightly modified (I
think on 29th February 2016), and now includes "station_name" in the
list of coordinates, thus explicitly linking it to the humidity and temp
variables. This, along with the fact that station_name is *not* included
as a dimension for these variables, allows you to infer that this is a
*single* station described by a character string of length 23, and not
23 stations with single-character IDs.
2) If the coordinate dimension is *required* in this case (and
currently it may not be), then software should be able to unambiguously
interpret things. [This requirement was suggestion 2 (of 3), made by
Jonathan in one of his earlier comments.]
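As a rough illustration of the inference in point 1, here is a minimal Python sketch. The function name `interpret_char_coord` and the rule it encodes are hypothetical, not something CF defines: it looks only at dimension names and assumes the char variable is already known to be listed in the data variable's `coordinates` attribute.

```python
def interpret_char_coord(char_dims, data_dims):
    """Guess how a char variable named in `coordinates` should be read.

    char_dims: dimension names of the char variable, e.g. ("name_strlen",)
    data_dims: dimension names of the data variable that lists it
    Returns ("string", dims_of_string_array) or ("char", dims_of_char_array).
    """
    if not char_dims:
        return ("char", ())  # scalar char: nothing to collapse
    *leading, last = char_dims
    if last in data_dims:
        # The trailing dimension is shared with the data variable, so it
        # indexes data elements rather than the characters of one string.
        return ("char", tuple(char_dims))
    # Otherwise treat the trailing dimension as the maximum string length.
    return ("string", tuple(leading))


# H.4-like case: humidity(time), station_name(name_strlen)
print(interpret_char_coord(("name_strlen",), ("time",)))
# Hypothetical counter-case: humidity(station, time), station_id(station)
print(interpret_char_coord(("station",), ("station", "time")))
```

The first call reports a single string (no remaining dimensions); the second reports one character per station. If the string-length dimension were required, as in point 2, a reader would not need a heuristic like this at all.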
best regards,
Karl
On 3/7/17 11:08 AM, Bob Simons - NOAA Federal wrote:
Jonathan, I believe that you place an unreasonable burden on
general-purpose software readers of netcdf-3 files, which you expect
to include AI-like code which completely "understands" all possible CF
files, just so it can tell the difference between char variables meant
to be interpreted as chars and char variables meant to be interpreted
as strings (by collapsing the rightmost dimension). The supreme
authorities, you and David Hassell (your own employee?!), couldn't
even agree on whether H4 was a valid CF file. How can you then demand
that software do better?
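For readers unfamiliar with what "collapsing the rightmost dimension" involves, here is a minimal Python sketch. It is an illustration only: the helper name `collapse_rightmost`, the nested-list representation of the char array, the padding convention (trailing NULs or spaces), and the station name are all assumptions, not anything specified by CF or NUG.

```python
def collapse_rightmost(chars, encoding="ascii"):
    """Join the rightmost dimension of a nested list of 1-byte values into str."""
    if chars and isinstance(chars[0], (list, tuple)):
        # More than one dimension left: recurse into each row.
        return [collapse_rightmost(row, encoding) for row in chars]
    raw = b"".join(chars)
    # Assumption: unused trailing positions are padded with NULs or spaces.
    return raw.rstrip(b"\x00 ").decode(encoding)


# A 1-D char array of length 8 holding a hypothetical station name.
station_name = [bytes([b]) for b in b"Kodiak\x00\x00"]
print(collapse_rightmost(station_name))

# A 2-D char array (2 stations x 2 chars) collapses to a list of 2 strings.
ids = [[b"A", b"1"], [b"B", b"\x00"]]
print(collapse_rightmost(ids))
```

The code is trivial once the reader knows the array is meant as strings. The open question in this thread is how the reader finds that out.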
It is easy/trivial for software reading a netcdf-4 file (as defined in
NUG) to distinguish char variables and String variables. Why is it so
wrong to ask for the same ease/clarity with netcdf-3 files?
Part of my effort here was to start dealing with the massive rift
between CF (which only covers netcdf-3 files) and NUG (which covers
netcdf-3 and netcdf-4 files). Isn't that a reasonable goal?
And even if you ignore the issue of distinguishing chars from strings,
there is still no attribute in CF to specify the character set for
char scalars and char arrays that are to be interpreted as chars.
You can't say "_Encoding" because the default for _Encoding is
"UTF-8", which is not a valid option for char scalars and char arrays
because it may span multiple chars. The list of valid character sets
for char scalars and char arrays (in netcdf-3 and netcdf-4 files) must
be different from the list of valid _Encodings for strings. A
different attribute, e.g., charset, is needed for chars (as opposed to
strings) in netcdf-3 and netcdf-4 files.
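The point that UTF-8 "may span multiple chars" can be checked with a short Python sketch. The helper `fits_in_one_char` is hypothetical and only illustrates the kind of test a single-byte `charset` would have to pass; `charset` itself is the attribute proposed above, not an existing CF or NUG attribute.

```python
text = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, outside 7-bit ASCII

utf8 = text.encode("utf-8")
latin1 = text.encode("iso-8859-1")

print(len(utf8))    # number of bytes needed in UTF-8
print(len(latin1))  # number of bytes needed in ISO-8859-1


def fits_in_one_char(ch, charset):
    """True if the single character `ch` encodes to exactly one byte."""
    try:
        return len(ch.encode(charset)) == 1
    except UnicodeEncodeError:
        # The character does not exist in this character set at all.
        return False


print(fits_in_one_char(text, "utf-8"))
print(fits_in_one_char(text, "iso-8859-1"))
```

UTF-8 needs two bytes for this character while ISO-8859-1 needs one, so a char scalar holding one byte cannot represent it under UTF-8.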
On Tue, Mar 7, 2017 at 9:03 AM, Jonathan Gregory
<[email protected]> wrote:
Dear Chris
> We need to be "clear" about what we mean by "the intent is clear". I think
> that much of the point of CF is to be as explicit as possible, -- i.e. the
> reader of a CF file should not have to know anything about how given data
> tends to be used in order to determine what data type an array should be
> (or what shape it should be).
Yes, I agree with that. However, if you're reading a CF file, you aren't
just reading plain variables. If you're using/writing software which knows
how to interpret the file following the CF convention, it should know what
the "intent" is, in a CF context, of each of the variables of interest.
For example, you know that an auxiliary coordinate variable of char data must
be a vector of strings, and the trailing or only dimension is the max string
length. If you came across this variable when scanning all the variables in
a netCDF file, with no interest in CF, you wouldn't know that it was an array
of strings, but if you are using it as a CF aux coord var, you do know that,
so I don't think any further signal is needed - it would be redundant.
Best wishes
Jonathan
----- Forwarded message from Chris Barker <[email protected]> -----
> Date: Mon, 6 Mar 2017 11:16:35 -0800
> From: Chris Barker <[email protected]>
> To: Jonathan Gregory <[email protected]>
> CC: "[email protected]" <[email protected]>
> Subject: Re: [CF-metadata] Pre-proposal for "charset"
>
> On Mon, Mar 6, 2017 at 9:47 AM, Jonathan Gregory <[email protected]>
> wrote:
>
> > Yes, we can reopen the ticket. I think the _Encoding for char is a good
> > idea, especially if it's an NUG convention.
>
>
> so let's do that part at least.
>
> > > Are there any files out in the wild that DO use ND arrays of NC_CHAR that
> > > are not intended to be interpreted as a (N-1)D array of Strings?
> >
> > That is the question. In particular, since this is the CF convention we're
> > talking about, are there any char arrays which are part of CF,
>
>
> indeed.
>
>
> > where the
> > intent is not clear?
> >
> We need to be "clear" about what we mean by "the intent is clear". I think
> that much of the point of CF is to be as explicit as possible, -- i.e. the
> reader of a CF file should not have to know anything about how given data
> tends to be used in order to determine what data type an array should be
> (or what shape it should be).
>
> I say this as an author of sometimes generic tools -- the tool should be
> able to read the file, and produce the appropriate native array for the
> task at hand, without knowing something like: "ahh, this is the ID of an
> Acme-ocean-widget -- those use char IDs -- so this must be a char" --
> Humans can do that -- software can't (not easily anyway!)
>
> And clearly specifying whether a char array is a char array or a string
> array will better unify netcdf3 and netcdf4.
>
> netcdf4 can be explicit about it -- netcdf3 can't -- so it'd be nice if CF
> could fill that gap.
>
> Now that I think about it, this really should be a netcdf convention --
> like _FillValue, but this is a CF list....
>
> -CHB
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R (206) 526-6959 voice
> 7600 Sand Point Way NE (206) 526-6329 fax
> Seattle, WA 98115 (206) 526-6317 main reception
>
> [email protected]
----- End forwarded message -----
_______________________________________________
CF-metadata mailing list
[email protected]
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
--
Sincerely,
Bob Simons
IT Specialist
Environmental Research Division
NOAA Southwest Fisheries Science Center
99 Pacific St., Suite 255A (New!)
Monterey, CA 93940 (New!)
Phone: (831)333-9878 (New!)
Fax: (831)648-8440
Email: [email protected]
The contents of this message are mine personally and
do not necessarily reflect any position of the
Government or the National Oceanic and Atmospheric Administration.
<>< <>< <>< <>< <>< <>< <>< <>< <><