Dear Bob and all,

I have not had time to follow this thread in detail, but a remark (in the most recent email) that seemed unnecessarily sarcastic caught my eye, and impelled me to look into what might have led to this dip in our normally courteous, respectful discourse. As Chair of the CF Governance Panel, I feel obliged to remind everyone that the success (and fun!) of our endeavor depends on enthusiastic engagement, which is only discouraged if what ought to be earnest debate and substantive argument is disrupted (however briefly and unintentionally) by remarks that could be interpreted as being even slightly derogatory. There are no "supreme authorities" here (although some are much more knowledgeable than others). We progress by consensus, and only respectful contributions to the discussion can be tolerated.

Enough said about that.

Addressing only the part of the "pre-proposal" that suggests there is a need to explicitly distinguish strings from characters (not the part of the proposal that deals with the flavor of the 7 or 8 bit representation of characters):

1) Note that the H.4 example being discussed was slightly modified (I think on 29th February 2016), and now includes "station_name" in the list of coordinates, thus explicitly linking it to the humidity and temp variables. This, along with the fact that station_name is *not* included as a dimension for these variables, allows you to infer that this is a *single* station described by a character string of length 23, and not 23 stations with single-character IDs.

2) If the coordinate dimension is *required* in this case (and currently it may not be), then software should be able to interpret things unambiguously. [This requirement was suggestion 2 (of 3), made by Jonathan in one of his earlier comments.]
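
As a rough illustration of the inference described in point 1, here is a minimal Python sketch. It is not taken from the CF document or from any existing library: the function names, the rule in is_single_string, and the dimension names "name_strlen" and "time" are assumptions made for this example, and the char data is built in memory with NumPy rather than read from a real file.

```python
import numpy as np


def collapse_trailing_dim(chars):
    """Join the rightmost dimension of a char array into strings."""
    chars = np.asarray(chars, dtype="S1")
    if chars.ndim == 0:
        raise ValueError("need at least one dimension to collapse")
    rows = chars.reshape(-1, chars.shape[-1])
    # Assumption: trailing NUL or space bytes are padding, not content.
    strings = [row.tobytes().rstrip(b"\x00 ").decode("ascii") for row in rows]
    if chars.ndim == 1:
        return strings[0]
    return np.array(strings, dtype=object).reshape(chars.shape[:-1])


def is_single_string(aux_dims, data_dims):
    """Hypothetical rule: a char auxiliary coordinate with exactly one
    dimension, not shared with the data variable, names a single item."""
    return len(aux_dims) == 1 and aux_dims[0] not in data_dims


# A 23-character station name stored as 23 single chars (illustrative data).
station_name = np.frombuffer(b"Station A".ljust(23), dtype="S1")

if is_single_string(("name_strlen",), ("time",)):
    print(collapse_trailing_dim(station_name))
```

If the char variable instead shared its only dimension with the data variable, the same rule would leave it as an array of single characters, which is the ambiguity under discussion.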

best regards,
Karl



On 3/7/17 11:08 AM, Bob Simons - NOAA Federal wrote:
Jonathan, I believe that you place an unreasonable burden on general-purpose software readers of netcdf-3 files, which you expect to include AI-like code which completely "understands" all possible CF files, just so it can tell the difference between char variables meant to be interpreted as chars and char variables meant to be interpreted as strings (by collapsing the rightmost dimension). The supreme authorities, you and David Hassell (your own employee?!), couldn't even agree on whether H4 was a valid CF file. How can you then demand that software do better?

It is easy/trivial for software reading a netcdf-4 file (as defined in NUG) to distinguish char variables and String variables. Why is it so wrong to ask for the same ease/clarity with netcdf-3 files? Part of my effort here was to start dealing with the massive rift between CF (which only covers netcdf-3 files) and NUG (which covers netcdf-3 and netcdf-4 files). Isn't that a reasonable goal?

And even if you ignore the issue of distinguishing chars from strings, there is still no attribute in CF to specify the character set for char scalars and char arrays that are to be interpreted as chars. You can't say "_Encoding" because the default for _Encoding is "UTF-8", which is not a valid option for char scalars and char arrays because it may span multiple chars. The list of valid character sets for char scalars and char arrays (in netcdf-3 and netcdf-4 files) must be different from the list of valid _Encodings for strings. A different attribute, e.g., charset, is needed for chars (as opposed to strings) in netcdf-3 and netcdf-4 files.
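
The point about multi-byte encodings can be checked with a short Python sketch. The helper name fits_in_one_char is made up for this illustration, and the three charsets are only examples, not a proposed list of valid values for the suggested attribute.

```python
def fits_in_one_char(character, charset):
    """True if `character` encodes to exactly one byte in `charset`."""
    try:
        return len(character.encode(charset)) == 1
    except UnicodeEncodeError:
        # The charset cannot represent this character at all.
        return False


# U+00E9 (e with acute accent) as a test character.
for charset in ("ascii", "iso-8859-1", "utf-8"):
    print(charset, fits_in_one_char("\u00e9", charset))
```

A single-byte charset such as ISO-8859-1 stores this character in one char, while UTF-8 needs two bytes for it, so it cannot be held in a char scalar.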



On Tue, Mar 7, 2017 at 9:03 AM, Jonathan Gregory <[email protected]> wrote:

    Dear Chris

    > We need to be "clear" about what we mean by "the intent is clear". I think
    > that much of the point of CF is to be as explicit as possible, -- i.e. the
    > reader of a CF file should not have to know anything about how given data
    > tends to be used in order to determine what data type an array should be
    > (or what shape it should be).

    Yes, I agree with that. However, if you're reading a CF file, you aren't
    just reading plain variables. If you're using/writing software which knows
    how to interpret the file following the CF convention, it should know what
    the "intent" is, in a CF context, of each of the variables of interest.
    For example, you know that an auxiliary coordinate variable of char data must
    be a vector of strings, and the trailing or only dimension is the max string
    length. If you came across this variable when scanning all the variables in
    a netCDF file, with no interest in CF, you wouldn't know that it was an array
    of strings, but if you are using it as a CF aux coord var, you do know that,
    so I don't think any further signal is needed - it would be redundant.

    Best wishes

    Jonathan

    ----- Forwarded message from Chris Barker <[email protected]> -----

    > Date: Mon, 6 Mar 2017 11:16:35 -0800
    > From: Chris Barker <[email protected]>
    > To: Jonathan Gregory <[email protected]>
    > CC: "[email protected]" <[email protected]>
    > Subject: Re: [CF-metadata] Pre-proposal for "charset"
    >
    > On Mon, Mar 6, 2017 at 9:47 AM, Jonathan Gregory <[email protected]>
    > wrote:
    >
    > > Yes, we can reopen the ticket. I think the _Encoding for char is a good
    > > idea, especially if it's an NUG convention.
    >
    >
    > so let's do that part at least.
    >
    > > > Are there any files out in the wild that DO use ND arrays of NC_CHAR that
    > > > are not intended to be interpreted as a (N-1)D array of Strings?
    > >
    > > That is the question. In particular, since this is the CF convention we're
    > > talking about, are there any char arrays which are part of CF,
    >
    >
    > indeed.
    >
    >
    > > where the
    > > intent is not clear?
    > >
    > We need to be "clear" about what we mean by "the intent is clear". I think
    > that much of the point of CF is to be as explicit as possible, -- i.e. the
    > reader of a CF file should not have to know anything about how given data
    > tends to be used in order to determine what data type an array should be
    > (or what shape it should be).
    >
    > I say this as an author of sometimes generic tools -- the tool should be
    > able to read the file, and produce the appropriate native array for the
    > task at hand, without knowing something like: "ahh, this is the ID of an
    > Acme-ocean-widget -- those use char IDs -- so this must be a char" --
    > Humans can do that -- software can't (not easily anyway!)
    >
    > And clearly specifying whether a char array is a char array or a string
    > array will better unify netcdf3 and netcdf4.
    >
    > netcdf4 can be explicit about it -- netcdf3 can't -- so it'd be nice if CF
    > could fill that gap.
    >
    > Now that I think about it, this really should be a netcdf convention --
    > like _FillValue, but this is a CF list....
    >
    > -CHB
    >
    > --
    >
    > Christopher Barker, Ph.D.
    > Oceanographer
    >
    > Emergency Response Division
    > NOAA/NOS/OR&R (206) 526-6959   voice
    > 7600 Sand Point Way NE (206) 526-6329   fax
    > Seattle, WA 98115 (206) 526-6317   main reception
    >
    > [email protected]

    ----- End forwarded message -----
    _______________________________________________
    CF-metadata mailing list
    [email protected]
    http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata




--
Sincerely,

Bob Simons
IT Specialist
Environmental Research Division
NOAA Southwest Fisheries Science Center
99 Pacific St., Suite 255A      (New!)
Monterey, CA 93940               (New!)
Phone: (831)333-9878            (New!)
Fax:   (831)648-8440
Email: [email protected]

The contents of this message are mine personally and
do not necessarily reflect any position of the
Government or the National Oceanic and Atmospheric Administration.
<>< <>< <>< <>< <>< <>< <>< <>< <><


