Jonathan, I believe that you place an unreasonable burden on general-purpose software that reads netcdf-3 files. You expect it to include AI-like code that completely "understands" all possible CF files, just so it can tell the difference between char variables meant to be interpreted as chars and char variables meant to be interpreted as strings (by collapsing the rightmost dimension). The supreme authorities, you and David Hassell (your own employee?!), couldn't even agree on whether H4 was a valid CF file. How can you then demand that software do better?
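To make the ambiguity concrete, here is a minimal Python/NumPy sketch. It is only an illustration under stated assumptions: the helper names `collapse_rightmost` and `interpret` are hypothetical, and the rule "a `charset` attribute marks individual chars, an `_Encoding` attribute marks strings" is one possible reading of the proposal, not something CF or NUG currently defines. The same 3 x 4 block of bytes is valid under either interpretation, so a generic reader has nothing to go on unless an attribute says which one is meant.

```python
import numpy as np


def collapse_rightmost(char_array, encoding="utf-8"):
    """Read an (..., strlen) array of single bytes as an (...) array of strings."""
    data = np.asarray(char_array, dtype="S1")
    flat = data.reshape(-1, data.shape[-1])
    # Join the bytes of each row, drop NUL/space padding, then decode.
    strings = [b"".join(row).rstrip(b"\x00 ").decode(encoding) for row in flat]
    return np.array(strings, dtype=object).reshape(data.shape[:-1])


def interpret(char_array, attrs):
    """Choose an interpretation from (assumed) variable attributes."""
    data = np.asarray(char_array, dtype="S1")
    if "charset" in attrs:
        # Assumed rule: every element is one character in a single-byte charset.
        chars = [c.decode(attrs["charset"]) for c in data.ravel()]
        return np.array(chars, dtype=object).reshape(data.shape)
    if "_Encoding" in attrs:
        # Assumed rule: the rightmost dimension is the maximum string length.
        return collapse_rightmost(data, attrs["_Encoding"])
    raise ValueError("char variable is ambiguous: chars or strings?")


# The same netcdf-3 style char data, shape (3, 4), padded with spaces.
ids = np.array([list("A1  "), list("B22 "), list("C333")], dtype="S1")

as_strings = interpret(ids, {"_Encoding": "UTF-8"})      # shape (3,)
as_chars = interpret(ids, {"charset": "ISO-8859-1"})     # shape (3, 4)
```

Without either attribute, `interpret(ids, {})` raises `ValueError`, which is the situation a generic reader of a netcdf-3 file is in today.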
It is easy/trivial for software reading a netcdf-4 file (as defined in NUG) to distinguish char variables and String variables. Why is it so wrong to ask for the same ease/clarity with netcdf-3 files? Part of my effort here was to start dealing with the massive rift between CF (which only covers netcdf-3 files) and NUG (which covers netcdf-3 and netcdf-4 files). Isn't that a reasonable goal?

And even if you ignore the issue of distinguishing chars from strings, there is still no attribute in CF to specify the character set for char scalars and char arrays that are to be interpreted as chars. You can't say "_Encoding" because the default for _Encoding is "UTF-8", which is not a valid option for char scalars and char arrays because a single UTF-8 character may span multiple chars (bytes). The list of valid character sets for char scalars and char arrays (in netcdf-3 and netcdf-4 files) must be different from the list of valid _Encodings for strings. A different attribute, e.g., charset, is needed for chars (as opposed to strings) in netcdf-3 and netcdf-4 files.

On Tue, Mar 7, 2017 at 9:03 AM, Jonathan Gregory <[email protected]> wrote:

> Dear Chris
>
> > We need to be "clear" about what we mean by "the intent is clear". I think
> > that much of the point of CF is to be as explicit as possible, -- i.e. the
> > reader of a CF file should not have to know anything about how given data
> > tends to be used in order to determine what data type an array should be
> > (or what shape it should be).
>
> Yes, I agree with that. However, if you're reading a CF file, you aren't
> just reading plain variables. If you're using/writing software which knows
> how to interpret the file following the CF convention, it should know what
> the "intent" is, in a CF context, of each of the variables of interest.
> For example, you know that an auxiliary coordinate variable of char data must
> be a vector of strings, and the trailing or only dimension is the max string
> length.
> If you came across this variable when scanning all the variables in
> a netCDF file, with no interest in CF, you wouldn't know that it was an array
> of strings, but if you are using it as a CF aux coord var, you do know that,
> so I don't think any further signal is needed - it would be redundant.
>
> Best wishes
>
> Jonathan
>
> ----- Forwarded message from Chris Barker <[email protected]> -----
>
> > Date: Mon, 6 Mar 2017 11:16:35 -0800
> > From: Chris Barker <[email protected]>
> > To: Jonathan Gregory <[email protected]>
> > CC: "[email protected]" <[email protected]>
> > Subject: Re: [CF-metadata] Pre-proposal for "charset"
> >
> > On Mon, Mar 6, 2017 at 9:47 AM, Jonathan Gregory <[email protected]>
> > wrote:
> >
> > > Yes, we can reopen the ticket. I think the _Encoding for char is a good
> > > idea, especially if it's an NUG convention.
> >
> > so let's do that part at least.
> >
> > > > Are there any files out in the wild that DO use ND arrays of NC_CHAR that
> > > > are not intended to be interpreted as a (N-1)D array of Strings?
> > >
> > > That is the question. In particular, since this the CF convention we're
> > > talking about, are there any char arrays which are part of CF,
> >
> > indeed.
> >
> > > where the intent is not clear?
> >
> > We need to be "clear" about what we mean by "the intent is clear". I think
> > that much of the point of CF is to be as explicit as possible, -- i.e. the
> > reader of a CF file should not have to know anything about how given data
> > tends to be used in order to determine what data type an array should be
> > (or what shape it should be).
> >
> > I saw this an an author of sometimes generic tools -- the tool should be
> > able to read the file, and produce the appropriate native array for the
> > task at hand, without knowing something like: "ahh, this is the ID of a
> > Acme-ocean-widget -- those use char IDs -- so this must be a char" --
> > Humans can do that -- software can't (not easily anyway!)
> >
> > And clearly specifying whether a char array is a char array or a string
> > array will better unify netcdf3 and netcdf4.
> >
> > netcdf4 can be explicit about it -- netcdf3 can't -- so it'd be nice if CF
> > could fill that gap.
> >
> > Now that I think about it, this really should be a netcdf convention --
> > like _FillValue, but this is a CF list....
> >
> > -CHB
> >
> > --
> >
> > Christopher Barker, Ph.D.
> > Oceanographer
> >
> > Emergency Response Division
> > NOAA/NOS/OR&R            (206) 526-6959   voice
> > 7600 Sand Point Way NE   (206) 526-6329   fax
> > Seattle, WA  98115       (206) 526-6317   main reception
> >
> > [email protected]
>
> ----- End forwarded message -----
> _______________________________________________
> CF-metadata mailing list
> [email protected]
> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>

--
Sincerely,

Bob Simons
IT Specialist
Environmental Research Division
NOAA Southwest Fisheries Science Center
99 Pacific St., Suite 255A (New!)
Monterey, CA 93940 (New!)
Phone: (831)333-9878 (New!)
Fax: (831)648-8440
Email: [email protected]

The contents of this message are mine personally and
do not necessarily reflect any position of the
Government or the National Oceanic and Atmospheric Administration.
<>< <>< <>< <>< <>< <>< <>< <>< <><
