Eizi TOYODA wrote:
Hi all,

Nobody here is trying to discourage the use of the micro sign.  But
which character encoding are you going to use?

The micro sign (U+00B5, http://www.fileformat.info/info/unicode/char/00b5/index.htm)
is the single byte 0xB5 in Latin-1 (aka ISO-8859-1) but becomes the
two bytes 0xC2 0xB5 in UTF-8.  There is also the easily confused Greek
small letter mu (U+03BC, http://www.fileformat.info/info/unicode/char/03bc/index.htm),
which is 0xCE 0xBC in UTF-8.  In short, this character is bad for
computer processing unless we have a mechanism to specify the
character encoding.
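To make the three byte patterns above concrete, here is a small Python sketch (the helper name `byte_patterns` is hypothetical, used only for illustration) that encodes both look-alike characters in Latin-1 and UTF-8. It also shows that they are different code points, although Unicode NFKC normalization folds the micro sign into Greek mu.

```python
import unicodedata
from typing import Dict, Optional

MICRO_SIGN = "\u00b5"  # U+00B5 MICRO SIGN
GREEK_MU = "\u03bc"    # U+03BC GREEK SMALL LETTER MU


def byte_patterns(ch: str) -> Dict[str, Optional[str]]:
    """Return the hex bytes of `ch` in Latin-1 and UTF-8 (None if unencodable)."""
    patterns: Dict[str, Optional[str]] = {}
    for enc in ("latin-1", "utf-8"):
        try:
            patterns[enc] = ch.encode(enc).hex(" ")  # hex(sep) needs Python 3.8+
        except UnicodeEncodeError:
            patterns[enc] = None  # Greek mu has no Latin-1 code point
    return patterns


if __name__ == "__main__":
    for name, ch in (("micro sign", MICRO_SIGN), ("Greek mu", GREEK_MU)):
        print(name, byte_patterns(ch))
    # The two characters look alike but compare unequal; NFKC maps
    # the micro sign onto Greek mu.
    print(MICRO_SIGN == GREEK_MU,
          unicodedata.normalize("NFKC", MICRO_SIGN) == GREEK_MU)
```

A program comparing raw bytes would therefore see three different spellings of "micro" unless it decodes and normalizes first.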

The UDUNITS 2 API has an "encoding" argument, and users can choose
ASCII, Latin-1, or UTF-8.  Accordingly, the "udunits2" command has the
options -A, -L, and -U.  That is sufficient for a library, where users
have control and responsibility.  But CF is a metadata standard whose
purpose is to be exchanged among people without confusion.
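The point about caller-chosen encodings can be sketched in Python. This is not the UDUNITS 2 API; `ENCODINGS` and `decode_units` are hypothetical names that merely mirror the -A, -L and -U choices, to show that the same bytes mean different things depending on which encoding the reader assumes.

```python
# Hypothetical mapping that mirrors the udunits2 command's -A, -L and -U options.
ENCODINGS = {"A": "ascii", "L": "latin-1", "U": "utf-8"}


def decode_units(raw: bytes, option: str) -> str:
    """Decode a raw units string with the encoding the caller chose.

    Raises UnicodeDecodeError when the bytes are invalid in that encoding.
    """
    return raw.decode(ENCODINGS[option])


if __name__ == "__main__":
    utf8_micrometre = b"\xc2\xb5m"  # "µm" written as UTF-8
    print(decode_units(utf8_micrometre, "U"))  # correct reading
    print(decode_units(utf8_micrometre, "L"))  # silent mojibake, no error
    try:
        decode_units(utf8_micrometre, "A")
    except UnicodeDecodeError as err:
        print("ASCII rejects it:", err.reason)
```

With a library the user picks the option and takes responsibility for it; a file exchanged between strangers carries no such choice unless the standard records one.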

The CF community could go several ways.  I'd like to hear the community's views:

(1) Create a global attribute to specify the character encoding (as XML does)
      I believe this won't work.
(2) Declare that CF uses UTF-8
      Probably many people will simply ignore that and write a single 0xB5 as the micro sign.
(3) Recommend only US-ASCII characters in the "units" attribute
      Very conservative, but consistent with allowing only English in standardized attributes.
(4) Do nothing
      Then I have to warn programmers to anticipate any of the byte patterns above.
      That would work as long as the micro sign is the only extension to ASCII.
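Option (4) pushes the work onto readers.  A minimal Python sketch of such a tolerant reader follows; `normalize_units` is a hypothetical helper, and it assumes the consumer accepts the ASCII "u" as the micro prefix (UDUNITS 2 does, to my knowledge, but check your own units library).  It tries UTF-8 first and falls back to Latin-1, which is a guess rather than a declared encoding.

```python
MICRO_CHARS = ("\u00b5", "\u03bc")  # micro sign, Greek small letter mu


def normalize_units(raw: bytes) -> str:
    """Heuristically decode a units attribute and map micro to ASCII 'u'.

    UTF-8 covers C2 B5 and CE BC; Latin-1 covers a bare B5.  The
    fallback is a heuristic: a Latin-1 string that happens to be valid
    UTF-8 would be misread, which is unlikely for short units strings.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode("latin-1")  # never fails: every byte maps to a character
    for ch in MICRO_CHARS:
        text = text.replace(ch, "u")  # assumes 'u' is an accepted micro prefix
    return text


if __name__ == "__main__":
    for raw in (b"\xb5m", b"\xc2\xb5m", b"\xce\xbcm"):
        print(raw, "->", normalize_units(raw))
```

Every reader has to repeat this guesswork, and it only holds up as long as the micro sign is the sole non-ASCII character in play.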

Best Regards,

Strings stored in netCDF (e.g. variable names, attributes, String data in netCDF-4) are interpreted as UTF-8, and there's no standard way to indicate a different encoding. CF could add such a mechanism, but unless it does, "CF uses UTF-8" by default. It would probably be worth addressing this explicitly in the CF document; I would advocate sticking with UTF-8. Requiring US-ASCII for the attributes that CF defines is also reasonable.
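A checker could enforce both positions (UTF-8 for strings in general, US-ASCII for CF-defined attributes) by classifying the raw attribute bytes.  The sketch below is illustrative only; `classify_attribute` is a hypothetical name, not part of any netCDF or CF tool.

```python
def classify_attribute(raw: bytes) -> str:
    """Classify raw attribute bytes as 'ascii', 'utf8', or 'invalid-utf8'."""
    if raw.isascii():  # bytes.isascii() exists since Python 3.7
        return "ascii"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "invalid-utf8"  # e.g. a bare Latin-1 0xB5 micro sign
    return "utf8"


if __name__ == "__main__":
    for raw in (b"m s-1", b"\xc2\xb5m", b"\xb5m"):
        print(raw, classify_attribute(raw))
```

Under "CF uses UTF-8" only the last case is an error; under a US-ASCII rule for CF-defined attributes, anything other than "ascii" would be flagged.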

As always, there's a tension between CF creating "best practices" for new file writers vs trying to define conformance (and staying backwards compatible). I think spelling out the unit should be best practice, but keeping a "backwards compatible" version of udunits-2 seems necessary.
_______________________________________________
CF-metadata mailing list
[email protected]
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
