@hrajagers brings up encoding -- always a challenge! Is there any reason we can't say that ALL "things" of the string type are utf-8? Period, end of story. I'd also love to say that all CHAR data is ASCII (or maybe latin-1) -- if you need Unicode, use a string.
The odds are very good that if you are dealing with software that can only handle char, it isn't going to handle Unicode well anyway.

**Reasoning:** For "over the wire" encoding, utf-8 is the best choice, and has become a de facto standard (and an actual standard for, say, JSON). And lots of people think "utf-8" == "Unicode" -- they are wrong, but if we always use utf-8, then people and tools that handle Unicode properly will work well, and tools and people that don't will still mostly work. See http://utf8everywhere.org/ for a strong opinion. Personally, I think they are wrong about "in memory", but their arguments do apply to "on disk" or "over the wire" -- essentially any interchange situation.

As for ASCII for CHAR: the char type (at least in arrays) has to be fixed length, and utf-8 is not a fixed-length encoding -- that is, 10 "characters" may require 10 or more bytes to store. And if a string is truncated naively, the result can be an invalid byte sequence. Since netCDF provides a variable-length string type, that's the obvious way to deal with Unicode.
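A minimal Python sketch of the truncation hazard (the example string is mine, just for illustration): cutting a utf-8 byte stream at a fixed byte length, as a fixed-width CHAR array would, can split a multi-byte character and leave an invalid string.

```python
# "naïve" is 5 characters but 6 bytes in utf-8, because 'ï' encodes as two bytes.
text = "naïve"
encoded = text.encode("utf-8")  # b'na\xc3\xafve'

# Naively truncate to a fixed 3-byte field, as a CHAR(3) array would:
truncated = encoded[:3]  # b'na\xc3' -- the 'ï' sequence is cut in half

try:
    truncated.decode("utf-8")
except UnicodeDecodeError as err:
    print("invalid utf-8 after truncation:", err)

# Decoding leniently shows the mangled result:
print(truncated.decode("utf-8", errors="replace"))  # 'na\ufffd'
```

With latin-1 or ASCII CHAR data this can't happen, since every character is exactly one byte -- which is the argument for restricting CHAR and pushing Unicode into the variable-length string type.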
