One more note: from #141, someone wrote: "In the HDF5 case, string encoding is an intrinsic part of the HDF5 string datatype and can only be ASCII or UTF-8."
So that's it: "strings will be UTF-8" -- we're done with that part :-).

Dealing with Unicode via bare CHAR arrays is a bad idea. Sure, you can do it -- a CHAR array can hold anything -- but we wouldn't recommend using CHAR arrays to hold, say, floating-point data with an attribute saying it's an IEEE 754 32-bit float either. We _could_ do that, but it would be a really bad idea.

The one use case that makes at least a little sense is using CHAR arrays to pass encoded data around, if and only if the library doing the passing is not expected to interpret the data as text in any way. This is why *nix systems have gotten away with poorly specified filename encodings for so long. But do CF data-handling libraries ever do that? In short: does anyone need Unicode who can't use a string type? If modern Fortran netCDF libraries can't handle strings, I can't imagine they do the right thing with Unicode anyway.

https://github.com/cf-convention/cf-conventions/issues/139#issuecomment-433473555
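To make the "opaque pass-through" point concrete, here is a minimal sketch in plain Python (stdlib only; this is illustrative, not a netCDF or HDF5 API). A CHAR array can carry any bytes just fine, but the moment some layer interprets those bytes as text, it must know the encoding -- and a wrong guess silently mangles the data:

```python
# Illustrative sketch: what a CHAR array actually stores is bytes,
# and only a declared encoding makes those bytes mean text.
text = "température °C"          # non-ASCII text
raw = text.encode("utf-8")      # the bytes a CHAR array would hold

# Opaque pass-through (the *nix-filename trick): bytes in, bytes out.
# Safe, because nothing interprets the payload as text.
assert bytes(raw) == raw

# Interpreting the bytes requires knowing the encoding:
assert raw.decode("utf-8") == text

# Decoding with the wrong encoding "succeeds" but garbles the text,
# which is exactly why under-specified encodings are dangerous.
assert raw.decode("latin-1") != text
```

This is the whole argument in miniature: pass-through works, interpretation without a declared encoding does not -- which is what a real string type (with its intrinsic ASCII/UTF-8 encoding in HDF5) gives you for free.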
