Hi all, I wasn't quite able to form this into coherent paragraphs, so here are some things to keep in mind re: UTF8 vs other encodings:
* UTF8 is backwards compatible with ASCII if the following are true: there is no byte order mark, and all code points are between U+0000 and U+007F.
* UTF8 is not backwards compatible with Latin1 (ISO 8859-1) because code points above U+007F need at least two bytes to represent in UTF8, whereas Latin1 stores each of them in a single byte.
* There are multiple ways of representing the same grapheme; the netCDF classic format required UTF8 to be in [Normalization Form Canonical Composition](https://en.wikipedia.org/wiki/Unicode_equivalence#Normalization) (NFC).

My personal recommendation is that the only encoding for text in CF netCDF be UTF8 in NFC with no byte order mark. For attributes where there is a desire to restrict what is allowed (through controlled vocabulary or other limitations), the restriction should be specified using Unicode code points, e.g. "only printing characters between U+0000 and U+007F are allowed in controlled attributes".

Text which is in controlled vocabulary attributes should continue to be char arrays. Freeform attributes (mostly those in 2.6.2. Description of file contents) could probably be either string or char arrays.
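
For anyone who wants to see the encoding points concretely, here is a minimal Python sketch using only the standard library. The helper names `to_cf_text` and `is_printable_ascii` are hypothetical (they are not part of CF or any netCDF library), and reading "printing characters" as U+0020 through U+007E is my assumption, not wording from the conventions.

```python
import codecs
import unicodedata


def to_cf_text(data: bytes) -> bytes:
    """Hypothetical helper: return UTF-8 bytes in NFC with no byte order mark."""
    # Drop a leading UTF-8 byte order mark if one is present.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    text = data.decode("utf-8")
    # Compose characters so each grapheme has a single canonical representation.
    return unicodedata.normalize("NFC", text).encode("utf-8")


def is_printable_ascii(text: str) -> bool:
    """Hypothetical check for a controlled attribute.

    Assumption: "printing characters" means U+0020 through U+007E.
    """
    return all(0x20 <= ord(ch) <= 0x7E for ch in text)


# Pure ASCII text has identical bytes in ASCII and UTF-8.
assert "time".encode("ascii") == "time".encode("utf-8")

# Latin1 and UTF-8 disagree above U+007F: U+00E9 is one byte versus two.
assert "\u00e9".encode("latin-1") == b"\xe9"
assert "\u00e9".encode("utf-8") == b"\xc3\xa9"

# "e" + combining acute accent (U+0301) and precomposed U+00E9 are the same
# grapheme, and end up as the same bytes once normalized to NFC.
decomposed = "e\u0301".encode("utf-8")
assert to_cf_text(decomposed) == "\u00e9".encode("utf-8")
```

A writer could run free-form attribute text through something like `to_cf_text` before storing it, and use a check like `is_printable_ascii` for controlled vocabulary attributes.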
