Hi all, I wasn't quite able to form this into coherent paragraphs, so here are 
some things to keep in mind re: UTF8 vs other encodings:

* UTF8 is backwards compatible with ASCII if both of the following are true: 
there is no byte order mark, and all code points are between U+0000 and U+007F.
* UTF8 is not backwards compatible with Latin1 (ISO 8859-1) because the Latin1 
code points above U+007F (U+0080 through U+00FF) take two bytes in UTF8 but 
only one byte in Latin1.
* There are multiple ways of representing the same grapheme; the netCDF classic 
format required UTF8 to be in [Normalization Form Canonical 
Composition](https://en.wikipedia.org/wiki/Unicode_equivalence#Normalization) 
(NFC).
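
To make the three points above concrete, here is a minimal Python sketch using only the standard library. The sample strings (`air_temperature`, the letter é) are just illustrations I picked, not anything taken from the conventions.

```python
import unicodedata

# 1. Pure-ASCII text produces identical bytes in ASCII and UTF-8 ...
ascii_text = "air_temperature"
same_as_ascii = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# ... unless a byte order mark is prepended ("utf-8-sig" adds EF BB BF).
has_bom = ascii_text.encode("utf-8-sig").startswith(b"\xef\xbb\xbf")

# 2. U+00E9 (e with acute) is one byte in Latin-1 but two bytes in UTF-8,
#    so Latin-1 bytes above 0x7F are not valid UTF-8 as-is.
e_acute = "\u00e9"
latin1_bytes = e_acute.encode("latin-1")  # single byte 0xE9
utf8_bytes = e_acute.encode("utf-8")      # two bytes 0xC3 0xA9

# 3. The same grapheme can be precomposed (U+00E9) or decomposed
#    (U+0065 followed by combining acute U+0301); NFC picks the composed form.
decomposed = "e\u0301"
nfc = unicodedata.normalize("NFC", decomposed)
```

The decomposed and precomposed strings compare unequal until both are normalized, which is why agreeing on a single normalization form matters when comparing names.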

My personal recommendation is that the only encoding for text in CF netCDF be 
UTF8 in NFC with no byte order mark. For attributes where there is a desire to 
restrict what is allowed (through controlled vocabulary or other limitations), 
the restriction should be specified using Unicode code points, e.g. "only 
printing characters between U+0000 and U+007F are allowed in controlled 
attributes".
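
As a rough sketch of what checking this recommendation could look like, the two Python helpers below are hypothetical (the names `is_cf_text` and `is_controlled_value` are mine, not from CF or any netCDF library). I'm also assuming "printing characters" means U+0020 through U+007E, i.e. space through tilde; the exact range would be for the conventions to define.

```python
import codecs
import unicodedata


def is_cf_text(raw: bytes) -> bool:
    """Hypothetical check: valid UTF-8, no byte order mark, already in NFC."""
    if raw.startswith(codecs.BOM_UTF8):
        return False
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Text is in NFC if normalizing it changes nothing.
    return unicodedata.normalize("NFC", text) == text


def is_controlled_value(text: str) -> bool:
    """Hypothetical check for a controlled attribute value.

    Assumption: "printing characters" means U+0020..U+007E inclusive.
    """
    return all(0x20 <= ord(ch) <= 0x7E for ch in text)
```

For example, `is_cf_text("e\u0301".encode("utf-8"))` is false because the decomposed form is not NFC, and `is_controlled_value("degrees_north")` is true while a value containing a tab or any code point above U+007E is rejected.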

Text which is in controlled vocabulary attributes should continue to be char 
arrays. Freeform attributes (mostly those in 2.6.2. Description of file 
contents) could probably be either string or char arrays.
