On Wed, Feb 22, 2017 at 12:08 PM, Bob Simons - NOAA Federal < [email protected]> wrote:
> I do like ISO-8859-1, because
> * It is compatible with ASCII for chars 0-127, which is all that ASCII
> specifies.
> * Any variable that has just 7bit ASCII chars can be labelled
> "charset=ISO-8859-1".
> * It is the most commonly used single-page 8bit charset for supporting the
> European languages.
> * It is widely used and supported.
>

all good. And I don't know if this is only the Python implementation, but at least in Python, 8859-1 can read ANY binary data, and it round-trips through a "proper" unicode object to get the same bytes back. i.e. if the data are not 8859-1 or are malformed for some reason, the 8859-1 decoder will not error out on any input, and if you re-encode it, you'll get back the same bytes you started with. Really nice property.

> I do like UTF-8 because it is the only charset that supports full Unicode
> (all UTF-16/UCS-4/UTF-32 characters) in an 8bit encoding (since that is all
> we have for characters in netcdf-3 files: 8bit chars).
>

Again, I think this is a non-issue -- UTF-32 uses 4 bytes per character, i.e. 4 chars per code point. There is no reason you couldn't put UTF-32 encoded data in a char array (C programmers do it all the time :-) )

> And it is incredibly widely used and supported in software.

All the rest of your reasons are good -- UTF-8 is the best choice.

> So my proposal is: charset can specify any single-page (8bit) character
> set, but the two recommended charsets would be "ISO-8859-1" (for most
> simple cases) and "UTF-8" (for harder cases / full Unicode).
>

sounds good. though part of me wants to say that "ISO-8859-1" and "UTF-8" should be the only options! (darn those legacy files!)

Also -- I don't think you can call UTF-8 an 8bit character set, since a single code point can take anywhere from 1 to 4 bytes.

I'd also like the word "encoding" to be used instead of character set wherever possible. "charset" comes from, and still implies, a 1-byte-per-character system. But that's really a nitpick.

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[email protected]
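
P.S. A minimal sketch of the points above, assuming Python 3. It checks that ISO-8859-1 decodes every possible byte value and re-encodes to identical bytes, that the same malformed bytes are rejected by the UTF-8 decoder, and how many bytes a code point takes in UTF-8 versus UTF-32. The sample bytes, the sample characters, and the helper name utf8_fails are made up for illustration; none of this comes from netCDF or the CF conventions.

```python
# Every byte value 0-255 maps to the code point with the same number in
# Python's ISO-8859-1 codec, so decoding cannot fail and re-encoding
# restores the original bytes.
all_bytes = bytes(range(256))
text = all_bytes.decode("iso-8859-1")
round_tripped = text.encode("iso-8859-1")

# Illustrative bytes that are NOT valid UTF-8 (0xFF can never start a sequence).
bad_utf8 = b"\xff\xfe\x80abc"
latin1_round_trip_ok = bad_utf8.decode("iso-8859-1").encode("iso-8859-1") == bad_utf8


def utf8_fails(data: bytes) -> bool:
    """Hypothetical helper: True if the strict UTF-8 decoder rejects the data."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


# Illustrative characters: 'A', e-acute, euro sign, and an emoji (U+1F600).
samples = "A\u00e9\u20ac\U0001F600"

# UTF-8 is variable width: 1 to 4 bytes per code point.
utf8_widths = [len(ch.encode("utf-8")) for ch in samples]

# UTF-32 is fixed width: 4 bytes per code point ("utf-32-le" adds no BOM).
utf32_widths = [len(ch.encode("utf-32-le")) for ch in samples]

print(round_tripped == all_bytes, latin1_round_trip_ok, utf8_fails(bad_utf8))
print(utf8_widths, utf32_widths)
```

Note that a successful ISO-8859-1 decode says nothing about whether the resulting text is meaningful; it only guarantees the bytes survive the round trip.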
_______________________________________________
CF-metadata mailing list
[email protected]
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
