I think there is some confusion here. First, this whole regex stuff is only about the physical byte layout of the netcdf classic file format. I would in principle suggest to completely focus on netcdf4 files instead.
Second, I think CF should not concern itself with encodings and byte order stuff at all. Leave that to netcdf4/hdf5 and just work at the character level. And yes, unicode has code points, but also a concept of characters (see [here](https://en.wikipedia.org/wiki/Unicode#Architecture_and_terminology)). Third, looking at the regex in question ``` ([a-zA-Z0-9_]|{MUTF8})([^\x00-\x1F/\x7F-\xFF]|{MUTF8})* ``` notice that it is only an explanatory comment, but apart from that the overwhelmingly likely way to parse this, thanks to the "|" alternatives, is as either ``` ([a-zA-Z0-9_])([^\x00-\x1F/\x7F-\xFF])* ``` ie an ascii string starting with a character, digit, or underscore, limited to the first 128 bytes without control characters and excluding "/" everywhere or ``` ({MUTF8})({MUTF8})* ``` ie *any* unicode string encoded as normalized UTF-8. -- You are receiving this because you commented. Reply to this email directly or view it on GitHub: https://github.com/cf-convention/cf-conventions/issues/141#issuecomment-599957114 This list forwards relevant notifications from Github. It is distinct from [email protected], although if you do nothing, a subscription to the UCAR list will result in a subscription to this list. To unsubscribe from this list only, send a message to [email protected].
