On Wed, Feb 22, 2017 at 10:56 AM, Chris Barker <chris.bar...@noaa.gov> wrote:
>
> Another note:
>
> On Mon, Feb 6, 2017 at 3:08 PM, Bob Simons - NOAA Federal <
> bob.sim...@noaa.gov> wrote:
>
>> * "HTML" - the chars are to be interpreted as an array of Strings with
>> HTML content, using the ISO-8859-1 charset. Non-ISO-8859-1 must be encoded
>> using the &#d; format where d is the decimal number of a Unicode character.
>> * "XML" - the chars are to be interpreted as an array of Strings with
>> XML content, using the ISO-8859-1 charset. Non-ISO-8859-1 characters must
>> be encoded using the &#d; format where d is the decimal number of a Unicode
>> character.
>>
>
> Don't HTML and XML both use an ASCII-compatible header that specified the
> encoding?
>

HTTP (which is used to transmit HTML and other documents) includes
information in the header, notably the Content-type, e.g.,

  Content-type: application/json; charset=utf-8

Yes, XML documents have a "prolog", e.g.,

  <?xml version="1.0" encoding="UTF-8"?>

which uses the word "encoding".
I'm proposing that we add something like that to CF so that the charset is
known.

> (and XML uses "encoding", rather than "charset"):
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> and "the default character encoding was changed to UTF-8 in HTML5."
>
> So if there is going to be a default, it should probably be UTF-8
>

I am not suggesting a default charset. For all the existing CF files, the
charset is unknown, and it would be dangerous to pick any specific charset
to apply retroactively. The most we can say is that the lower 7 bits are
compatible with 7-bit ASCII, which is true of ISO-8859-1, UTF-8, and many
other charsets. I am suggesting that new files could be written with a
charset attribute that specifies the charset in use.

>
> We need to either specify the "string" dimension, or have a consistent
> convention:
>
> A 10x8 CHAR array could be either 10 8 character strings or 8 ten
> character strings. And it gets more confusing with higher dimensions.
>

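To make the 10x8 ambiguity above concrete, here is a minimal sketch in
plain Python. It is only an illustration: the buffer contents and the two
helper names (strings_along_last_dim, strings_along_first_dim) are made up
for this example and are not part of CF or any netCDF library. The same 80
bytes yield two different sets of strings depending on which dimension is
taken to be the string length.

```python
# Hypothetical data: 80 one-byte chars stored row-major as a 10 x 8 array.
ROWS, COLS = 10, 8
flat = bytes(ord("a") + (i % 26) for i in range(ROWS * COLS))


def strings_along_last_dim(buf, rows, cols):
    """Last dimension is the string length: `rows` strings of `cols` chars."""
    return [buf[r * cols:(r + 1) * cols].decode("ascii") for r in range(rows)]


def strings_along_first_dim(buf, rows, cols):
    """First dimension is the string length: `cols` strings of `rows` chars."""
    # Every `cols`-th byte starting at column c walks down that column.
    return [buf[c::cols].decode("ascii") for c in range(cols)]


ten_of_eight = strings_along_last_dim(flat, ROWS, COLS)
eight_of_ten = strings_along_first_dim(flat, ROWS, COLS)

print(len(ten_of_eight), ten_of_eight[0])   # 10 strings, each 8 chars long
print(len(eight_of_ten), eight_of_ten[0])   # 8 strings, each 10 chars long
```

Nothing in the bytes themselves says which reading is intended, so a reader
needs either a convention or some piece of metadata to choose between them.
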
There is no standard naming system in CF to denote a String dimension (i.e.,
the number of chars, vs. a char array). That is a different approach to
solving the problem. I don't like that approach as much because so many
people have written so much software that writes and reads files using
dimension names of their choice. I don't want to tell everyone to rewrite
all their existing files and software/scripts to read/write those files in
order to comply with new CF rules. Instead, I'm proposing a separate, new
attribute (data_type=string|char), partly because it doesn't interfere with
existing dimension names or attribute names.

> -CHB
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> chris.bar...@noaa.gov

--
Sincerely,

Bob Simons
IT Specialist
Environmental Research Division
NOAA Southwest Fisheries Science Center
99 Pacific St., Suite 255A      (New!)
Monterey, CA 93940              (New!)
Phone: (831)333-9878            (New!)
Fax:   (831)648-8440
Email: bob.sim...@noaa.gov

The contents of this message are mine personally and
do not necessarily reflect any position of the
Government or the National Oceanic and Atmospheric Administration.
<>< <>< <>< <>< <>< <>< <>< <>< <><
_______________________________________________
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata