#159: charset attribute
-----------------------------+------------------------------
Reporter: bob.simons | Owner: cf-conventions@…
Type: enhancement | Status: new
Priority: medium | Milestone:
Component: cf-conventions | Version:
Resolution: | Keywords:
-----------------------------+------------------------------
Old description:
> In order to specify the character set of char and string variables,
> I propose that we append this paragraph to the end of CF section 2.2:
>
> Each char array variable that is to be interpreted
> as an array of individual characters (not string(s))
> must have a "charset" attribute which
> clarifies that the variable is to be interpreted as
> individual characters (not string(s)) and specifies
> the 8-bit character set used by the chars.
> Currently, the only values allowed for "charset"
> are "ISO-8859-1" and "ISO-8859-15".
> A scalar char variable may also use the "charset"
> attribute, which defaults to "ISO-8859-15" if
> it is not specified.
>
> A string or string array variable (including a char
> array variable that is to be interpreted as a string
> or array of strings) may have an "_Encoding" attribute.
> Alternatively, a file may have a global "_Encoding"
> attribute which applies to all strings (scalar and
> array) in the file. Currently, the only values
> allowed for "_Encoding" are "ISO-8859-1",
> "ISO-8859-15" and "UTF-8". A missing "_Encoding"
> attribute defaults to UTF-8.
>
> (This 2017-03-02 version is the consensus revised proposal from Chris
> Barker, Heiko Klein, and Bob Simons. This replaces the original proposed
> text.)
New description:
In order to specify the character set of char and string variables,
I propose that we append these two paragraphs to the end of CF section
2.2:
Each char array variable that is to be interpreted
as an array of individual characters (not string(s))
must have a "charset" attribute which
clarifies that the variable is to be interpreted as
individual characters (not string(s)) and specifies
the 8-bit character set used by the chars.
Values for "charset" are case-insensitive. See
http://www.iana.org/assignments/character-sets/character-sets.xhtml .
Currently, the only values allowed for "charset"
are "ISO-8859-1" and "ISO-8859-15".
A scalar char variable may also use the "charset"
attribute, which defaults to "ISO-8859-15" if
it is not specified.
A string or string array variable (including a char
array variable that is to be interpreted as a string
or array of strings) may have an "_Encoding" attribute.
Alternatively, a file may have a global "_Encoding"
attribute which applies to all strings (scalar and
array) in the file. Values for "_Encoding" are
case-insensitive. See
http://www.iana.org/assignments/character-sets/character-sets.xhtml .
Currently, the only values
allowed for "_Encoding" are "ISO-8859-1",
"ISO-8859-15" and "UTF-8". A missing "_Encoding"
attribute defaults to "UTF-8".
(This 2017-03-02b version is the consensus revised proposal from Chris
Barker, Heiko Klein, and Bob Simons, with further changes requested by
Jonathan Gregory.)
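
For illustration only, here is a minimal sketch of how a reader might apply the two paragraphs above. The attribute names, allowed values, and defaults come from the proposed text. The function names, the use of plain dicts for attributes, and the rule that a variable-level "_Encoding" overrides a global one are assumptions of this sketch, not part of the proposal.

```python
# Allowed values from the proposed text, lowercased because values are
# case-insensitive.
ALLOWED_CHARSET = {"iso-8859-1", "iso-8859-15"}
ALLOWED_ENCODING = {"iso-8859-1", "iso-8859-15", "utf-8"}


def _normalize(value, allowed, attr_name):
    """Lowercase the value and check it against the allowed set."""
    key = value.lower()
    if key not in allowed:
        raise ValueError(f"unsupported {attr_name} value: {value!r}")
    return key


def resolve_charset(var_attrs, is_scalar=False):
    """Character set of a char variable read as individual characters.

    Hypothetical helper. Returns None for a char array without "charset",
    since the proposal states a default only for scalar char variables.
    """
    value = var_attrs.get("charset")
    if value is None:
        return "iso-8859-15" if is_scalar else None
    return _normalize(value, ALLOWED_CHARSET, "charset")


def resolve_encoding(var_attrs, global_attrs):
    """Encoding of a string variable.

    Hypothetical helper. Assumption: a variable-level "_Encoding" takes
    precedence over a global one; the proposal does not state this.
    """
    value = var_attrs.get("_Encoding", global_attrs.get("_Encoding", "UTF-8"))
    return _normalize(value, ALLOWED_ENCODING, "_Encoding")


if __name__ == "__main__":
    raw = b"\xa4"  # byte 0xA4 differs between the two 8-bit sets
    for name in ("ISO-8859-1", "ISO-8859-15"):
        charset = resolve_charset({"charset": name})
        print(name, "->", raw.decode(charset))
    print(resolve_encoding({}, {}))
```

Byte 0xA4 shows why the attribute matters: it decodes to the currency sign in ISO-8859-1 and to the euro sign in ISO-8859-15.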
--
Comment (by bob.simons):
Well, this has more information than Heiko's version. The debate leads me
to write like a lawyer. ;-)
Yes, Heiko's version was one paragraph, but there are two attributes which
cover two situations and I think deserve two paragraphs for clarity.
I have brought back the "case-insensitive" sentence and the IANA link from
my original version.
If approved, "charset" should be added to Appendix A, too.
--
Ticket URL: <http://cf-trac.llnl.gov/trac/ticket/159#comment:5>
CF Metadata <http://cf-convention.github.io/>