#159: charset attribute
-----------------------------+------------------------------
Reporter: bob.simons | Owner: cf-conventions@…
Type: enhancement | Status: new
Priority: medium | Milestone:
Component: cf-conventions | Version:
Resolution: | Keywords:
-----------------------------+------------------------------
Description changed by bob.simons:
Old description:
> In order to specify the character set of char and string variables,
> I propose that we append this paragraph to the end of CF section 2.2:
>
> All char and string variables must include a charset attribute to
> identify the character set (encoding) used by the variable. The
> value of the attribute must be the "Preferred MIME Name" or "Name"
> of one of the 8-bit encodings (so not UTF-16 or UTF-32, since CF
> chars are 8 bits) listed at
> http://www.iana.org/assignments/character-sets/character-sets.xhtml .
> Charset names are case-insensitive.
> The only recommended charset names are "ISO-8859-1" (which is
> useful for European languages and for backwards compatibility
> with 7-bit ASCII characters) and "UTF-8" (which is useful when
> full Unicode is needed). (In older files with variables that
> don't specify a charset, the character set being used remains
> ambiguous.)
New description:
In order to specify the character set of char and string variables,
I propose that we append this paragraph to the end of CF section 2.2:

Each char array variable that is to be interpreted as an array of
individual characters (not as string(s)) must have a "charset"
attribute, which indicates that the variable holds individual
characters and specifies the 8-bit character set used by those
characters. Currently, the only values allowed for "charset" are
"ISO-8859-1" and "ISO-8859-15". A scalar char variable may also use
the "charset" attribute, which defaults to "ISO-8859-15" if it is not
specified. A string or string array variable (including a char array
variable that is to be interpreted as a string or an array of strings)
may have an "_Encoding" attribute. Alternatively, a file may have a
global "_Encoding" attribute, which applies to all strings (scalar and
array) in the file. Currently, the only values allowed for "_Encoding"
are "ISO-8859-1", "ISO-8859-15", and "UTF-8". If no "_Encoding"
attribute is specified, the encoding defaults to "UTF-8".
(This 2017-03-02 version is the consensus revised proposal from Chris
Barker, Heiko Klein, and Bob Simons. It replaces the originally
proposed text.)
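The revised rules amount to a small decision procedure for a file reader. The Python sketch that follows is a hypothetical illustration only, not part of the proposal or of any CF or netCDF library. It assumes that attributes are plain dicts, that char data arrives as a 1-D bytes object, that attribute values are matched case-insensitively (as the original description allowed), and that a variable-level "_Encoding" overrides a global one (the proposal does not state a precedence). The helper names string_encoding and interpret_char_data are invented for this example.

```python
# Hypothetical sketch of the proposed "charset" / "_Encoding" rules.
# Assumptions (not in the proposal): attributes are plain dicts, char data
# is a 1-D bytes object, values match case-insensitively, and a variable's
# own "_Encoding" takes precedence over a global "_Encoding".

CHARSETS = {"iso-8859-1", "iso-8859-15"}            # allowed "charset" values
ENCODINGS = {"iso-8859-1", "iso-8859-15", "utf-8"}  # allowed "_Encoding" values


def string_encoding(var_attrs, global_attrs):
    """Pick the encoding for string data: variable, then global, else UTF-8."""
    enc = var_attrs.get("_Encoding", global_attrs.get("_Encoding", "UTF-8"))
    if enc.lower() not in ENCODINGS:
        raise ValueError(f"_Encoding {enc!r} is not an allowed value")
    return enc


def interpret_char_data(raw, var_attrs, global_attrs, is_scalar=False):
    """Return a list of characters or a single string, per the proposal."""
    charset = var_attrs.get("charset")
    if is_scalar and charset is None:
        charset = "ISO-8859-15"  # proposed default for a scalar char
    if charset is not None:
        if charset.lower() not in CHARSETS:
            raise ValueError(f"charset {charset!r} is not an allowed value")
        # Both allowed charsets are single-byte: one byte maps to one character.
        return [bytes([b]).decode(charset) for b in raw]
    # A char array without "charset" is interpreted as string data.
    return raw.decode(string_encoding(var_attrs, global_attrs))


# Byte 0xA4 is the euro sign in ISO-8859-15 but the currency sign in ISO-8859-1.
print(interpret_char_data(b"\xa4", {}, {}, is_scalar=True))           # ['€']
print(interpret_char_data(b"\xa4", {"charset": "ISO-8859-1"}, {}))    # ['¤']
print(interpret_char_data("café".encode("utf-8"), {}, {}))            # café
```

The scalar default of "ISO-8859-15" and the string default of "UTF-8" are the two places where an absent attribute still yields a defined interpretation; any other char array lacking "charset" falls through to string decoding.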
--
Ticket URL: <http://cf-trac.llnl.gov/trac/ticket/159#comment:3>
CF Metadata <http://cf-convention.github.io/>