#159: charset attribute
-----------------------------+------------------------------
  Reporter:  bob.simons      |      Owner:  cf-conventions@…
      Type:  enhancement     |     Status:  new
  Priority:  medium          |  Milestone:
 Component:  cf-conventions  |    Version:
Resolution:                  |   Keywords:
-----------------------------+------------------------------
Description changed by bob.simons:

Old description:

> In order to specify the character set of char and string variables,
> I propose that we append this paragraph to the end of CF section 2.2:
>
>   All char and string variables must include a charset attribute to
>   identify the character set (encoding) used by the variable. The
>   value of the attribute must be the "Preferred MIME Name" or "Name"
>   of one of the 8-bit encodings (so not UTF-16 or UTF-32, since CF
>   chars are 8-bits) listed at
>   http://www.iana.org/assignments/character-sets/character-sets.xhtml .
>   Charset names are case-insensitive.
>   The only recommended charset names are "ISO-8859-1" (which is
>   useful for European languages and for backwards compatibility
>   with 7-bit ASCII characters) and "UTF-8" (which is useful when
>   full Unicode is needed). (In older files with variables that
>   don't specify a charset, the character set being used remains
>   ambiguous.)

New description:

 To specify the character set of char and string variables, I propose
 that we append these two paragraphs to the end of CF section 2.2:

   Each char array variable that is to be interpreted
   as an array of individual characters (not string(s))
   must have a "charset" attribute which
   clarifies that the variable is to be interpreted as
   individual characters (not string(s)) and specifies
   the 8-bit character set used by the chars.
   Currently, the only values allowed for "charset"
   are "ISO-8859-1" and "ISO-8859-15".
   A scalar char variable may also use the "charset"
   attribute, which defaults to "ISO-8859-15" if
   it is not specified.

   A string or string array variable (including a char
   array variable that is to be interpreted as a string
   or array of strings) may have an "_Encoding" attribute.
   Alternatively, a file may have a global "_Encoding"
   attribute which applies to all strings (scalar and
   array) in the file. Currently, the only values
   allowed for "_Encoding" are "ISO-8859-1",
   "ISO-8859-15" and "UTF-8". A missing "_Encoding"
   attribute defaults to UTF-8.

 (This 2017-03-02 version is the consensus revised proposal from Chris
 Barker, Heiko Klein, and Bob Simons. This replaces the original proposed
 text.)
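
The defaulting rules in the proposed text can be sketched in Python as below. This is only an illustration, not part of the proposal: the function names and the use of plain dicts for attributes are hypothetical. It also makes two assumptions: a per-variable "_Encoding" takes precedence over the global one (the text only says "alternatively"), and attribute values are matched exactly (the revised text does not address case).

```python
# Illustrative sketch only; attributes are modelled as plain dicts.
ALLOWED_CHARSETS = ("ISO-8859-1", "ISO-8859-15")
ALLOWED_ENCODINGS = ("ISO-8859-1", "ISO-8859-15", "UTF-8")


def resolve_charset(var_attrs, is_scalar):
    """Charset for a char variable read as individual characters."""
    value = var_attrs.get("charset")
    if value is None:
        if is_scalar:
            # Scalar char variables default to ISO-8859-15.
            return "ISO-8859-15"
        # Char arrays read as individual chars must carry the attribute.
        raise ValueError('char array needs a "charset" attribute')
    if value not in ALLOWED_CHARSETS:
        raise ValueError(f"charset {value!r} is not allowed")
    return value


def resolve_encoding(var_attrs, global_attrs):
    """Encoding for a string variable (or a char array read as strings)."""
    # Assumption: the variable attribute overrides the global attribute.
    value = var_attrs.get("_Encoding")
    if value is None:
        # A missing "_Encoding" defaults to UTF-8.
        value = global_attrs.get("_Encoding", "UTF-8")
    if value not in ALLOWED_ENCODINGS:
        raise ValueError(f"_Encoding {value!r} is not allowed")
    return value


# The same byte reads differently under the two 8-bit character sets.
raw = b"\xa4"
as_8859_15 = raw.decode(resolve_charset({"charset": "ISO-8859-15"}, is_scalar=False))
as_8859_1 = raw.decode(resolve_encoding({}, {"_Encoding": "ISO-8859-1"}))
```

Here `as_8859_15` is the euro sign (U+20AC) and `as_8859_1` is the currency sign (U+00A4), which is the kind of ambiguity the attributes are meant to remove.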

--
Ticket URL: <http://cf-trac.llnl.gov/trac/ticket/159#comment:3>
CF Metadata <http://cf-convention.github.io/>