#159: charset attribute
-----------------------------+------------------------------
  Reporter:  bob.simons      |      Owner:  cf-conventions@…
      Type:  enhancement     |     Status:  new
  Priority:  medium          |  Milestone:
 Component:  cf-conventions  |    Version:
Resolution:                  |   Keywords:
-----------------------------+------------------------------

Old description:

> In order to specify the character set of char and string variables,
> I propose that we append this paragraph to the end of CF section 2.2:
>
>   Each char array variable that is to be interpreted
>   as an array of individual characters (not string(s))
>   must have a "charset" attribute which
>   clarifies that the variable is to be interpreted as
>   individual characters (not string(s)) and specifies
>   the 8-bit character set used by the chars.
>   Currently, the only values allowed for "charset"
>   are "ISO-8859-1" and "ISO-8859-15".
>   A scalar char variable may also use the "charset"
>   attribute, which defaults to "ISO-8859-15" if
>   it is not specified.
>
>   A string or string array variable (including a char
>   array variable that is to be interpreted as a string
>   or array of strings) may have an "_Encoding" attribute.
>   Alternatively, a file may have a global "_Encoding"
>   attribute which applies to all strings (scalar and
>   array) in the file. Currently, the only values
>   allowed for "_Encoding" are "ISO-8859-1",
>   "ISO-8859-15" and "UTF-8". A missing "_Encoding"
>   attribute defaults to UTF-8.
>
> (This 2017-03-02 version is the consensus revised proposal from Chris
> Barker, Heiko Klein, and Bob Simons. This replaces the original proposed
> text.)

New description:

 In order to specify the character set of char and string variables,
 I propose that we append these two paragraphs to the end of CF section
 2.2:

   Each char array variable that is to be interpreted
   as an array of individual characters (not string(s))
   must have a "charset" attribute which
   clarifies that the variable is to be interpreted as
   individual characters (not string(s)) and specifies
   the 8-bit character set used by the chars.
   Values for "charset" are case-insensitive. See
   http://www.iana.org/assignments/character-sets/character-sets.xhtml .
   Currently, the only values allowed for "charset"
   are "ISO-8859-1" and "ISO-8859-15".
   A scalar char variable may also use the "charset"
   attribute, which defaults to "ISO-8859-15" if
   it is not specified.

   A string or string array variable (including a char
   array variable that is to be interpreted as a string
   or array of strings) may have an "_Encoding" attribute.
   Alternatively, a file may have a global "_Encoding"
   attribute which applies to all strings (scalar and
   array) in the file. Values for "_Encoding" are
   case-insensitive. See
   http://www.iana.org/assignments/character-sets/character-sets.xhtml .
   Currently, the only values
   allowed for "_Encoding" are "ISO-8859-1",
   "ISO-8859-15" and "UTF-8". A missing "_Encoding"
   attribute defaults to "UTF-8".

 (This 2017-03-02b version is the consensus revised proposal from Chris
 Barker, Heiko Klein, and Bob Simons, with further changes requested by
 Jonathan Gregory.)
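
 As an illustration only (not part of the proposed text), the rules in the
 two paragraphs above could be sketched in Python roughly as follows. Plain
 dicts stand in for netCDF attribute tables, and the helper names
 resolve_charset and resolve_encoding are hypothetical. The sketch also
 assumes that a variable-level "_Encoding" takes precedence over a global
 one, which the proposed text does not state explicitly.

```python
# Illustrative sketch only: plain dicts stand in for netCDF attribute tables,
# and the helper names are hypothetical.
ALLOWED_CHARSETS = {"iso-8859-1", "iso-8859-15"}
ALLOWED_ENCODINGS = {"iso-8859-1", "iso-8859-15", "utf-8"}


def resolve_charset(var_attrs):
    """Return the character set for a char variable read as single characters.

    Falls back to ISO-8859-15 when "charset" is absent, which the proposed
    text allows only for scalar char variables.
    """
    name = var_attrs.get("charset", "ISO-8859-15").lower()  # case-insensitive
    if name not in ALLOWED_CHARSETS:
        raise ValueError(f"unsupported charset: {name!r}")
    return name


def resolve_encoding(var_attrs, global_attrs):
    """Return the encoding for a string variable.

    Assumption: a variable-level "_Encoding" wins over a global one.
    A missing attribute defaults to UTF-8.
    """
    name = var_attrs.get("_Encoding") or global_attrs.get("_Encoding") or "UTF-8"
    name = name.lower()  # case-insensitive
    if name not in ALLOWED_ENCODINGS:
        raise ValueError(f"unsupported _Encoding: {name!r}")
    return name


# Byte 0xA4 is the currency sign in ISO-8859-1 but the euro sign in
# ISO-8859-15, so the attribute changes how the same byte is read.
raw = b"\xa4"
latin1_char = raw.decode(resolve_charset({"charset": "iso-8859-1"}))
latin9_char = raw.decode(resolve_charset({}))  # default: ISO-8859-15
```

 Byte 0xA4 is one of the few positions where the two allowed 8-bit
 character sets differ, which is why a char variable needs an explicit
 "charset" to be read unambiguously.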

--

Comment (by bob.simons):

 Well, this has more information than Heiko's version. The debate leads me
 to write like a lawyer. ;-)

 Yes, Heiko's version was one paragraph, but there are two attributes that
 cover two situations, and I think they deserve two paragraphs for clarity.

 I have brought back the "case-insensitive" sentence and the IANA link from
 my original version.

 If approved, "charset" should be added to Appendix A, too.

--
Ticket URL: <http://cf-trac.llnl.gov/trac/ticket/159#comment:5>
CF Metadata <http://cf-convention.github.io/>