[CF-metadata] Add support for attributes of type string

Jim Biard Mon, 23 Jul 2018 09:29:41 -0700

Hi.

Attributes with a type of string are now possible with netCDF-4, and many examples of attributes with this type are "in the wild". As an example of how this is happening, IDL creates an attribute with this type if you select its version of **`string`** type instead of **`char`** type. It seems that people often assume that **`string`** is the correct type to use because they wish to store strings, not characters.

I propose to add verbiage to the Conventions to allow attributes that have a type of **`string`**. There are two ramifications to allowing attributes of this type, the second of which impacts string variables as well.

1. A **`string`** attribute can contain 1D atomic string arrays. We need to decide whether or not we want to allow these or limit them (at least for now) to atomic string scalars. Attributes with arrays of strings could allow for cleaner delimiting of multiple parts than spaces or commas do now (e.g. flag_values and flag_meanings could both be arrays), but this would be a significant stretch for current software packages.

2. A **`string`** attribute (and a **`string`** variable) can contain UTF-8 Unicode strings. UTF-8 uses variable-length characters, with the standard ASCII characters as the 1-byte subset. According to the Unicode standard, a UTF-8 string can be signaled by the presence of a special non-printing three-byte sequence known as a Byte Order Mark (BOM) at the front of the string, although this is not required. IDL (again, for example) writes this BOM sequence at the beginning of every attribute or variable element of type **`string`**.
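For anyone who has not run into the BOM before, here is a minimal Python sketch of how a reader might cope with string values written this way. It uses only the standard library and works on raw bytes; the helper name `decode_attribute` is hypothetical and is not part of any netCDF or CF API.

```python
import codecs


def decode_attribute(raw: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if one is present."""
    if raw.startswith(codecs.BOM_UTF8):  # BOM_UTF8 == b"\xef\xbb\xbf"
        raw = raw[len(codecs.BOM_UTF8):]
    return raw.decode("utf-8")


# ASCII characters occupy one byte in UTF-8; other characters take more.
ascii_len = len("K".encode("utf-8"))
accented_len = len("\u00e9".encode("utf-8"))  # e with acute accent

# The same value as a BOM-writing tool and a plain tool might store it.
with_bom = codecs.BOM_UTF8 + "degrees_north".encode("utf-8")
without_bom = "degrees_north".encode("utf-8")
```

Both `with_bom` and `without_bom` decode to the same text through this helper. Python's built-in `"utf-8-sig"` codec gives the same behavior on decoding, so the explicit check is shown only to make the three-byte prefix visible.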

Allowing attributes containing arrays of strings may open up useful future directions, but it will be more of a break from the past than attributes that have only single strings. Allowing attributes (and variables) to contain UTF-8 will free people to store non-English content, but it might pose headaches for software written in older languages such as C and FORTRAN.
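To make the array question concrete, here is a small Python sketch of what reading software would have to do if flag_meanings could arrive either as today's single blank-separated string or as a 1D array of strings. The function name `flag_meanings_list` and the sample values are made up for illustration; nothing here is taken from an existing package.

```python
from typing import List, Sequence, Union


def flag_meanings_list(value: Union[str, Sequence[str]]) -> List[str]:
    """Return flag meanings as a list, whichever form the attribute takes."""
    if isinstance(value, str):
        # Current form: one string, meanings separated by blanks.
        return value.split()
    # Possible future form: a 1D array of strings, one meaning per element.
    return [str(item) for item in value]


scalar_form = "good questionable bad"
array_form = ["good", "questionable", "bad"]
```

Both forms yield the same list here. The extra branch is small, but every existing reader that assumes a single string would need something like it, which is the "stretch for current software" concern.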

To finalize the change to support **`string`** type attributes, we need to decide:


1. Do we explicitly forbid string array attributes?

2. Do we place any restrictions on the content of **`string`** attributes and (by extension) variables?


Now that I have the background out of the way, here's my proposal.

Allow **`string`** attributes. Specify that the attributes defined by the current CF Conventions must be scalar (contain only one string).

Allow UTF-8 in attribute and variable values. Specify that the current CF Conventions use only ASCII characters (which are a subset of UTF-8) for all terms defined within. That is, the controlled vocabulary of CF (standard names and extensions, cell_methods terms other than free-text elements of comments(?), area type names, time units, etc.) is composed entirely of ASCII characters. Free-text elements (comments, long names, flag_meanings, etc.) may use any UTF-8 character.
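As a rough illustration of how a compliance checker might apply this split, here is a Python sketch with one test for controlled-vocabulary terms and one for free text. Both function names are hypothetical, and the rule they encode is only the proposal above, not adopted CF text. `str.isascii()` needs Python 3.7 or later.

```python
def is_ascii_term(term: str) -> bool:
    """Controlled-vocabulary terms would have to be pure ASCII."""
    return term.isascii()


def is_valid_free_text(raw: bytes) -> bool:
    """Free-text values would only need to be well-formed UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


standard_name = "sea_surface_temperature"
long_name = "Temp\u00e9rature de surface de la mer"  # non-ASCII free text
```

Here `standard_name` passes the ASCII test, while `long_name` fails it but is still acceptable as free text once encoded as UTF-8.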

Github issue: #141 <https://github.com/cf-convention/cf-conventions/issues/141>

Trac ticket: #176 <https://cf-trac.llnl.gov/trac/ticket/176#ticket>

Grace and peace,

Jim

--
*Jim Biard*
*Research Scholar*
Cooperative Institute for Climate and Satellites NC <http://cicsnc.org/>
North Carolina State University <http://ncsu.edu/>
NOAA National Centers for Environmental Information <http://ncdc.noaa.gov/>
/formerly NOAA’s National Climatic Data Center/
151 Patton Ave, Asheville, NC 28801
e: [email protected] <mailto:[email protected]>
o: +1 828 271 4900


_______________________________________________
CF-metadata mailing list
[email protected]
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
