Re: [CF-metadata] Pre-proposal for "charset"

Jonathan Gregory Wed, 22 Feb 2017 10:22:24 -0800

Dear Bob

Thanks for the helpful example. Yes, I agree that on the basis of netCDF alone
you can't tell whether it's a string or a 10-char array. However, the cf_role
is a string-valued attribute, according to the CF convention, so it must be a
string. I expect that for contents of netCDF files that follow the CF
convention this ambiguity shouldn't arise - but if there are cases where it does
we should consider them.
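
To make the ambiguity concrete, here is a minimal Python sketch of the two
readings of a char[10] variable. The byte values and the helper names
(as_string, as_chars) are assumptions for illustration only; they are not
taken from the sample file or from any netCDF library.

```python
# Hypothetical raw contents of a char[10] netCDF variable (assumed values).
raw = b"MVCO_ASIT\x00"


def as_string(buf, charset="ISO-8859-1"):
    """Read the whole buffer as one null-padded string."""
    return buf.rstrip(b"\x00").decode(charset)


def as_chars(buf, charset="ISO-8859-1"):
    """Read the buffer as an array of independent one-byte characters."""
    return [bytes([b]).decode(charset) for b in buf]


one_string = as_string(raw)  # a single value
ten_chars = as_chars(raw)    # ten separate values, padding included
```

The bytes are the same in both cases; only knowledge of the variable's role
(or an explicit attribute) tells a program which function to call.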


Best wishes

Jonathan

----- Forwarded message from Bob Simons - NOAA Federal <[email protected]> 
-----

> Date: Tue, 21 Feb 2017 15:31:11 -0800
> From: Bob Simons - NOAA Federal <[email protected]>
> To: CF Metadata <[email protected]>
> Subject: Re: [CF-metadata] Pre-proposal for "charset"
> 
> You requested a sample file which demonstrates the need for a "data_type"
> attribute for char variables to distinguish Strings from true chars.
> 
> Here is a file that I was just given which is a good example.
> It is a valid CF DSG file.
> The cf_role=timeseries_id variable appears as char[10].
> So just looking at that variable: is it one string (with 10 characters), or
> an array with 10 values (each a char)?
> Yes, a human can think about the whole file and come to a conclusion of
> which it "must" be, but that is very, very hard or impossible for a
> computer program to figure out (I could be wrong).
> 
> But if the timeseries variable had the proposed attribute
>   :data_type="string"
> it would be trivial for the software to know that this variable should be
> interpreted as one string (not 10 separate chars).
> 
> I hope that was what you were looking for. If not, please tell me why not
> and I'll find another example.
> 
> netcdf summary_allTB2007.nc {
>   dimensions:
>     timeseries = 10;
>     time = 996;
>   variables:
>     char timeseries(timeseries=10);
>       :cf_role = "timeseries_id";
>       :long_name = "timeseries";
> 
>     double time(time=996);
>       :units = "seconds since 1970-01-01T00:00:00Z";
>       :standard_name = "time";
>       :long_name = "time";
>       :calendar = "gregorian";
>       :axis = "T";
> 
>     double latitude;
>       :valid_min = -90.0; // double
>       :valid_max = 90.0; // double
>       :axis = "Y";
>       :long_name = "latitude";
>       :standard_name = "latitude";
>       :units = "degrees_north";
> 
>     double longitude;
>       :valid_min = -180.0; // double
>       :valid_max = 180.0; // double
>       :axis = "X";
>       :long_name = "longitude";
>       :standard_name = "longitude";
>       :units = "degrees_east";
> 
>     double depth;
>       :positive = "down";
>       :axis = "Z";
>       :valid_min = 0.0; // double
>       :valid_max = 10971.0; // double
>       :long_name = "depth";
>       :standard_name = "depth";
>       :units = "m";
> 
>     char platform;
>       :long_name = "MVCO ASIT";
> 
>     char instrument;
>       :long_name = "Imaging FlowCytobot";
> 
>     double crs;
>       :grid_mapping_name = "latitude_longitude";
>       :longitude_of_prime_meridian = 0.0; // double
>       :semi_major_axis = 6378137.0; // double
>       :inverse_flattening = 298.257223563; // double
>       :epsg_code = "EPSG:4326";
> 
>     double Asterionellopsis(time=996);
>       :_FillValue = -9999.9; // double
>       :long_name = "Asterionellopsis";
>       :standard_name = "Asterionellopsis";
>       :units = "1";
>       :coordinates = "time depth latitude longitude";
>       :grid_mapping = "crs";
>       :platform = "platform";
>       :instrument = "instrument";
> 
>     double Cerataulina(time=996);
>       :_FillValue = -9999.9; // double
>       :long_name = "Cerataulina";
>       :standard_name = "Cerataulina";
>       :units = "1";
>       :coordinates = "time depth latitude longitude";
>       :grid_mapping = "crs";
>       :platform = "platform";
>       :instrument = "instrument";
> 
>     double Ceratium(time=996);
>       :_FillValue = -9999.9; // double
>       :long_name = "Ceratium";
>       :standard_name = "Ceratium";
>       :units = "1";
>       :coordinates = "time depth latitude longitude";
>       :grid_mapping = "crs";
>       :platform = "platform";
>       :instrument = "instrument";
> 
>     double Chaetoceros(time=996);
>       :_FillValue = -9999.9; // double
>       :long_name = "Chaetoceros";
>       :standard_name = "Chaetoceros";
>       :units = "1";
>       :coordinates = "time depth latitude longitude";
>       :grid_mapping = "crs";
>       :platform = "platform";
>       :instrument = "instrument";
> 
>     double Corethron(time=996);
>       :_FillValue = -9999.9; // double
>       :long_name = "Corethron";
>       :standard_name = "Corethron";
>       :units = "1";
>       :coordinates = "time depth latitude longitude";
>       :grid_mapping = "crs";
>       :platform = "platform";
>       :instrument = "instrument";
> 
>     double Coscinodiscus(time=996);
>       :_FillValue = -9999.9; // double
>       :long_name = "Coscinodiscus";
>       :standard_name = "Coscinodiscus";
>       :units = "1";
>       :coordinates = "time depth latitude longitude";
>       :grid_mapping = "crs";
>       :platform = "platform";
>       :instrument = "instrument";
> 
>   // global attributes:
>   :featureType = "timeSeries";
>   :Conventions = "CF-1.6";
>   :institution = "Obfuscated";
>   :title = "Obfuscated";
>  data:
> }
> 
> 
> > Date: Fri, 17 Feb 2017 17:46:45 +0000
> > From: Jonathan Gregory <[email protected]>
> > To: [email protected]
> > Subject: Re: [CF-metadata] Pre-proposal for "charset"
> > Message-ID: <[email protected]>
> > Content-Type: text/plain; charset=us-ascii
> >
> > Dear Bob
> >
> > I agree that sometimes char data is characters and sometimes strings, and
> > one can't tell which it is without knowing the intended use of the array
> > concerned. When you do know the role of this array e.g. as a quality flag
> > data variable, or a string-valued auxiliary coordinate variable, then you
> > know also whether it's a string or an array of characters. Can you give an
> > example where one needs to know how a char array should be interpreted but
> > you *don't* know what its purpose is within the CF-netCDF file?
> >
> > Best wishes
> >
> > Jonathan
> >
> > ----- Forwarded message from Bob Simons - NOAA Federal <
> > [email protected]> -----
> >
> > > Date: Wed, 8 Feb 2017 10:00:32 -0800
> > > From: Bob Simons - NOAA Federal <[email protected]>
> > > To: CF Metadata <[email protected]>
> > > Subject: Re: [CF-metadata] Pre-proposal for "charset"
> > >
> > > I think my original pre-proposal has a significant flaw and needs to be
> > > revised.
> > > The problem is: charset needs to be specifiable for all char arrays,
> > > regardless of whether the values should be interpreted as Strings or
> > > individual chars.
> > >
> > > I see two basic solutions:
> > >
> > > 1) Two attributes, but a given variable would only use one of them. The
> > > first part of the attribute name specifies the data type:
> > >   char_charset = "ISO-8859-1";   //identifies a char variable using
> > > ISO-8859-1
> > > or
> > >   string_charset = "ISO-8859-1";   //identifies a String variable using
> > > ISO-8859-1
> > >
> > > 2) Two attributes that would both be specified for every char/String
> > > variable, e.g.,
> > >   charset = "ISO-8859-1";
> > >   data_type = "String";             //or "char"
> > >
> > > In either case, the charsets allowed for char (not String) data must be
> > > restricted to single-byte charsets (e.g., "ISO-8859-1") because other
> > > encodings (e.g., "UTF-8") need multiple bytes for some characters.
> > >
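
The single-byte restriction can be checked mechanically. The following is
only a sketch: the helper name fits_char_array and the sample character are
assumptions for illustration, not part of the proposal.

```python
text = "é"  # one character, used here only as an illustration

latin1 = text.encode("ISO-8859-1")  # one byte: E9
utf8 = text.encode("UTF-8")         # two bytes: C3 A9


def fits_char_array(value, charset):
    """True if every character of value encodes to exactly one byte."""
    return all(len(ch.encode(charset)) == 1 for ch in value)
```

With UTF-8, a value made only of ASCII characters passes this check, but any
value containing a multi-byte character does not, so individual array
elements would no longer correspond to individual characters.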
> > > ---
> > > I have a slight preference for (2), because it is cleaner and might be better
> > > in the future (I don't know the implications for nc4 and CF2).
> > >
> > > Thoughts? Votes?
> > >
> > >
> > >
> > >
> > > On Mon, Feb 6, 2017 at 3:08 PM, Bob Simons - NOAA Federal <
> > > [email protected]> wrote:
> > >
> > > > Before I make a formal CF proposal for a "charset" attribute, I would
> > > > like to get comments and suggestions from all of you.
> > > >
> > > > This is a proposal to solve the problem of distinguishing strings from
> > > > arrays of characters and the problem of identifying the string's
> > > > character encoding. Presumably, it would be appended to section 2.2.
> > > >
> > > > An example of actual need is: Many/most current uses of
> > > > multidimensional char arrays are intended to be interpreted as Strings.
> > > > But some files, e.g., Argo profile float profiles, have single char
> > > > data that are stored in char arrays.
> > > >
> > > > Another example: while most nc files just use 7-bit ASCII characters in
> > > > strings, some use 8-bit characters. Some such files appear to use
> > > > charset=Windows-1252, others use Mac OS Roman, others use ISO-8859-1,
> > > > but the charset is not specified and there is currently no official CF
> > > > way to specify it.
> > > >
> > > > Another advantage of this proposal is that it provides a way to support
> > > > Unicode (and thus all of the world's languages) via the UTF-8 encoding
> > > > which is useful as we increasingly work with people from non-US,
> > > > non-European countries.
> > > >
> > > > A possible extension of this is to allow a few special additional
> > > > pseudo-charset names:
> > > > * "HTML" - the chars are to be interpreted as an array of Strings with
> > > > HTML content, using the ISO-8859-1 charset. Non-ISO-8859-1 characters
> > > > must be encoded using the &#d; format where d is the decimal number of
> > > > a Unicode character.
> > > > * "XML" - the chars are to be interpreted as an array of Strings with
> > > > XML content, using the ISO-8859-1 charset. Non-ISO-8859-1 characters
> > > > must be encoded using the &#d; format where d is the decimal number of
> > > > a Unicode character.
> > > >
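
As a rough illustration of the &#d; idea, Python's standard
"xmlcharrefreplace" error handler produces exactly this kind of decimal
reference for characters outside the target charset. The function names
below are assumptions for illustration; note that this sketch does not
escape markup characters such as & or <, which real HTML/XML content would
also need.

```python
import html


def to_charref_bytes(value):
    """Encode as ISO-8859-1, writing other characters as &#d; references."""
    return value.encode("ISO-8859-1", errors="xmlcharrefreplace")


def from_charref_bytes(buf):
    """Decode ISO-8859-1 bytes and resolve character references."""
    return html.unescape(buf.decode("ISO-8859-1"))


encoded = to_charref_bytes("5 €")       # the Euro sign is not in ISO-8859-1
decoded = from_charref_bytes(encoded)
```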
> > > > Thank you for considering this.
> > > >
> > > >
> > > > --- The Actual Pre-Proposal
> > > > Use the "charset" attribute to indicate that a multidimensional
> > > > char array should be interpreted as an array of Strings,
> > > > not an array of individual characters.
> > > > The value of "charset" also serves to specify the character set
> > > > used to encode the strings
> > > > and must be the name of one of the 8-bit encodings
> > > > (since CF chars are 8-bits) listed at
> > > > http://www.iana.org/assignments/character-sets/character-sets.xhtml .
> > > > Charset names are case-insensitive.
> > > > The only charsets which are recommended are "ISO-8859-1" and "UTF-8".
> > > > For backwards compatibility, if "charset" is not defined,
> > > > it remains ambiguous whether a char array should be interpreted as
> > > > holding an array of individual characters or an array of Strings.
> > > >
> > > >
> > > > --- An Example: Encoding three Strings: "It", "Book", and "5 €".
> > > > The Unicode code point for the Euro symbol is 20AC (in hexadecimal),
> > > > which is 8364 (in decimal).
> > > > The Euro symbol is encoded in UTF-8 as 3 bytes: E2 82 AC (in
> > > > hexadecimal).
> > > > So a file would store these strings in a char array as:
> > > >   dimensions
> > > >     words = 3;
> > > >     strLen = 5;
> > > >   char myWords[words][strLen] = "It[0][0][0]", "Book[0]",
> > > >     "5 [E2][82][AC]";
> > > >     charset = "UTF-8";
> > > >
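
The byte layout in this example can be reproduced with a short Python
sketch. The helper names pack_strings and unpack_strings are hypothetical,
and null-byte padding on the right is assumed from the [0] notation in the
example.

```python
def pack_strings(values, str_len, charset="UTF-8"):
    """Encode each string and right-pad it with null bytes to str_len bytes."""
    rows = []
    for value in values:
        encoded = value.encode(charset)
        if len(encoded) > str_len:
            raise ValueError(
                f"{value!r} needs {len(encoded)} bytes, limit is {str_len}"
            )
        rows.append(encoded.ljust(str_len, b"\x00"))
    return rows


def unpack_strings(rows, charset="UTF-8"):
    """Strip trailing null padding and decode each row as one string."""
    return [row.rstrip(b"\x00").decode(charset) for row in rows]


my_words = pack_strings(["It", "Book", "5 €"], str_len=5)
round_trip = unpack_strings(my_words)
```

Note that strLen counts bytes, not characters: "5 €" is three characters but
fills all five bytes of its row.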
> > > >
> > > > --
> > > > Sincerely,
> > > >
> > > > Bob Simons
> > > > IT Specialist
> > > > Environmental Research Division
> > > > NOAA Southwest Fisheries Science Center
> > > > 99 Pacific St., Suite 255A      (New!)
> > > > Monterey, CA 93940               (New!)
> > > > Phone: (831)333-9878            (New!)
> > > > Fax:   (831)648-8440
> > > > Email: [email protected]
> > > >
> > > > The contents of this message are mine personally and
> > > > do not necessarily reflect any position of the
> > > > Government or the National Oceanic and Atmospheric Administration.
> > > > <>< <>< <>< <>< <>< <>< <>< <>< <><
> > > >
> > > >
> > >
> > >
> > > --
> > > Sincerely,
> > >
> > > Bob Simons
> >
> > > _______________________________________________
> > > CF-metadata mailing list
> > > [email protected]
> > > http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
> >
> >
> > ----- End forwarded message -----
> >
> >
> > ------------------------------
> >
> > Message: 2
> > Date: Fri, 17 Feb 2017 13:22:29 -0600
> > From: David Blodgett <[email protected]>
> > To: CF Metadata <[email protected]>
> > Subject: Re: [CF-metadata] Extension of Discrete Sampling Geometries
> >         for Simple Features
> > Message-ID: <[email protected]>
> > Content-Type: text/plain; charset="utf-8"
> >
> > All,
> >
> > I haven't heard much follow up, but here's a doodle to coordinate a phone
> > conversation about this. I think we have west-coast US participants and EU
> > participants, so I chose times mid to late morning for me (midwest US).
> >
> > http://doodle.com/poll/eikarnt35tdm7igd
> >
> > Will make a call once a few people have expressed interest and we have a
> > clear day/time.
> >
> > Regards,
> >
> > - Dave
> >
> > > On Feb 6, 2017, at 11:29 AM, David Blodgett <[email protected]> wrote:
> > >
> > > Dear CF,
> > >
> > > I want to follow up on the conversation here with an alternative
> > approach suggested off list primarily between Jonathan and me. For this, I'm
> > going to focus on use cases satisfied and simplification of the proposal
> > allowed by not supporting those use cases. The changes below are largely
> > driven by a desire to better align this proposal with the technical details
> > of the prior art that is CF.
> > >
> > > If we:
> > > 1) don't support node sharing, we can remove the complication of node -
> > coordinate indexing / indirection, simplifying the proposal pretty
> > significantly.
> > > 2) don't use "break values" to indicate the separation between
> > multi-part geometries and polygon holes, we end up with a data model with
> > an extra dimension, but the NetCDF dimensions align with the natural
> > dimensions of the data.
> > > 3) use "count" instead of a "start pointer" approach, we are better
> > aligned with the existing DSG contiguous ragged array approach.
> > >
> > > Coming back to the three directions we could take this proposal from my
> > cover letter on February 2nd.
> > >> Direct use of Well-Known Text (WKT). In this approach, well known text
> > strings would be encoded using character arrays following a contiguous
> > ragged array approach to index the character array by geometry (or instance
> > in DSG parlance).
> > >> Implement the WKT approach using a NetCDF binary array. In this
> > approach, well known text separators (brackets, commas and spaces) for
> > multipoint, multiline, multipolygon, and polygon holes, would be encoded as
> > break type separator values like -1 for multiparts and -2 for holes.
> > >> Implement the fundamental dimensions of geometry data in NetCDF. In
> > this approach, additional dimensions and variables along those dimensions
> > would be introduced to represent geometries, geometry parts, geometry
> > nodes, and unique (potentially shared) coordinate locations for nodes to
> > reference.
> > > The alternative I'm outlining here moves in the direction of 3. We had
> > originally discounted it because it becomes very verbose and seems overly
> > complicated if support for coordinate sharing is a requirement. If the
> > three simplifications described above are used, then the third approach
> > seems more tenable.
> > >
> > > Jonathan has also suggested that: (these are in reaction to the CDL in
> > my letter from February 2nd)
> > > 1) Rename geom_coordinates as node_coordinates, for consistency with
> > UGRID.
> > > 2) Omit node_dimension. This is redundant, since the dimension can be
> > found by
> > > examining the node coordinate variables.
> > > 3) Prescribe numerous "codes" and assumptions in the specification
> > instead of letting them be described with attribute values.
> > > 4) It would be more consistent with CF and UGRID to use a single
> > container variable to hang all the topology/geometry information from.
> > >
> > > Which I, personally, am happy to accept if others don't object.
> > >
> > > A couple other suggestions from Jonathan I want to discuss a bit more:
> > > 1) Rename geometry as topology and geom_type as topology_type.
> > >       While I'd be open to something other than geom, topology is odd.
> > If this is really "node_collection_topology_type" I guess I could be
> > convinced, but would be curious how people react to this. (Especially in
> > relation to UGRID)
> > > 2) This extension is more appropriate as an extension to the concept of
> > cell bounds than the addition of a complex time-invariant type of discrete
> > sampling geometry.
> > >       Having just re-read the cell bounds chapter, I think it would over
> > complicate the cell bounds to include this material. My basic issue here is
> > that these geometries do not necessarily have a reference location. They
> > are, rather, first order entities that need to be treated as such. That
> > said, it makes sense that these geometries are not necessarily a good fit
> > for the original intent of Discrete Sampling Geometries. Jonathan suggested
> > they may belong in their own chapter, which may be a good alternative? My
> > suggested CDL below might lead us in the direction of this being a special
> > type of auxiliary coordinate variable.
> > >
> > > This alternative starts to look like the CDL pasted below.
> > >
> > > Note that the issue of coordinates is sticking out like a sore thumb.
> > Below, I've attempted to reconcile Jonathan's ideas regarding coordinates
> > with my thoughts about how these geometries are "first order entities" that
> > don't have a single representative x and y. The spatial coordinates can be
> > said to reside in the system of geometries described in the "sf" container
> > variable? I realize this goes against the idea of coordinates a bit, but I
> > think it is holding with the spirit of the attribute?
> > >
> > > Finally, I'm glad to continue answering questions and debating things
> > via the list to a point, but I think it would be in our interest to arrange
> > a telecon to discuss this stuff further with a list of interested parties.
> > Feel free to follow up on list, but for decision making, let's not let this
> > rabbit hole go too deep. I'll plan on letting this and the other recent
> > action on this proposal settle with people for a week or two then start to
> > bring together a conference call (or calls depending on time zones). Please
> > respond to me off list if you are interested in being part of a call to
> > discuss.
> > >
> > > Regards,
> > >
> > > - Dave
> > >
> > > netcdf multipolygon_example {
> > > dimensions:
> > >  node = 47 ;
> > >  part = 9 ;
> > >  instance = 3 ;
> > >  time = 5 ;
> > >  strlen = 5 ;
> > > variables:
> > >  char instance_name(instance, strlen) ;
> > >    instance_name:cf_role = "timeseries_id" ;
> > >  double someVariable(instance) ;
> > >    someVariable:long_name = "a variable describing a single-valued attribute of a polygon" ;
> > >    someVariable:coordinates = "sf" ; // or "instance_name"?
> > >  int time(time) ;
> > >    time:units = "days since 2000-01-01" ;
> > >  double someData(instance, time) ;
> > >    someData:coordinates = "time sf" ; // or "time instance_name"?
> > >    someData:featureType = "timeSeries" ;
> > >    someData:geometry="sf";
> > >  int sf; // containing variable -- datatype irrelevant because no data
> > >    sf:geom_type = "multipolygon" ; // could be node_topology_type?
> > >    sf:node_count_variable="node_count";
> > >    sf:node_coordinates = "x y" ;
> > >    sf:part_count = "part_node_count" ;
> > >    sf:part_type = "part_type" ; // Not required unless polygons with holes present.
> > >    sf:outer_ring_order = "anticlockwise" ; // not required if written in spec?
> > >    sf:closure_convention = "last_node_equals_first" ; // not required if written in spec?
> > >    sf:outer_type_code = 0 ; // not required if written in spec?
> > >    sf:inner_type_code = 1 ; // not required if written in spec?
> > >  int node_count(instance);
> > >    node_count:long_name = "count of coordinates in each instance geometry" ;
> > >  int part_node_count(part) ;
> > >    part_node_count:long_name = "count of coordinates in each geometry part" ;
> > >  int part_type(part) ;
> > >    part_type:long_name = "type of each geometry part" ;
> > >  double x(node) ;
> > >    x:units = "degrees_east" ;
> > >    x:standard_name = "longitude" ; // or projection_x_coordinate
> > >    x:cf_role = "geometry_x_node" ;
> > >  double y(node) ;
> > >    y:units = "degrees_north" ;
> > >    y:standard_name = "latitude" ; // or projection_y_coordinate
> > >    y:cf_role = "geometry_y_node" ;
> > > // global attributes:
> > >     :Conventions = "CF-1.8" ;
> > >
> > > data:
> > >
> > >  instance_name =
> > >   "flash",
> > >   "bang",
> > >   "pow" ;
> > >
> > >  someVariable = 1, 2, 3 ;
> > >
> > >  time = 1, 2, 3, 4, 5 ;
> > >
> > >  someData =
> > >   1, 2, 3, 4, 5,
> > >   1, 2, 3, 4, 5,
> > >   1, 2, 3, 4, 5 ;
> > >
> > >  node_count = 25, 15, 7 ;
> > >
> > >  part_node_count = 5, 4, 4, 4, 4, 8, 6, 8, 4 ;
> > >
> > >  part_type = 0, 1, 1, 1, 0, 0, 0, 1, 0 ;
> > >
> > >  x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7,
> > >     5, 11, 15, 13, 11, -40, -20, -45, -40, -20, -10, -10, -30, -45, -20,
> > >     -30, -20, -20, -30, 30,
> > >     45, 10, 30, 25, 50, 30, 25 ;
> > >
> > >  y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25,
> > >     25, 29,
> > >     25, 25, 25, 29, 25, -40, -45, -30, -40, -35, -30, -10, -5, -20, -35,
> > >     -20, -15, -25, -20, 20,
> > >     40, 40, 20, 5, 10, 15, 5 ;
> > > }
> > >
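
A reader of this two-level "count" layout has to work out which parts belong
to which instance. The sketch below shows one way that could be done; the
function name group_parts is hypothetical, and the toy counts are made up
for illustration rather than taken from the CDL above.

```python
def group_parts(node_count, part_node_count):
    """Assign consecutive parts to instances so node totals match.

    Returns one list of part indices per instance and raises ValueError
    if a part would straddle two instances or parts are left over.
    """
    groups, part = [], 0
    for nodes_in_instance in node_count:
        parts, remaining = [], nodes_in_instance
        while remaining > 0:
            if part >= len(part_node_count) or part_node_count[part] > remaining:
                raise ValueError("part counts do not align with instance counts")
            remaining -= part_node_count[part]
            parts.append(part)
            part += 1
        groups.append(parts)
    if part != len(part_node_count):
        raise ValueError("unused parts remain")
    return groups


# Hypothetical toy counts: two instances with 9 and 4 nodes, three parts.
node_count = [9, 4]
part_node_count = [5, 4, 4]
groups = group_parts(node_count, part_node_count)
```

The alignment check matters because nothing in the file format itself
guarantees that the two count variables agree with each other.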
> > >
> > >
> > >> On Feb 4, 2017, at 8:07 AM, David Blodgett <[email protected]> wrote:
> > >>
> > >> Dear Chris,
> > >>
> > >> Thanks for your thorough treatment of these issues. We have gone
> > through a similar thought process to arrive at the proposal we came up
> > with. I'll answer as briefly as I can.
> > >>
> > >> 1) how would you translate between netcdf geometries and, say geo JSON?
> > >>
> > >> The thinking is that node coordinate sharing is optional. If the writer
> > wants to check or already knows that nodes share coordinates, then it's
> > possible. Otherwise, it doesn't have to be used. I've always felt that this
> > was important, but maybe not critical for a core NetCDF-CF data model. Some
> > offline conversation has led to an example that does not use it that may be
> > a good alternative, more on that later.
> > >>
> > >> 2) Break Values
> > >>
> > >> You really do have to hold your nose on the break values. The issue is
> > that you have to store that information somehow and it is almost worse to
> > create new variables to store the multi-part and hole/not hole information.
> > The alternative approach that's forming up as mentioned above does break
> > the information out into additional variables but simplifies things
> > otherwise. In that case it doesn't feel overly complex to me... so stay tuned
> > for more on this front.
> > >>
> > >> 3) Ragged Indexing
> > >>
> > >> Your thought process follows ours exactly. The key is that you either
> > have to create the "pointer" array as a first order of business or loop
> > over the counts ad nauseam. I'm actually leaning toward the counts for two
> > reasons. First, the counts approach is already in CF so is a natural fit
> > and will be familiar to developers in this space. Second, the issue of 0 vs
> > 1 indexing is annoying. In our proposal, we settled on 0 indexing because
> > it aligns with the idea of an offset, but it is still annoying and some
> > applications would always have to adjust that pointer array as a first
> > order of business.
> > >>
> > >> On to Bob's comments.
> > >>
> > >> Regarding aligning with other data models / encodings, I guess this
> > needs to be unpacked a bit.
> > >>
> > >> 1) In this setting, simple features is a data model, not an encoding.
> > An encoding can implement part or all of a data model as is needed by the
> > use case(s) at hand. There is no problem with partial implementations; you
> > still get interoperability for the intended use cases.
> > >> 2) Attempting to align with other encoding standards (UGRID and
> > NetCDF-CF are the primary ones here) is simply to keep the implementation
> > patterns similar and familiar. This may be a fool's errand, but is
> > presumably good for adoptability and consistency.
> > >> So, I don't see a problem with implementing important simple features
> > types in a way that aligns with the way the existing community standards
> > work.
> > >>
> > >> I don't see this as ignoring existing standards at all. There is no
> > open community standard for binary encoding of geometries and related data
> > that passes the CF requirements of human readability and self-description.
> > We are adopting the appropriate data model and suggesting a new encoding
> > that will solve a lot of problems in the environmental modeling space.
> > >>
> > >> As we've discussed before, your "different approach" sounds great, but
> > seems like an exercise for a future effort that doesn't attempt to align
> > with CF 1.7. Maybe what you suggest is a path forward for variable length
> > arrays in the CF 2.0 "vision in the mist", but I don't see it as a tenable
> > solution for CF 1.*.
> > >>
> > >> Best Regards,
> > >>
> > >> - Dave
> > >>
> > >>
> > >>> On Feb 3, 2017, at 3:31 PM, Chris Barker <[email protected]> wrote:
> > >>>
> > >>> a few thoughts. First, I think there are three core "issues" that need
> > to be resolved:
> > >>>
> > >>> 1) Coordinate indexing (indirection)
> > >>>
> > >>> the question of whether you have an array of "vertices" that the
> > geometry types index into to get their data:
> > >>>
> > >>> Advantages:
> > >>>  - if a number of geometries share a lot of vertices, it can be more
> > efficient
> > >>>  - the relationship between geometries that share vertices (i.e.
> > polygons that share a boundary) etc. is well defined. You don't need to
> > check for closeness, and maybe have a tolerance, etc.
> > >>>
> > >>> These were absolutely critical for UGRID for example -- a UGRID mesh
> > is a "single thing", NOT a collection of polygons that happen to share some
> > vertices.
> > >>>
> > >>> Disadvantages:
> > >>>  -  if the geometries do not share many vertices, it is less efficient.
> > >>>  -  there are additional code complications in "getting" the vertices
> > of the given geometry
> > >>>  - it does not match the OGC data model.
> > >>>
> > >>> My 0.02 -- given my use cases, I tend to want the advantages -- but I
> > don't know that that's a typical use case. And I think it's a really good
> > idea to keep with the OGC data model where possible -- i.e. be able to
> > translate from netcdf to, say, geoJSON as losslessly as possible. Given
> > that I think it's probably a better idea not to have the indirection.
> > >>>
> > >>> However (to equivocate) perhaps the types of information people are
> > likely to want to store in netcdf are a subset of what the OGC standards
> > are designed for -- and for those use-cases, maybe shared vertices are
> > critical.
> > >>>
> > >>> One way to think about it -- how would you translate between netcdf
> > geometries and, say geo JSON:
> > >>>   - nc => geojson would lose the shared index info.
> > >>>   - geojson => nc -- would you try to reconstruct the shared
> > vertices?? I'm thinking that would be a bit dangerous in the general case,
> > because you are adding information that you don't know is true -- are these
> > a shared vertex or two that just happen to be at the same location?
> > >>>
> > >>> 2) Break values
> > >>>
> > >>> I don't really like break values as an approach, but with netcdf any
> > option will be ugly one way or another. So keeping with the WKT approach
> > makes sense to me. Either way you'll need custom code to unpack it. (BTW --
> > what does WellKnownBinary do?)
> > >>>
> > >>> 3) Ragged indexing
> > >>>
> > >>> There are two "natural" ways to represent a ragged array:
> > >>>
> > >>> (a) store the length of each "row"
> > >>> (b) store the index to the beginning (or end) of each "row"
> > >>>
> > >>> CF already uses (a). However, working with it, I'm pretty convinced
> > that it's the "wrong" choice:
> > >>>
> > >>> If you want to know how long a given row is, that is really easy with
> > (a), and almost as easy with (b) (involves two indexes and a subtraction)
> > >>>
> > >>> However, if you want to extract a particular row: (b) makes this
> > really easy -- you simply access the slice of the array you want. with (a)
> > you need to loop through the entire "length_of_rows" array (up to the row
> > of interest) and add up the values to find the slice you need. not a huge
> > issue, but it is an issue. In fact, in my code to read ragged arrays in
> > netcdf, the first thing I do is pre-compute the index-to-each-row, so I can
> > then use that to access individual rows for future access -- if you are
> > accessing via OpenDAP -- that's particularly helpful.
> > >>>
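
The pre-computation described here, turning representation (a) into
representation (b), amounts to an exclusive prefix sum over the counts. A
minimal Python sketch follows; the function names and toy data are
assumptions for illustration, not code from any netCDF library.

```python
from itertools import accumulate


def counts_to_starts(counts):
    """Convert per-row lengths (a) into per-row start indices (b)."""
    # Exclusive prefix sum: row i starts where rows 0..i-1 end.
    return list(accumulate(counts, initial=0))[:-1]


def get_row(flat, counts, starts, i):
    """Slice row i out of the flat array using its start index and length."""
    return flat[starts[i]:starts[i] + counts[i]]


# Hypothetical toy data: three rows of lengths 3, 1, and 2.
flat = [10, 11, 12, 20, 30, 31]
counts = [3, 1, 2]
starts = counts_to_starts(counts)
```

Once the starts are known, extracting any row is a single slice; with only
the counts, every lookup would repeat the summation up to that row.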
> > >>> So -- (b) is clearly (to me) the "best" way to do it -- but is it
> > worth introducing a second way to handle ragged arrays in CF? I would think
> > yes, but that would be offset if:
> > >>>
> > >>>  - There is a bunch of existing library code that transparently
> > handles ragged arrays in netcdf (does netcdfJava have something? I'm pretty
> > sure Python doesn't -- certainly not in netCDF4)
> > >>>
> > >>>  - That that existing lib code would be advantageous to leverage for
> > code reading features: I suspect that there will have to be enough custom
> > code that the ragged array bits are going to be the least of it.
> > >>>
> > >>> So I'm for the "new" way of representing ragged arrays
> > >>>
> > >>> -CHB
> > >>>
> > >>>
> > >>> On Fri, Feb 3, 2017 at 11:41 AM, Bob Simons - NOAA Federal <[email protected]> wrote:
> > >>> Then, isn't this proposal just the first step in the creation of a new
> > model and a new encoding of Simple Features, one that is "align[ed] ...
> > with as many other encoding standards in this space as is practical"? In
> > other words, yet another standard for Simple Features?
> > >>>
> > >>> If so, it seems risky to me to take just the first (easy?) step "to
> > support the use cases that have a compelling need today" and not solve the
> > entire problem. I know the CF way is to just solve real, current needs, but
> > in this case it seems to risk a head slap moment in the future when we
> > realize that, in order to deal with some new simple feature variant, we
> > should have done things differently from the beginning?
> > >>>
> > >>> And it seems odd to reject existing standards that have been so
> > painstakingly hammered out, in favor of starting the process all over
> > again.  We follow existing standards for other things (e.g., IEEE-754 for
> > representing floating point numbers in binary files), why can't we follow
> > an existing Simple Features standard?
> > >>>
> > >>> ---
> > >>> Rather than just be a naysayer, let me suggest a very different
> > alternative:
> > >>>
> > >>> There are several projects in the CF realm (e.g., this Simple Features
> > project, Discrete Sampling Geometry (DSG), true variable-length Strings,
> > ugrid(?)) which share a common underlying problem: how to deal with
> > variable-length multidimensional arrays: a[b][c], where the length of the c
> > dimension may be different for different b indices.
> > >>> DSG solved this (5 different ways!), but only for DSG.
> > >>> The Simple Features proposal seeks to solve the problem for Simple
> > Features.
> > >>> We still have no support for Unicode variable-length Strings.
> > >>>
> > >>> Instead of continuing to solve the variable-length problem a different
> > way every time we confront it, shouldn't we solve it once, with one small
> > addition to the standard, and then use that solution repeatedly?
> > >>> The solution could be a simple variant of one of the DSG solutions,
> > but generalized so that it could be used in different situations.
> > >>> An encoding standard and built-in support for variable-length data
> > arrays in netcdf-java/c would solve a lot of problems, now and in the
> > future.
> > >>> Some work on this is already done: I think the netcdf-java API already
> > supports variable-length arrays when reading netcdf-4 files.
> > >>> For Simple Features, the problem would reduce to: store the feature
> > (using some specified existing standard like WKT or WKB) in a
> > variable-length array.
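
A minimal Python sketch of the generalized variable-length layout described above, assuming a DSG-style contiguous ragged encoding: one flat array holds every value of a[b][c], and a second array holds the number of c values for each b index. The names unpack_ragged, flat and counts are hypothetical illustrations, not part of CF or of any netCDF library.

```python
from itertools import accumulate


def unpack_ragged(flat, counts):
    """Split a flat value array into rows using per-row element counts."""
    ends = list(accumulate(counts))   # exclusive end offset of each row
    starts = [0] + ends[:-1]          # start offset of each row
    return [flat[s:e] for s, e in zip(starts, ends)]


# Three rows of lengths 2, 1 and 3 stored back to back in one array.
flat = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
counts = [2, 1, 3]
rows = unpack_ragged(flat, counts)
```

The same two-array idea would apply whether the flat array holds time series samples, characters of variable-length strings, or geometry nodes; only the interpretation of each row changes.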
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Fri, Feb 3, 2017 at 9:07 AM, <[email protected]> wrote:
> > >>> Date: Fri, 3 Feb 2017 11:07:00 -0600
> > >>> From: David Blodgett <[email protected]>
> > >>> To: Bob Simons - NOAA Federal <[email protected]>
> > >>> Cc: CF Metadata <[email protected]>
> > >>> Subject: Re: [CF-metadata] Extension of Discrete Sampling Geometries
> > >>>         for Simple Features
> > >>> Message-ID: <[email protected]>
> > >>> Content-Type: text/plain; charset="utf-8"
> > >>>
> > >>> Dear Bob,
> > >>>
> > >>> I'll just take these in line.
> > >>>
> > >>> 1) noted. We have been trying to figure out what to do with the point
> > featureType and I think leaving it more or less alone is a viable path
> > forward.
> > >>>
> > >>> 2) This is not an exact replica of WKT, but rather a similar approach
> > to WKT. As I stated, we have followed the ISO simple features data model
> > and well known text feature types in concept, but have not used the same
> > standardization formalisms. We aren't advocating for supporting "all of"
> > any standard but are rather attempting to support the use cases that have a
> > compelling need today while aligning this with as many other encoding
> > standards in this space as is practical. Hopefully that answers your
> > question, sorry if it's vague.
> > >>>
> > >>> 3) The google doc linked in my response contains the encoding we are
> > proposing as a starting point for conversation: http://goo.gl/Kq9ASq
> > I want to stress that it is only a starting point for discussion. I expect
> > that this proposal will change drastically before we're done.
> > >>>
> > >>> 4) Absolutely envision tools doing what you say, convert to/from
> > standard spatial formats and NetCDF-CF geometries. We intend to introduce
> > an R and a Python implementation that does exactly as you say along with
> > whatever form this standard takes in the end. R and Python were chosen as
> > the team that brought this together are familiar with those two languages,
> > additional implementations would be more than welcome.
> > >>>
> > >>> 5) We do include a "geometry" featureType similar to the "point"
> > featureType. Thus our difficulty with what to do with the "point"
> > featureType. You are correct, there are lots of non timeSeries applications
> > to be solved and this proposal does intend to support them (within the
> > existing DSG constructs).
> > >>>
> > >>> Thanks for your questions, hopefully my answers close some gaps for
> > you.
> > >>>
> > >>> - Dave
> > >>>
> > >>> > On Feb 3, 2017, at 10:47 AM, Bob Simons - NOAA Federal
> > <[email protected]> wrote:
> > >>> >
> > >>> > 1) There is a vague comment in the proposal about possibly changing
> > the point featureType. Please don't, unless the changes don't affect
> > current uses of Point. There are already 1000's of files that use it. If
> > this new system offers an alternative, then fine, it's an alternative. One
> > of the most important and useful features of a good standard is backwards
> > compatibility.
> > >>> >
> > >>> > 2) You advocate "Implement the WKT approach using a NetCDF binary
> > array." Is this system then an exact encoding of WKT, neither a subset nor
> > a superset?  "Simple Features" are often not simple.
> > >>> > If it is WKT (or something else), what is the standard you are
> > following to describe the Simple Features (e.g.,  ISO/IEC 13249-3:2016 and
> > ISO 19162:2015)?
> > >>> > Does your proposal deviate in any way from the standard's
> > capabilities?
> > >>> > Do you advocate following the entire WKT standard, e.g., supporting
> > all the feature types that WKT supports?
> > >>> >
> > >>> > 3) Since you are not using the WKT encoding, but creating your own,
> > where is the definition of the encoding system you are using?
> > >>> >
> > >>> > 4) This is a little out of CF scope, but:
> > >>> > Do you envision tools, notably, netcdf-c/java, having a writer
> > function that takes in WKT and encodes the information in a file, and
> > having a reader function that reads the file and returns WKT? Or is it your
> > plan that the encoding/ decoding is left to the user?
> > >>> >
> > >>> > 5) This proposal is for "Simple Features plus Time Series" (my
> > phrase not yours). But aren't there lots of other uses of Simple Features?
> > Will there be other proposals in the future for "Simple Features plus X"
> > and "Simple Features plus Y"? If so, will CF eventually become a massive
> > document where Simple Features are defined over and over again, but in
> > different contexts? If so, wouldn't a better solution be to deal with
> > Simple Features separately (as Postgres does by making a geometric data
> > type?), and then add "Simple Features plus Time Series" as the first use of
> > it?
> > >>> >
> > >>> > Thanks for answering these questions.
> > >>> > Please forgive me if I missed parts of your proposal that answer
> > these questions.
> > >>> >
> > >>> >
> > >>> > On Thu, Feb 2, 2017 at 5:57 AM, <[email protected]> wrote:
> > >>> > Date: Thu, 2 Feb 2017 07:57:36 -0600
> > >>> > From: David Blodgett <[email protected]>
> > >>> > To: <[email protected]>
> > >>> > Subject: [CF-metadata] Extension of Discrete Sampling Geometries for
> > >>> >         Simple  Features
> > >>> > Message-ID: <[email protected] <mailto:
> > [email protected]> <mailto:224C2828-7212-449F-
> > [email protected] <mailto:224C2828-7212-449F-
> > [email protected]>>>
> > >>> > Content-Type: text/plain; charset="utf-8"
> > >>> >
> > >>> > Dear CF Community,
> > >>> >
> > >>> > We are pleased to submit this proposal for your consideration and
> > review. The cover letter we've prepared below provides some background and
> > explanation for the proposed approach. The google doc here
> > <http://goo.gl/Kq9ASq> is an excerpt of the CF specification with track
> > changes turned on. Permissions for the document allow any google user to
> > comment, so feel free to comment and ask questions in line.
> > >>> >
> > >>> > Note that I'm sharing this with you with one issue unresolved. What
> > to do with the point featureType? Our draft suggests that it is part of a
> > new geometry featureType, but it could be that we leave it alone and
> > introduce a geometry featureType. This may be a minor point of discussion,
> > but we need to be clear that this is an issue that still needs to be
> > resolved in the proposal.
> > >>> >
> > >>> > Thank you for your time and consideration.
> > >>> >
> > >>> > Best Regards,
> > >>> >
> > >>> > David Blodgett, Tim Whiteaker, and Ben Koziol
> > >>> >
> > >>> > Proposed Extension to NetCDF-CF for Simple Geometries
> > >>> >
> > >>> > Preface
> > >>> >
> > >>> > The proposed addition to NetCDF-CF introduced below is inspired by a
> > pre-existing data model governed by OGC and ISO as ISO 19125-1. More
> > information on Simple Features may be found here:
> > <https://en.wikipedia.org/wiki/Simple_Features>. To the knowledge of the
> > authors, it is consistent with ISO 19125-1 but has not been specified using
> > the formalisms of OGC or ISO. Language used attempts to hold true to
> > NetCDF-CF semantics while not conflicting with the existing standards
> > baseline. While this proposal does not support the entire scope of the
> > simple features ecosystem, it does support the core data types in most
> > common use around the community.
> > >>> >
> > >>> > The other existing standard to mention is the UGRID convention
> > <http://ugrid-conventions.github.io/ugrid-conventions/>. The authors
> > have experience reading and writing UGRID and have designed the proposed
> > structure in a way that is inspired by and consistent with it.
> > >>> >
> > >>> > Terms and Definitions
> > >>> >
> > >>> > (Taken from OGC 06-103r4 OpenGIS Implementation Specification for
> > Geographic information - Simple feature access - Part 1: Common
> > architecture <http://www.opengeospatial.org/standards/sfa>.)
> > >>> >
> > >>> > Feature: Abstraction of real world phenomena - typically a
> > geospatial abstraction with associated descriptive attributes.
> > >>> > Simple Feature: A feature with all geometric attributes described
> > piecewise by straight line or planar interpolation between point sets.
> > >>> > Geometry (geometric complex): A set of disjoint geometric primitives
> > - one or more points, lines, or polygons that form the spatial
> > representation of a feature.
> > >>> > Introduction
> > >>> >
> > >>> > Discrete Sampling Geometries (DSGs) handle data from one (or a
> > collection of) timeSeries (point), Trajectory, Profile, TrajectoryProfile
> > or timeSeriesProfile geometries. Measurements are from a point (timeSeries
> > and Profile) or points along a trajectory. In this proposal, we reuse the
> > core DSG timeSeries type which provides support for basic time series use
> > cases, e.g., a timeSeries which is measured (or modeled) at a given point.
> > >>> >
> > >>> > Changes to Existing CF Specification
> > >>> >
> > >>> > In NetCDF-CF 1.7, Discrete Sampling Geometries separate dimensions
> > and variables into two types - instance and element
> > <http://cfconventions.org/cf-conventions/cf-conventions.html#_collections_instances_and_elements>.
> > Instance refers to individual points, trajectories, profiles, etc. These
> > would sometimes be referred to as features given that they are identified
> > entities that can have associated attributes and be related to other
> > entities. Element dimensions describe temporal or other dimensions to
> > describe data on a per-instance basis. This proposal extends the DSG
> > timeSeries featureType
> > <http://cfconventions.org/cf-conventions/cf-conventions.html#_features_and_feature_types>
> > such that the geospatial coordinates
> > of the instances can be point, multi-point, line, multi-line, polygon, or
> > multi-polygon geometries. Rather than overload the DSG contiguous ragged
> > array encoding, designed with timeseries in mind, a geometry ragged array
> > encoding is introduced in a new section 9.3.5. See this google doc for
> > specific proposed changes. <http://goo.gl/Kq9ASq>
> > >>> > Motivation
> > >>> >
> > >>> > DSGs have no system to define a geometry (polyline, polygon, etc.,
> > other than point) and an association with a time series that applies over
> > that entire geometry e.g., The expected rainfall in this watershed polygon
> > for some period of time is 10 mm. As suggested in the last paragraph of
> > section 9.1, current practice is to assign a representative point or just
> > use an ID and forgo spatial information within a NetCDF-CF file. In order
> > to satisfy a number of environmental modeling use cases, we need a way to
> > encode a geometry (point, line, polygon, multi-point, multi-line, or
> > multi-polygon) that is the static spatial feature representation to which
> > one or more timeSeries can be associated. In this proposal, we provide an
> > encoding to define collections of simple feature geometries. It interfaces
> > cleanly with the existing DSG specification, enabling DSGs and Simple
> > Geometries to be used concurrently.
> > >>> >
> > >>> > Looking Forward
> > >>> >
> > >>> > This proposal is a compromise solution that attempts to stay
> > consistent with CF ideals and fit within the structure of the existing
> > specification with minimal disruption. Line and polygon data types often
> > require variable length arrays. Development of this proposal has brought to
> > light the need for a general abstraction for variable length arrays in
> > NetCDF-CF. Such a general abstraction would necessarily be reusable for
> > character arrays, ragged arrays of time series, and ragged arrays of
> > geometry nodes, as well as any other ragged data structures that may come
> > up in the future. This proposal does not introduce such a general ragged
> > array abstraction but does not preclude such a development in the future.
> > >>> >
> > >>> > Three Alternative Approaches
> > >>> >
> > >>> > Respecting the human readability ideal of NetCDF-CF, the development
> > of this proposal started from a human readable format for geometries known
> > as Well Known Text <https://en.wikipedia.org/wiki/Well-known_text>.
> > We considered three high level design approaches while developing this
> > proposal.
> > >>> >
> > >>> > 1. Direct use of Well-Known Text (WKT). In this approach, well known
> > text strings would be encoded using character arrays following a contiguous
> > ragged array approach to index the character array by geometry (or instance
> > in DSG parlance).
> > >>> > 2. Implement the WKT approach using a NetCDF binary array. In this
> > approach, well known text separators (brackets, commas and spaces) for
> > multipoint, multiline, multipolygon, and polygon holes, would be encoded as
> > break type separator values like -1 for multiparts and -2 for holes.
> > >>> > 3. Implement the fundamental dimensions of geometry data in NetCDF. In
> > this approach, additional dimensions and variables along those dimensions
> > would be introduced to represent geometries, geometry parts, geometry
> > nodes, and unique (potentially shared) coordinate locations for nodes to
> > reference.
> > >>> > Selected Approach
> > >>> >
> > >>> > The first approach was seen as too opaque to stay true to the CF
> > ideal of complete self-description. The third approach seemed needlessly
> > verbose and difficult to implement. The second approach was selected for
> > the following reasons:
> > >>> >
> > >>> > The second approach is just as or more human-readable than the third.
> > >>> > Use of break values keeps geometries relatively atomic.
> > >>> > Will be familiar to developers who are familiar with the WKT
> > geometry format.
> > >>> > Character arrays, which are needed for options one and three, are
> > cumbersome to use in some programming languages in common use with NetCDF.
> > >>> > Break values replace the need for extraneous variables related to
> > multi-part and polygon holes (interiors). Multi-part geometries are
> > generally an exception and excessive instrumentation to support them should
> > be discounted.
> > >>> > Example: Representation of WKT-Style Polygons in a NetCDF-3
> > timeSeries featureType
> > >>> >
> > >>> > Below is sample CDL demonstrating how polygons are encoded in
> > NetCDF-3 using a contiguous ragged array-like encoding. There are three
> > details to note in the example below.
> > >>> >
> > >>> > The attribute contiguous_ragged_dimension with value of a dimension
> > in the file.
> > >>> > The geom_coordinates attribute with a value containing a space
> > separated string of variable names.
> > >>> > The cf_role geometry_x_node and geometry_y_node.
> > >>> > These three attributes form a system to fully describe collections
> > of multi-polygon feature geometries. Any variable that has the
> > contiguous_ragged_dimension attribute contains integers that indicate the
> > 0-indexed starting position of each geometry along the instance dimension.
> > Any variable that uses the dimension referenced in the
> > contiguous_ragged_dimension attribute can be interpreted using the values
> > in the variable containing the contiguous_ragged_dimension attribute. The
> > variables referenced in the geom_coordinates attribute describe spatial
> > coordinates of geometries. These variables can also be identified by the
> > cf_roles geometry_x_node and geometry_y_node. Note that the example below
> > also includes a mechanism to handle multi-polygon features that also
> > contain holes.
> > >>> >
> > >>> > netcdf multipolygon_example {
> > >>> > dimensions:
> > >>> >   node = 47 ;
> > >>> >   indices = 55 ;
> > >>> >   instance = 3 ;
> > >>> >   time = 5 ;
> > >>> >   strlen = 5 ;
> > >>> > variables:
> > >>> >   char instance_name(instance, strlen) ;
> > >>> >     instance_name:cf_role = "timeseries_id" ;
> > >>> >   int coordinate_index(indices) ;
> > >>> >     coordinate_index:geom_type = "multipolygon" ;
> > >>> >     coordinate_index:geom_coordinates = "x y" ;
> > >>> >     coordinate_index:multipart_break_value = -1 ;
> > >>> >     coordinate_index:hole_break_value = -2 ;
> > >>> >     coordinate_index:outer_ring_order = "anticlockwise" ;
> > >>> >     coordinate_index:closure_convention = "last_node_equals_first" ;
> > >>> >   int coordinate_index_start(instance) ;
> > >>> >     coordinate_index_start:long_name = "index of first coordinate in each instance geometry" ;
> > >>> >     coordinate_index_start:contiguous_ragged_dimension = "indices" ;
> > >>> >   double x(node) ;
> > >>> >     x:units = "degrees_east" ;
> > >>> >     x:standard_name = "longitude" ; // or projection_x_coordinate
> > >>> >     x:cf_role = "geometry_x_node" ;
> > >>> >   double y(node) ;
> > >>> >     y:units = "degrees_north" ;
> > >>> >     y:standard_name = "latitude" ; // or projection_y_coordinate
> > >>> >     y:cf_role = "geometry_y_node" ;
> > >>> >   double someVariable(instance) ;
> > >>> >     someVariable:long_name = "a variable describing a single-valued attribute of a polygon" ;
> > >>> >   int time(time) ;
> > >>> >     time:units = "days since 2000-01-01" ;
> > >>> >   double someData(instance, time) ;
> > >>> >     someData:coordinates = "time x y" ;
> > >>> >     someData:featureType = "timeSeries" ;
> > >>> > // global attributes:
> > >>> >     :Conventions = "CF-1.8" ;
> > >>> >
> > >>> > data:
> > >>> >
> > >>> >  instance_name =
> > >>> >   "flash",
> > >>> >   "bang",
> > >>> >   "pow" ;
> > >>> >
> > >>> >  coordinate_index = 0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11,
> > 12, -2, 13, 14, 15, 16,
> > >>> >     -1, 17, 18, 19, 20, -1, 21, 22, 23, 24, 25, 26, 27, 28, -1, 29,
> > 30, 31, 32, 33,
> > >>> >     34, -2, 35, 36, 37, 38, 39, 40, 41, 42, -1, 43, 44, 45, 46 ;
> > >>> >
> > >>> >  coordinate_index_start = 0, 30, 46 ;
> > >>> >
> > >>> >  x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5,
> > 9, 7,
> > >>> >     5, 11, 15, 13, 11, -40, -20, -45, -40, -20, -10, -10, -30, -45,
> > -20, -30, -20, -20, -30, 30,
> > >>> >     45, 10, 30, 25, 50, 30, 25 ;
> > >>> >
> > >>> >  y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15,
> > 25, 25, 29,
> > >>> >     25, 25, 25, 29, 25, -40, -45, -30, -40, -35, -30, -10, -5, -20,
> > -35, -20, -15, -25, -20, 20,
> > >>> >     40, 40, 20, 5, 10, 15, 5 ;
> > >>> >
> > >>> >  someVariable = 1, 2, 3 ;
> > >>> >
> > >>> >  time = 1, 2, 3, 4, 5 ;
> > >>> >
> > >>> >  someData =
> > >>> >   1, 2, 3, 4, 5,
> > >>> >   1, 2, 3, 4, 5,
> > >>> >   1, 2, 3, 4, 5 ;
> > >>> > }
> > >>> > How To Interpret
> > >>> >
> > >>> > Starting from the timeSeries variables:
> > >>> >
> > >>> > 1. See CF-1.8 conventions.
> > >>> > 2. See the timeSeries featureType.
> > >>> > 3. Find the timeseries_id cf_role.
> > >>> > 4. Find the coordinates attribute of data variables.
> > >>> > 5. See that the variables indicated by the coordinates attribute have a
> > cf_role geometry_x_node and geometry_y_node to determine that these are
> > geometries according to this new specification.
> > >>> > 6. Find the coordinate index variable with geom_coordinates that point
> > to the nodes.
> > >>> > 7. Find the variable with contiguous_ragged_dimension pointing to the
> > dimension of the coordinate index variable to determine how to index into
> > the coordinate index.
> > >>> > 8. Iterate over polygons, parsing out geometries using the contiguous
> > ragged start variable and coordinate index variable to interpret the
> > coordinate data variables.
> > >>> >
> > >>> > Or, without reference to timeSeries:
> > >>> >
> > >>> > 1. See CF-1.8 conventions.
> > >>> > 2. See the geom_type of multipolygon.
> > >>> > 3. Find the variable with a contiguous_ragged_dimension matching the
> > coordinate index variable's dimension.
> > >>> > 4. See the geom_coordinates of x y.
> > >>> > 5. Using the contiguous ragged start variable found in 3 and the
> > coordinate index variable found in 2, geometries can be parsed out of the
> > coordinate index variable and split into parts and rings using the
> > multipart and hole break values in it.
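
The interpretation steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not a reference implementation of the proposal: the data are copied from the example CDL (coordinate_index and coordinate_index_start), the break values are the -1 and -2 given by multipart_break_value and hole_break_value, and the helper names split_on and parse_instances are hypothetical.

```python
MULTIPART_BREAK = -1  # coordinate_index:multipart_break_value in the CDL
HOLE_BREAK = -2       # coordinate_index:hole_break_value in the CDL

# Values copied from the example CDL above.
coordinate_index = [
    0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2, 13, 14, 15, 16,
    -1, 17, 18, 19, 20, -1, 21, 22, 23, 24, 25, 26, 27, 28, -1, 29,
    30, 31, 32, 33, 34, -2, 35, 36, 37, 38, 39, 40, 41, 42, -1, 43, 44, 45, 46,
]
coordinate_index_start = [0, 30, 46]


def split_on(values, sep):
    """Split a list into sub-lists at every occurrence of sep."""
    groups, current = [], []
    for v in values:
        if v == sep:
            groups.append(current)
            current = []
        else:
            current.append(v)
    groups.append(current)
    return groups


def parse_instances(index, starts):
    """Return instances -> polygon parts -> rings of node indices.

    The first ring of each part is the exterior; later rings are holes.
    """
    ends = list(starts[1:]) + [len(index)]
    instances = []
    for begin, end in zip(starts, ends):
        parts = split_on(index[begin:end], MULTIPART_BREAK)
        instances.append([split_on(part, HOLE_BREAK) for part in parts])
    return instances


instances = parse_instances(coordinate_index, coordinate_index_start)
```

Each ring is a list of indices into the x and y node variables, so a ring's coordinates would be looked up as x[i], y[i] for each i in the ring. With the example data, the first instance has three polygon parts, the first of which carries three holes.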
> > >>> >
> > >>> > --
> > >>> > Sincerely,
> > >>> >
> > >>> > Bob Simons
> > >>> > IT Specialist
> > >>> > Environmental Research Division
> > >>> > NOAA Southwest Fisheries Science Center
> > >>> > 99 Pacific St., Suite 255A      (New!)
> > >>> > Monterey, CA 93940               (New!)
> > >>> > Phone: (831)333-9878            (New!)
> > >>> > Fax:   (831)648-8440
> > >>> > Email: [email protected]
> > >>> >
> > >>> > The contents of this message are mine personally and
> > >>> > do not necessarily reflect any position of the
> > >>> > Government or the National Oceanic and Atmospheric Administration.
> > >>> > <>< <>< <>< <>< <>< <>< <>< <>< <><
> > >>> >
> > >>> --
> > >>>
> > >>> Christopher Barker, Ph.D.
> > >>> Oceanographer
> > >>>
> > >>> Emergency Response Division
> > >>> NOAA/NOS/OR&R            (206) 526-6959   voice
> > >>> 7600 Sand Point Way NE   (206) 526-6329   fax
> > >>> Seattle, WA  98115       (206) 526-6317   main reception
> > >>>
> > >>> [email protected]
> 
> 
> 
> -- 
> Sincerely,
> 
> Bob Simons
> IT Specialist
> Environmental Research Division
> NOAA Southwest Fisheries Science Center
> 99 Pacific St., Suite 255A      (New!)
> Monterey, CA 93940               (New!)
> Phone: (831)333-9878            (New!)
> Fax:   (831)648-8440
> Email: [email protected]
> 
> The contents of this message are mine personally and
> do not necessarily reflect any position of the
> Government or the National Oceanic and Atmospheric Administration.
> <>< <>< <>< <>< <>< <>< <>< <>< <><



----- End forwarded message -----
_______________________________________________
CF-metadata mailing list
[email protected]
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
