Re: [CF-metadata] Extension of Discrete Sampling Geometries for Simple Features

David Blodgett Fri, 17 Feb 2017 11:23:04 -0800

All,

I haven’t heard much follow up, but here’s a doodle to coordinate a phone 
conversation about this. I think we have west-coast US participants and EU 
participants, so I chose times mid to late morning for me (midwest US).


http://doodle.com/poll/eikarnt35tdm7igd

Will make a call once a few people have expressed interest and we have a clear 
day/time.

Regards,

- Dave

> On Feb 6, 2017, at 11:29 AM, David Blodgett <[email protected]> wrote:
> 
> Dear CF, 
> 
> I want to follow up on the conversation here with an alternative approach 
> suggested off list, primarily between Jonathan and me. For this, I’m going to 
> focus on use cases satisfied and simplification of the proposal allowed by 
> not supporting those use cases. The changes below are largely driven by a 
> desire to better align this proposal with the technical details of the prior 
> art that is CF. 
> 
> If we: 
> 1) don’t support node sharing, we can remove the complication of 
> node-coordinate indexing / indirection, simplifying the proposal pretty 
> significantly.
> 2) don’t use “break values” to indicate the separation between multi-part 
> geometries and polygon holes, we end up with a data model with an extra 
> dimension, but the NetCDF dimensions align with the natural dimensions of the 
> data.
> 3) use “count” instead of a “start pointer” approach, we are better aligned 
> with the existing DSG contiguous ragged array approach.
> 
> Coming back to the three directions we could take this proposal, from my cover 
> letter on February 2nd:
>> 1) Direct use of Well-Known Text (WKT). In this approach, well known text 
>> strings would be encoded using character arrays following a contiguous 
>> ragged array approach to index the character array by geometry (or instance 
>> in DSG parlance).
>> 2) Implement the WKT approach using a NetCDF binary array. In this approach, 
>> well known text separators (brackets, commas and spaces) for multipoint, 
>> multiline, multipolygon, and polygon holes, would be encoded as break type 
>> separator values like -1 for multiparts and -2 for holes.
>> 3) Implement the fundamental dimensions of geometry data in NetCDF. In this 
>> approach, additional dimensions and variables along those dimensions would 
>> be introduced to represent geometries, geometry parts, geometry nodes, and 
>> unique (potentially shared) coordinate locations for nodes to reference.
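
As an illustration of the second direction (break values), here is a minimal Python sketch of how a reader might split one geometry's flat array on the -1 and -2 separators mentioned above. It assumes the array holds non-negative node indices, so negative values cannot be confused with data; the function and constant names are hypothetical and not part of any proposal text.

```python
MULTIPART_BREAK = -1  # separates the parts of a multi-part geometry
HOLE_BREAK = -2       # introduces an interior ring (hole) of a polygon


def split_on_breaks(values):
    """Split one geometry's flat index array into (is_hole, indices) rings."""
    rings = []
    current = []
    is_hole = False
    for v in values:
        if v in (MULTIPART_BREAK, HOLE_BREAK):
            # Close the ring collected so far and start a new one.
            rings.append((is_hole, current))
            current = []
            is_hole = (v == HOLE_BREAK)
        else:
            current.append(v)
    rings.append((is_hole, current))
    return rings


# Hypothetical example: an outer ring, one hole, then a second polygon part.
encoded = [0, 1, 2, 0, HOLE_BREAK, 3, 4, 5, 3, MULTIPART_BREAK, 6, 7, 8, 6]
rings = split_on_breaks(encoded)
```

Every reader needs custom scanning code of this kind, which is part of why the break-value approach feels awkward.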
> The alternative I’m outlining here moves in the direction of 3. We had 
> originally discounted it because it becomes very verbose and seems overly 
> complicated if support for coordinate sharing is a requirement. If the three 
> simplifications described above are used, then the third approach seems more 
> tenable. 
> 
> Jonathan has also suggested that: (these are in reaction to the CDL in my 
> letter from February 2nd)
> 1) Rename geom_coordinates as node_coordinates, for consistency with UGRID.
> 2) Omit node_dimension. This is redundant, since the dimension can be found by
> examining the node coordinate variables.
> 3) Prescribe numerous “codes” and assumptions in the specification instead of 
> letting them be described with attribute values.
> 4) It would be more consistent with CF and UGRID to use a single container 
> variable to hang all the topology/geometry information from.
> 
> Which I, personally, am happy to accept if others don’t object.
> 
> A couple other suggestions from Jonathan I want to discuss a bit more:
> 1) Rename geometry as topology and geom_type as topology_type.
>       While I’d be open to something other than geom, topology is odd. If 
> this is really “node_collection_topology_type” I guess I could be convinced, 
> but would be curious how people react to this. (Especially in relation to 
> UGRID)
> 2) This extension is more appropriate as an extension to the concept of cell 
> bounds than the addition of a complex time-invariant type of discrete 
> sampling geometry. 
>       Having just re-read the cell bounds chapter, I think it would over 
> complicate the cell bounds to include this material. My basic issue here is 
> that these geometries do not necessarily have a reference location. They are, 
> rather, first order entities that need to be treated as such. That said, it 
> makes sense that these geometries are not necessarily a good fit for the 
> original intent of Discrete Sampling Geometries. Jonathan suggested they may 
> belong in their own chapter, which may be a good alternative? My suggested 
> CDL below might lead us in the direction of this being a special type of 
> auxiliary coordinate variable. 
> 
> This alternative starts to look like the CDL pasted below.
> 
> Note that the issue of coordinates is sticking out like a sore thumb. Below, 
> I’ve attempted to reconcile Jonathan’s ideas regarding coordinates with my 
> thoughts about how these geometries are “first order entities” that don’t 
> have a single representative x and y. The spatial coordinates can be said to 
> reside in the system of geometries described in the “sf” container variable? 
> I realize this goes against the idea of coordinates a bit, but I think it is 
> holding with the spirit of the attribute?
> 
> Finally, I’m glad to continue answering questions and debating things via the 
> list to a point, but I think it would be in our interest to arrange a telecon 
> to discuss this stuff further with a list of interested parties. Feel free to 
> follow up on list, but for decision making, let’s not let this rabbit hole go 
> too deep. I’ll plan on letting this and the other recent action on this 
> proposal settle with people for a week or two then start to bring together a 
> conference call (or calls depending on time zones). Please respond to me off 
> list if you are interested in being part of a call to discuss.
> 
> Regards,
> 
> - Dave 
> 
> netcdf multipolygon_example {
> dimensions:
>  node = 47 ;
>  part = 9 ;
>  instance = 3 ;
>  time = 5 ;
>  strlen = 5 ;
> variables:
>  char instance_name(instance, strlen) ;
>    instance_name:cf_role = "timeseries_id" ;
>  double someVariable(instance) ;
>    someVariable:long_name = "a variable describing a single-valued attribute of a polygon" ;
>    someVariable:coordinates = "sf" ; // or "instance_name"?
>  int time(time) ;
>    time:units = "days since 2000-01-01" ;
>  double someData(instance, time) ;
>    someData:coordinates = "time sf" ; // or "time instance_name"?
>    someData:featureType = "timeSeries" ;
>    someData:geometry="sf";
>  int sf; // containing variable -- datatype irrelevant because no data
>    sf:geom_type = "multipolygon" ; // could be node_topology_type?
>    sf:node_count_variable="node_count";
>    sf:node_coordinates = "x y" ;
>    sf:part_count = "part_node_count" ;
>    sf:part_type = "part_type" ; // Not required unless polygons with holes present.
>    sf:outer_ring_order = "anticlockwise" ; // not required if written in spec?
>    sf:closure_convention = "last_node_equals_first" ; // not required if written in spec?
>    sf:outer_type_code = 0 ; // not required if written in spec?
>    sf:inner_type_code = 1 ; // not required if written in spec?
>  int node_count(instance);
>    node_count:long_name = "count of coordinates in each instance geometry" ;
>  int part_node_count(part) ;
>    part_node_count:long_name = "count of coordinates in each geometry part" ;
>  int part_type(part) ;
>    part_type:long_name = "type of each geometry part" ;
>  double x(node) ;
>    x:units = "degrees_east" ;
>    x:standard_name = "longitude" ; // or projection_x_coordinate
>    x:cf_role = "geometry_x_node" ;
>  double y(node) ;
>    y:units = "degrees_north" ;
>    y:standard_name = "latitude" ; // or projection_y_coordinate
>    y:cf_role = "geometry_y_node" ;
> // global attributes:
>     :Conventions = "CF-1.8" ;
> 
> data:
> 
>  instance_name =
>   "flash",
>   "bang",
>   "pow" ;
> 
>  someVariable = 1, 2, 3 ;
> 
>  time = 1, 2, 3, 4, 5 ;
> 
>  someData =
>   1, 2, 3, 4, 5,
>   1, 2, 3, 4, 5,
>   1, 2, 3, 4, 5 ;
> 
>  node_count = 25, 15, 7 ;
> 
>  part_node_count = 5, 4, 4, 4, 4, 8, 6, 8, 4 ;
> 
>  part_type = 0, 1, 1, 1, 0, 0, 0, 1, 0 ;
> 
>  x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7,
>     5, 11, 15, 13, 11, -40, -20, -45, -40, -20, -10, -10, -30, -45, -20, -30, 
> -20, -20, -30, 30, 
>     45, 10, 30, 25, 50, 30, 25 ;
> 
>  y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25, 29, 
>     25, 25, 25, 29, 25, -40, -45, -30, -40, -35, -30, -10, -5, -20, -35, -20, 
> -15, -25, -20, 20,
>     40, 40, 20, 5, 10, 15, 5 ;
> }
> 
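
To make the two-level count structure in the CDL above concrete, here is a minimal Python sketch of a reader that walks node_count and part_node_count together. The function name and the validation rules are assumptions made for illustration; the sample arrays are a shortened subset of the coordinates above, with counts chosen to fit that subset rather than copied from the data section.

```python
def decode_geometries(node_count, part_node_count, part_type, x, y):
    """Group flat node arrays into instances.

    Returns one list per instance; each entry is (part_type, [(x, y), ...]).
    """
    if len(x) != len(y) or sum(node_count) != len(x):
        raise ValueError("node_count must sum to the length of x and y")
    if len(part_node_count) != len(part_type):
        raise ValueError("part_node_count and part_type must align")

    instances = []
    node = 0  # index of the next unread node
    part = 0  # index of the next unread part
    for n_nodes in node_count:
        end = node + n_nodes
        parts = []
        while node < end:
            if part >= len(part_node_count):
                raise ValueError("ran out of parts before nodes")
            n = part_node_count[part]
            if node + n > end:
                raise ValueError("a part crosses an instance boundary")
            coords = list(zip(x[node:node + n], y[node:node + n]))
            parts.append((part_type[part], coords))
            node += n
            part += 1
        instances.append(parts)
    if part != len(part_node_count):
        raise ValueError("unused parts remain after the last instance")
    return instances


# Illustrative values: instance 0 is a square with one hole, instance 1 a triangle.
node_count = [9, 4]
part_node_count = [5, 4, 4]
part_type = [0, 1, 0]  # 0 = outer ring, 1 = hole, as in the CDL attributes
x = [0, 20, 20, 0, 0, 1, 10, 19, 1, 30, 45, 10, 30]
y = [0, 0, 20, 20, 0, 1, 5, 1, 1, 20, 40, 40, 20]
instances = decode_geometries(node_count, part_node_count, part_type, x, y)
```

The consistency checks matter in practice: the two count variables describe the same node dimension, so a reader can detect a file where they disagree.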
> 
> 
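
The outer_ring_order and closure_convention attributes in the CDL above can be checked mechanically. The following Python sketch uses the shoelace formula and treats coordinates as planar, which is an assumption (it ignores any subtleties of longitude/latitude on a sphere); the helper names are hypothetical.

```python
def signed_area(ring):
    """Shoelace formula for a closed ring: positive means anticlockwise."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        total += x0 * y1 - x1 * y0
    return total / 2.0


def is_closed(ring):
    """True when the ring follows the last_node_equals_first convention."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def is_anticlockwise(ring):
    return signed_area(ring) > 0


# First outer ring and first hole from the example coordinates above.
outer = [(0, 0), (20, 0), (20, 20), (0, 20), (0, 0)]
hole = [(1, 1), (10, 5), (19, 1), (1, 1)]
```

With these rings, the outer boundary is anticlockwise and the hole is clockwise, matching the convention sketched in the attributes.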
>> On Feb 4, 2017, at 8:07 AM, David Blodgett <[email protected]> wrote:
>> 
>> Dear Chris, 
>> 
>> Thanks for your thorough treatment of these issues. We have gone through a 
>> similar thought process to arrive at the proposal we came up with. I’ll 
>> answer as briefly as I can.
>> 
>> 1) how would you translate between netcdf geometries and, say geo JSON?
>> 
>> The thinking is that node coordinate sharing is optional. If the writer 
>> wants to check or already knows that nodes share coordinates, then it’s 
>> possible. Otherwise, it doesn’t have to be used. I’ve always felt that this 
>> was important, but maybe not critical for a core NetCDF-CF data model. Some 
>> offline conversation has led to an example that does not use it that may be 
>> a good alternative, more on that later.
>> 
>> 2) Break Values
>> 
>> You really do have to hold your nose on the break values. The issue is that 
>> you have to store that information somehow and it is almost worse to create 
>> new variables to store the multi-part and hole/not hole information. The 
>> alternative approach that’s forming up as mentioned above does break the 
>> information out into additional variables but simplifies things otherwise. 
>> In that case it doesn’t feel overly complex to me… so stay tuned for more on 
>> this front.
>> 
>> 3) Ragged Indexing
>> 
>> Your thought process follows ours exactly. The key is that you either have 
>> to create the “pointer” array as a first order of business or loop over the 
>> counts ad nauseam. I’m actually leaning toward the counts for two reasons. 
>> First, the counts approach is already in CF so is a natural fit and will be 
>> familiar to developers in this space. Second, the issue of 0 vs 1 indexing 
>> is annoying. In our proposal, we settled on 0 indexing because it aligns 
>> with the idea of an offset, but it is still annoying and some applications 
>> would always have to adjust that pointer array as a first order of business. 
>> 
>> On to Bob’s comments.
>> 
>> Regarding aligning with other data models / encodings, I guess this needs to 
>> be unpacked a bit. 
>> 
>> 1) In this setting, simple features is a data model, not an encoding. An 
>> encoding can implement part or all of a data model as is needed by the use 
>> case(s) at hand. There is no problem with partial implementations; you still 
>> get interoperability for the intended use cases.
>> 2) Attempting to align with other encoding standards (UGRID and NetCDF-CF are 
>> the primary ones here) is simply to keep the implementation patterns similar 
>> and familiar. This may be a fool’s errand, but is presumably good for 
>> adoptability and consistency. 
>> So, I don’t see a problem with implementing important simple features types 
>> in a way that aligns with the way the existing community standards work.
>> 
>> I don’t see this as ignoring existing standards at all. There is no open 
>> community standard for binary encoding of geometries and related data that 
>> passes the CF requirements of human readability and self-description. We are 
>> adopting the appropriate data model and suggesting a new encoding that will 
>> solve a lot of problems in the environmental modeling space. 
>> 
>> As we’ve discussed before, your “different approach” sounds great, but seems 
>> like an exercise for a future effort that doesn’t attempt to align with CF 
>> 1.7. Maybe what you suggest is a path forward for variable length arrays in 
>> the CF 2.0 “vision in the mist”, but I don’t see it as a tenable solution 
>> for CF 1.*.
>> 
>> Best Regards,
>> 
>> - Dave
>> 
>> 
>>> On Feb 3, 2017, at 3:31 PM, Chris Barker <[email protected]> wrote:
>>> 
>>> a few thoughts. First, I think there are three core "issues" that need to 
>>> be resolved:
>>> 
>>> 1) Coordinate indexing (indirection)
>>> 
>>> the question of whether you have an array of "vertices" that the geometry 
>>> types index into to get their data:
>>> 
>>> Advantages:
>>>  - if a number of geometries share a lot of vertices, it can be more 
>>> efficient
>>>  - the relationship between geometries that share vertices (i.e. polygons 
>>> that share a boundary) etc. is well defined. You don't need to check for 
>>> closeness, and maybe have a tolerance, etc.
>>> 
>>> These were absolutely critical for UGRID for example -- a UGRID mesh is a 
>>> "single thing", NOT a collection of polygons that happen to share some 
>>> vertices.
>>> 
>>> Disadvantages:
>>>  -  if the geometries do not share many vertices, it is less efficient.
>>>  -  there are additional code complications in "getting" the vertices of 
>>> the given geometry
>>>  - it does not match the OGC data model.
>>> 
>>> My 0.02 -- given my use cases, I tend to want the advantages -- but I don't 
>>> know that that's a typical use case. And I think it's a really good idea to 
>>> keep with the OGC data model where possible -- i.e. be able to translate 
>>> from netcdf to, say, geoJSON as losslessly as possible. Given that, I think 
>>> it's probably a better idea not to have the indirection.
>>> 
>>> However (to equivocate) perhaps the types of information people are likely 
>>> to want to store in netcdf are a subset of what the OGC standards are 
>>> designed for -- and for those use-cases, maybe shared vertices are critical.
>>> 
>>> One way to think about it -- how would you translate between netcdf 
>>> geometries and, say geo JSON:
>>>   - nc => geojson would lose the shared index info.
>>>   - geojson => nc -- would you try to reconstruct the shared vertices?? I'm 
>>> thinking that would be a bit dangerous in the general case, because you are 
>>> adding information that you don't know is true -- are these a shared vertex 
>>> or two that just happen to be at the same location?
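
A small Python sketch can illustrate the indirection trade-off described above. It builds a shared coordinate table by exact value matching, which is an assumption chosen for simplicity: that matching rule is precisely what makes the geojson => nc direction risky, because coincident vertices are merged whether or not they are truly shared. All names are hypothetical.

```python
def share_coordinates(geometries):
    """Build a unique coordinate table plus one index list per geometry."""
    table = []    # unique (x, y) pairs in first-seen order
    lookup = {}   # (x, y) -> position in table
    indexed = []
    for coords in geometries:
        idx = []
        for xy in coords:
            if xy not in lookup:
                lookup[xy] = len(table)
                table.append(xy)
            idx.append(lookup[xy])
        indexed.append(idx)
    return table, indexed


def expand(table, indexed):
    """Undo the indirection: return plain coordinate lists again."""
    return [[table[i] for i in idx] for idx in indexed]


# Two triangles that share the edge from (1, 0) to (0, 1).
a = [(0, 0), (1, 0), (0, 1), (0, 0)]
b = [(1, 0), (1, 1), (0, 1), (1, 0)]
table, indexed = share_coordinates([a, b])
```

Expanding the indexed form reproduces the original coordinates, but the knowledge of which nodes were shared exists only in the indexed form.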
>>> 
>>> > > Break values
>>> 
>>> I don't really like break values as an approach, but with netcdf any option 
>>> will be ugly one way or another. So keeping with the WKT approach makes 
>>> sense to me. Either way you'll need custom code to unpack it. (BTW -- what 
>>> does WellKnownBinary do?)
>>> 
>>> > > Ragged indexing
>>> 
>>> There are two "natural" ways to represent a ragged array:
>>> 
>>> (a) store the length of each "row"
>>> (b) store the index to the beginning (or end) of each "row"
>>> 
>>> CF already uses (a). However, working with it, I'm pretty convinced that 
>>> it's the "wrong" choice:
>>> 
>>> If you want to know how long a given row is, that is really easy with (a), 
>>> and almost as easy with (b) (involves two indexes and a subtraction)
>>> 
>>> However, if you want to extract a particular row: (b) makes this really 
>>> easy -- you simply access the slice of the array you want. with (a) you 
>>> need to loop through the entire "length_of_rows" array (up to the row of 
>>> interest) and add up the values to find the slice you need. not a huge 
>>> issue, but it is an issue. In fact, in my code to read ragged arrays in 
>>> netcdf, the first thing I do is pre-compute the index-to-each-row, so I can 
>>> then use that to access individual rows for future access -- if you are 
>>> accessing via OpenDAP -- that's particularly helpful.
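
The pre-computation described above can be sketched in a few lines of Python. This is an illustration with hypothetical names, using 0-based offsets and one extra trailing entry so that every row is a simple slice.

```python
from itertools import accumulate


def starts_from_counts(counts):
    """Convert per-row counts (a) into start offsets (b), plus the total length."""
    return [0] + list(accumulate(counts))


def counts_from_starts(starts):
    """Recover per-row counts: two indexes and a subtraction per row."""
    return [b - a for a, b in zip(starts, starts[1:])]


def row(values, starts, i):
    """Extract row i directly, without looping over earlier rows."""
    return values[starts[i]:starts[i + 1]]


counts = [3, 1, 4]
values = list(range(8))
starts = starts_from_counts(counts)
third = row(values, starts, 2)
```

The conversion is a single cumulative sum, so a file that stores counts costs a reader one pass before random access becomes cheap.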
>>> 
>>> So -- (b) is clearly (to me) the "best" way to do it -- but is it worth 
>>> introducing a second way to handle ragged arrays in CF? I would think yes, 
>>> but that would be offset if:
>>> 
>>>  - There is a bunch of existing library code that transparently handles 
>>> ragged arrays in netcdf (does netcdfJava have something? I'm pretty sure 
>>> Python doesn't -- certainly not in netCDF4) 
>>> 
>>>  - That existing lib code would be advantageous to leverage for code 
>>> reading features: I suspect that there will have to be enough custom code 
>>> that the ragged array bits are going to be the least of it.
>>> 
>>> So I'm for the "new" way of representing ragged arrays
>>> 
>>> -CHB
>>> 
>>> 
>>> On Fri, Feb 3, 2017 at 11:41 AM, Bob Simons - NOAA Federal 
>>> <[email protected]> wrote:
>>> Then, isn't this proposal just the first step in the creation of a new 
>>> model and a new encoding of Simple Features, one that is "align[ed] ... 
>>> with as many other encoding standards in this space as is practical"? In 
>>> other words, yet another standard for Simple Features?
>>> 
>>> If so, it seems risky to me to take just the first (easy?) step "to support 
>>> the use cases that have a compelling need today" and not solve the entire 
>>> problem. I know the CF way is to just solve real, current needs, but in 
>>> this case it seems to risk a head slap moment in the future when we realize 
>>> that, in order to deal with some new simple feature variant, we should have 
>>> done things differently from the beginning?
>>> 
>>> And it seems odd to reject existing standards that have been so 
>>> painstakingly hammered out, in favor of starting the process all over 
>>> again.  We follow existing standards for other things (e.g., IEEE-754 for 
>>> representing floating point numbers in binary files), why can't we follow 
>>> an existing Simple Features standard?
>>> 
>>> ---
>>> Rather than just be a naysayer, let me suggest a very different alternative:
>>> 
>>> There are several projects in the CF realm (e.g., this Simple Features 
>>> project, Discrete Sampling Geometry (DSG), true variable-length Strings, 
>>> ugrid(?)) which share a common underlying problem: how to deal with 
>>> variable-length multidimensional arrays: a[b][c], where the length of the c 
>>> dimension may be different for different b indices.
>>> DSG solved this (5 different ways!), but only for DSG.
>>> The Simple Features proposal seeks to solve the problem for Simple Features.
>>> We still have no support for Unicode variable-length Strings.
>>> 
>>> Instead of continuing to solve the variable-length problem a different way 
>>> every time we confront it, shouldn't we solve it once, with one small 
>>> addition to the standard, and then use that solution repeatedly?
>>> The solution could be a simple variant of one of the DSG solutions, but 
>>> generalized so that it could be used in different situations.
>>> An encoding standard and built-in support for variable-length data arrays 
>>> in netcdf-java/c would solve a lot of problems, now and in the future.
>>> Some work on this is already done: I think the netcdf-java API already 
>>> supports variable-length arrays when reading netcdf-4 files.
>>> For Simple Features, the problem would reduce to: store the feature (using 
>>> some specified existing standard like WKT or WKB) in a variable-length 
>>> array. 
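
As a rough illustration of storing WKT in a variable-length structure, the Python sketch below packs strings into one flat character buffer with a count per feature, in the spirit of a contiguous ragged array. It is plain Python and does not use any netCDF API; the function names are hypothetical, and counting characters rather than bytes is an assumption that only holds trivially for ASCII text such as WKT.

```python
def pack_strings(strings):
    """Concatenate strings into one buffer plus a character count per string."""
    return "".join(strings), [len(s) for s in strings]


def unpack_strings(buffer, counts):
    """Rebuild the original strings by walking the counts."""
    out = []
    start = 0
    for n in counts:
        out.append(buffer[start:start + n])
        start += n
    return out


wkt = [
    "POINT (30 10)",
    "LINESTRING (30 10, 10 30, 40 40)",
    "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))",
]
buffer, counts = pack_strings(wkt)
restored = unpack_strings(buffer, counts)
```

A reader still has to parse each WKT string after unpacking, so this moves the geometry decoding out of the file layout and into a WKT parser.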
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Fri, Feb 3, 2017 at 9:07 AM, <[email protected]> wrote:
>>> Date: Fri, 3 Feb 2017 11:07:00 -0600
>>> From: David Blodgett <[email protected]>
>>> To: Bob Simons - NOAA Federal <[email protected]>
>>> Cc: CF Metadata <[email protected]>
>>> Subject: Re: [CF-metadata] Extension of Discrete Sampling Geometries
>>>         for Simple Features
>>> Message-ID: <[email protected]>
>>> Content-Type: text/plain; charset="utf-8"
>>> 
>>> Dear Bob,
>>> 
>>> I’ll just take these in line.
>>> 
>>> 1) noted. We have been trying to figure out what to do with the point 
>>> featureType and I think leaving it more or less alone is a viable path 
>>> forward.
>>> 
>>> 2) This is not an exact replica of WKT, but rather a similar approach to 
>>> WKT. As I stated, we have followed the ISO simple features data model and 
>>> well known text feature types in concept, but have not used the same 
>>> standardization formalisms. We aren’t advocating for supporting “all of” 
>>> any standard but are rather attempting to support the use cases that have a 
>>> compelling need today while aligning this with as many other encoding 
>>> standards in this space as is practical. Hopefully that answers your 
>>> question, sorry if it’s vague.
>>> 
>>> 3) The google doc linked in my response contains the encoding we are 
>>> proposing as a starting point for conversation: http://goo.gl/Kq9ASq I want 
>>> to stress, as a starting point for discussion. I expect that this proposal 
>>> will change drastically before we’re done.
>>> 
>>> 4) Absolutely envision tools doing what you say, convert to/from standard 
>>> spatial formats and NetCDF-CF geometries. We intend to introduce an R and a 
>>> Python implementation that does exactly as you say along with whatever form 
>>> this standard takes in the end. R and Python were chosen as the team that 
>>> brought this together are familiar with those two languages, additional 
>>> implementations would be more than welcome.
>>> 
>>> 5) We do include a “geometry” featureType similar to the “point” 
>>> featureType. Thus our difficulty with what to do with the “point” 
>>> featureType. You are correct, there are lots of non timeSeries applications 
>>> to be solved and this proposal does intend to support them (within the 
>>> existing DSG constructs).
>>> 
>>> Thanks for your questions, hopefully my answers close some gaps for you.
>>> 
>>> - Dave
>>> 
>>> > On Feb 3, 2017, at 10:47 AM, Bob Simons - NOAA Federal 
>>> > <[email protected]> wrote:
>>> >
>>> > 1) There is a vague comment in the proposal about possibly changing the 
>>> > point featureType. Please don't, unless the changes don't affect current 
>>> > uses of Point. There are already 1000's of files that use it. If this new 
>>> > system offers an alternative, then fine, it's an alternative. One of the 
>>> > most important and useful features of a good standard is backwards 
>>> > compatibility.
>>> >
>>> > 2) You advocate "Implement the WKT approach using a NetCDF binary array." 
>>> > Is this system then an exact encoding of WKT, neither a subset nor a 
>>> > superset?  "Simple Features" are often not simple.
>>> > If it is WKT (or something else), what is the standard you are following 
>>> > to describe the Simple Features (e.g.,  ISO/IEC 13249-3:2016 and ISO 
>>> > 19162:2015)?
>>> > Does your proposal deviate in any way from the standard's capabilities?
>>> > Do you advocate following the entire WKT standard, e.g., supporting all 
>>> > the feature types that WKT supports?
>>> >
>>> > 3) Since you are not using the WKT encoding, but creating your own, where 
>>> > is the definition of the encoding system you are using?
>>> >
>>> > 4) This is a little out of CF scope, but:
>>> > Do you envision tools, notably, netcdf-c/java, having a writer function 
>>> > that takes in WKT and encodes the information in a file, and having a 
>>> > reader function that reads the file and returns WKT? Or is it your plan 
>>> > that the encoding/ decoding is left to the user?
>>> >
>>> > 5) This proposal is for "Simple Features plus Time Series" (my phrase not 
>>> > yours). But aren't there lots of other uses of Simple Features? Will 
>>> > there be other proposals in the future for "Simple Features plus X" and 
>>> > "Simple Features plus Y"? If so, will CF eventually become a massive 
>>> > document where Simple Features are defined over and over again, but in 
>>> > different contexts? If so, wouldn't a better solution be to deal with 
>>> > Simple Features separately (as Postgres does by making a geometric data 
>>> > type?), and then add "Simple Features plus Time Series" as the first use 
>>> > of it?
>>> >
>>> > Thanks for answering these questions.
>>> > Please forgive me if I missed parts of your proposal that answer these 
>>> > questions.
>>> >
>>> >
>>> > On Thu, Feb 2, 2017 at 5:57 AM, <[email protected]> wrote:
>>> > Date: Thu, 2 Feb 2017 07:57:36 -0600
>>> > From: David Blodgett <[email protected]>
>>> > To: <[email protected]>
>>> > Subject: [CF-metadata] Extension of Discrete Sampling Geometries for
>>> >         Simple  Features
>>> > Message-ID: <[email protected]>
>>> > Content-Type: text/plain; charset="utf-8"
>>> >
>>> > Dear CF Community,
>>> >
>>> > We are pleased to submit this proposal for your consideration and review. 
>>> > The cover letter we've prepared below provides some background and 
>>> > explanation for the proposed approach. The google doc here 
>>> > <http://goo.gl/Kq9ASq> is an excerpt of the CF specification with track 
>>> > changes turned on. Permissions for the document allow any google user to 
>>> > comment, so feel free to comment and ask questions in line.
>>> >
>>> > Note that I’m sharing this with you with one issue unresolved. What to do 
>>> > with the point featureType? Our draft suggests that it is part of a new 
>>> > geometry featureType, but it could be that we leave it alone and 
>>> > introduce a geometry featureType. This may be a minor point of 
>>> > discussion, but we need to be clear that this is an issue that still 
>>> > needs to be resolved in the proposal.
>>> >
>>> > Thank you for your time and consideration.
>>> >
>>> > Best Regards,
>>> >
>>> > David Blodgett, Tim Whiteaker, and Ben Koziol
>>> >
>>> > Proposed Extension to NetCDF-CF for Simple Geometries
>>> >
>>> > Preface
>>> >
>>> > The proposed addition to NetCDF-CF introduced below is inspired by a 
>>> > pre-existing data model governed by OGC and ISO as ISO 19125-1. More 
>>> > information on Simple Features may be found here: 
>>> > <https://en.wikipedia.org/wiki/Simple_Features>. To the knowledge of the 
>>> > authors, it is consistent with ISO 19125-1 but has not been specified 
>>> > using the formalisms of OGC or ISO. Language used attempts to hold true 
>>> > to NetCDF-CF semantics while not conflicting with the existing standards 
>>> > baseline. While this proposal does not support the entire scope of the 
>>> > simple features ecosystem, it does support the core data types in 
>>> > most common use around the community.
>>> >
>>> > The other existing standard to mention is the UGRID convention 
>>> > <http://ugrid-conventions.github.io/ugrid-conventions/>. The authors 
>>> > have experience reading and writing UGRID and have designed the proposed 
>>> > structure in a way that is inspired by and consistent with it.
>>> >
>>> > Terms and Definitions
>>> >
>>> > (Taken from OGC 06-103r4 OpenGIS Implementation Specification for 
>>> > Geographic information - Simple feature access - Part 1: Common 
>>> > architecture <http://www.opengeospatial.org/standards/sfa>.)
>>> >
>>> > Feature: Abstraction of real world phenomena - typically a geospatial 
>>> > abstraction with associated descriptive attributes.
>>> > Simple Feature: A feature with all geometric attributes described 
>>> > piecewise by straight line or planar interpolation between point sets.
>>> > Geometry (geometric complex): A set of disjoint geometric primitives - 
>>> > one or more points, lines, or polygons that form the spatial 
>>> > representation of a feature.
>>> >
>>> > Introduction
>>> >
>>> > Discrete Sampling Geometries (DSGs) handle data from one (or a collection 
>>> > of) timeSeries (point), Trajectory, Profile, TrajectoryProfile or 
>>> > timeSeriesProfile geometries. Measurements are from a point (timeSeries 
>>> > and Profile) or points along a trajectory. In this proposal, we reuse the 
>>> > core DSG timeSeries type which provides support for basic time series use 
>>> > cases, e.g., a timeSeries which is measured (or modeled) at a given point.
>>> >
>>> > Changes to Existing CF Specification
>>> >
>>> > In NetCDF-CF 1.7, Discrete Sampling Geometries separate dimensions and 
>>> > variables into two types: instance and element 
>>> > <http://cfconventions.org/cf-conventions/cf-conventions.html#_collections_instances_and_elements>.
>>> > Instance refers to individual points, trajectories, profiles, etc. These 
>>> > would sometimes be referred to as features given that they are identified 
>>> > entities that can have associated attributes and be related to other 
>>> > entities. Element dimensions describe temporal or other dimensions to 
>>> > describe data on a per-instance basis. This proposal extends the DSG 
>>> > timeSeries featureType 
>>> > <http://cfconventions.org/cf-conventions/cf-conventions.html#_features_and_feature_types>
>>> > such that the geospatial coordinates of the instances can be point, 
>>> > multi-point, line, multi-line, polygon, or multi-polygon geometries. 
>>> > Rather than overload the DSG contiguous ragged array encoding, designed 
>>> > with timeseries in mind, a geometry ragged array encoding is introduced 
>>> > in a new section 9.3.5. See this google doc for specific proposed 
>>> > changes. <http://goo.gl/Kq9ASq>
>>> >
>>> > Motivation
>>> >
>>> > DSGs have no system to define a geometry (polyline, polygon, etc., other 
>>> > than point) and an association with a time series that applies over that 
>>> > entire geometry, e.g., the expected rainfall in this watershed polygon for 
>>> > some period of time is 10 mm. As suggested in the last paragraph of 
>>> > section 9.1, current practice is to assign a representative point or just 
>>> > use an ID and forgo spatial information within a NetCDF-CF file. In order 
>>> > to satisfy a number of environmental modeling use cases, we need a way to 
>>> > encode a geometry (point, line, polygon, multi-point, multi-line, or 
>>> > multi-polygon) that is the static spatial feature representation to which 
>>> > one or more timeSeries can be associated. In this proposal, we provide an 
>>> > encoding to define collections of simple feature geometries. It 
>>> > interfaces cleanly with the existing DSG specification, enabling DSGs and 
>>> > Simple Geometries to be used concurrently.
>>> >
>>> > Looking Forward
>>> >
>>> > This proposal is a compromise solution that attempts to stay consistent with 
>>> > CF ideals and fit within the structure of the existing specification with 
>>> > minimal disruption. Line and polygon data types often require variable 
>>> > length arrays. Development of this proposal has brought to light the need 
>>> > for a general abstraction for variable length arrays in NetCDF-CF. Such a 
>>> > general abstraction would necessarily be reusable for character arrays, 
>>> > ragged arrays of time series, and ragged arrays of geometry nodes, as 
>>> > well as any other ragged data structures that may come up in the future. 
>>> > This proposal does not introduce such a general ragged array abstraction 
>>> > but does not preclude such a development in the future.
>>> >
>>> > Three Alternative Approaches
>>> >
>>> > Respecting the human readability ideal of NetCDF-CF, the development of 
>>> > this proposal started from a human readable format for geometries known 
>>> > as Well Known Text <https://en.wikipedia.org/wiki/Well-known_text>. We 
>>> > considered three high level design approaches while developing this 
>>> > proposal.
>>> >
>>> > 1) Direct use of Well-Known Text (WKT). In this approach, well known 
>>> >    text strings would be encoded using character arrays following a 
>>> >    contiguous ragged array approach to index the character array by 
>>> >    geometry (or instance in DSG parlance).
>>> > 2) Implement the WKT approach using a NetCDF binary array. In this 
>>> >    approach, well known text separators (brackets, commas and spaces) 
>>> >    for multipoint, multiline, multipolygon, and polygon holes would be 
>>> >    encoded as break type separator values like -1 for multiparts and -2 
>>> >    for holes.
>>> > 3) Implement the fundamental dimensions of geometry data in NetCDF. In 
>>> >    this approach, additional dimensions and variables along those 
>>> >    dimensions would be introduced to represent geometries, geometry 
>>> >    parts, geometry nodes, and unique (potentially shared) coordinate 
>>> >    locations for nodes to reference.
>>> >
>>> > Selected Approach
>>> >
>>> > The first approach was seen as too opaque to stay true to the CF ideal of 
>>> > complete self-description. The third approach seemed needlessly verbose 
>>> > and difficult to implement. The second approach was selected for the 
>>> > following reasons:
>>> >
>>> > 1) The second approach is at least as human-readable as the third.
>>> > 2) Use of break values keeps geometries relatively atomic.
>>> > 3) It will be familiar to developers who know the WKT geometry format.
>>> > 4) Character arrays, which are needed for options one and three, are 
>>> >    cumbersome to use in some programming languages in common use with 
>>> >    NetCDF.
>>> > 5) Break values replace the need for extraneous variables related to 
>>> >    multi-part geometries and polygon holes (interiors). Multi-part 
>>> >    geometries are generally an exception and excessive instrumentation 
>>> >    to support them should be discounted.
>>> >
>>> > Example: Representation of WKT-Style Polygons in a NetCDF-3 
>>> > timeSeries featureType
>>> >
>>> > Below is sample CDL demonstrating how polygons are encoded in NetCDF-3 
>>> > using a contiguous ragged array-like encoding. There are three details to 
>>> > note in the example below.
>>> >
>>> > 1) The attribute contiguous_ragged_dimension, whose value is the name of 
>>> >    a dimension in the file.
>>> > 2) The geom_coordinates attribute, whose value is a space-separated 
>>> >    string of variable names.
>>> > 3) The cf_role values geometry_x_node and geometry_y_node.
>>> >
>>> > These three attributes form a system to fully describe collections of 
>>> > multi-polygon feature geometries. Any variable that has the 
>>> > contiguous_ragged_dimension attribute contains one integer per instance: 
>>> > the 0-indexed starting position of that instance's geometry along the 
>>> > dimension the attribute names. Any variable that uses the dimension 
>>> > referenced in the contiguous_ragged_dimension attribute can be 
>>> > interpreted using the values in the variable containing that attribute. 
>>> > The variables referenced in the geom_coordinates attribute describe 
>>> > spatial coordinates of geometries. These variables can also be 
>>> > identified by the cf_roles geometry_x_node and geometry_y_node. Note 
>>> > that the example below also includes a mechanism to handle multi-polygon 
>>> > features that contain holes.
>>> >
>>> > netcdf multipolygon_example {
>>> > dimensions:
>>> >   node = 47 ;
>>> >   indices = 55 ;
>>> >   instance = 3 ;
>>> >   time = 5 ;
>>> >   strlen = 5 ;
>>> > variables:
>>> >   char instance_name(instance, strlen) ;
>>> >     instance_name:cf_role = "timeseries_id" ;
>>> >   int coordinate_index(indices) ;
>>> >     coordinate_index:geom_type = "multipolygon" ;
>>> >     coordinate_index:geom_coordinates = "x y" ;
>>> >     coordinate_index:multipart_break_value = -1 ;
>>> >     coordinate_index:hole_break_value = -2 ;
>>> >     coordinate_index:outer_ring_order = "anticlockwise" ;
>>> >     coordinate_index:closure_convention = "last_node_equals_first" ;
>>> >   int coordinate_index_start(instance) ;
>>> >     coordinate_index_start:long_name = "index of first coordinate in each instance geometry" ;
>>> >     coordinate_index_start:contiguous_ragged_dimension = "indices" ;
>>> >   double x(node) ;
>>> >     x:units = "degrees_east" ;
>>> >     x:standard_name = "longitude" ; // or projection_x_coordinate
>>> >     x:cf_role = "geometry_x_node" ;
>>> >   double y(node) ;
>>> >     y:units = "degrees_north" ;
>>> >     y:standard_name = "latitude" ; // or projection_y_coordinate
>>> >     y:cf_role = "geometry_y_node" ;
>>> >   double someVariable(instance) ;
>>> >     someVariable:long_name = "a variable describing a single-valued attribute of a polygon" ;
>>> >   int time(time) ;
>>> >     time:units = "days since 2000-01-01" ;
>>> >   double someData(instance, time) ;
>>> >     someData:coordinates = "time x y" ;
>>> >     someData:featureType = "timeSeries" ;
>>> > // global attributes:
>>> >     :Conventions = "CF-1.8" ;
>>> >
>>> > data:
>>> >
>>> >  instance_name =
>>> >   "flash",
>>> >   "bang",
>>> >   "pow" ;
>>> >
>>> >  coordinate_index = 0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2, 
>>> > 13, 14, 15, 16,
>>> >     -1, 17, 18, 19, 20, -1, 21, 22, 23, 24, 25, 26, 27, 28, -1, 29, 30, 
>>> > 31, 32, 33,
>>> >     34, -2, 35, 36, 37, 38, 39, 40, 41, 42, -1, 43, 44, 45, 46 ;
>>> >
>>> >  coordinate_index_start = 0, 30, 46 ;
>>> >
>>> >  x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7,
>>> >     5, 11, 15, 13, 11, -40, -20, -45, -40, -20, -10, -10, -30, -45, -20, 
>>> > -30, -20, -20, -30, 30,
>>> >     45, 10, 30, 25, 50, 30, 25 ;
>>> >
>>> >  y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25, 
>>> > 29,
>>> >     25, 25, 25, 29, 25, -40, -45, -30, -40, -35, -30, -10, -5, -20, -35, 
>>> > -20, -15, -25, -20, 20,
>>> >     40, 40, 20, 5, 10, 15, 5 ;
>>> >
>>> >  someVariable = 1, 2, 3 ;
>>> >
>>> >  time = 1, 2, 3, 4, 5 ;
>>> >
>>> >  someData =
>>> >   1, 2, 3, 4, 5,
>>> >   1, 2, 3, 4, 5,
>>> >   1, 2, 3, 4, 5 ;
>>> > }
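
To make the start-pointer indexing concrete, here is a minimal Python sketch of how a reader might slice a flat index array into per-instance runs. The function name slice_by_start and the small arrays are made up for illustration (they are not the values from the CDL above), and the sketch assumes the start values are sorted and 0-indexed as described.

```python
from typing import List, Sequence


def slice_by_start(values: Sequence[int], starts: Sequence[int]) -> List[List[int]]:
    """Split a flat array into per-instance runs from 0-indexed start positions."""
    # Each run ends where the next one begins; the last run ends at the
    # end of the flat array.
    ends = list(starts[1:]) + [len(values)]
    return [list(values[s:e]) for s, e in zip(starts, ends)]


# Hypothetical data: two instances. The first is a two-part geometry
# (parts separated by the multipart break value -1); the second has one part.
coordinate_index = [0, 1, 2, 3, -1, 4, 5, 6, 7, 8, 9, 10, 11]
coordinate_index_start = [0, 9]

per_instance = slice_by_start(coordinate_index, coordinate_index_start)
# A "count" style encoding can be derived from the same information.
counts = [len(run) for run in per_instance]
```

The same start values can be turned into counts, as the last line shows, which is the form the existing DSG contiguous ragged array uses.
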
>>> >
>>> > How To Interpret
>>> >
>>> > Starting from the timeSeries variables:
>>> >
>>> > 1) See CF-1.8 conventions.
>>> > 2) See the timeSeries featureType.
>>> > 3) Find the timeseries_id cf_role.
>>> > 4) Find the coordinates attribute of data variables.
>>> > 5) See that the variables indicated by the coordinates attribute have 
>>> >    the cf_roles geometry_x_node and geometry_y_node to determine that 
>>> >    these are geometries according to this new specification.
>>> > 6) Find the coordinate index variable with geom_coordinates that point 
>>> >    to the nodes.
>>> > 7) Find the variable with contiguous_ragged_dimension pointing to the 
>>> >    dimension of the coordinate index variable to determine how to index 
>>> >    into the coordinate index.
>>> > 8) Iterate over polygons, parsing out geometries using the contiguous 
>>> >    ragged start variable and coordinate index variable to interpret the 
>>> >    coordinate data variables.
>>> >
>>> > Or, without reference to timeSeries:
>>> >
>>> > 1) See CF-1.8 conventions.
>>> > 2) See the geom_type of multipolygon.
>>> > 3) Find the variable with a contiguous_ragged_dimension matching the 
>>> >    coordinate index variable's dimension.
>>> > 4) See the geom_coordinates of x y.
>>> > 5) Using the contiguous ragged start variable found in 3 and the 
>>> >    coordinate index variable found in 2, geometries can be parsed out of 
>>> >    the coordinate index variable using the hole and break values in it.
>>> >
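
As a rough illustration of the last parsing step, the sketch below splits one instance's coordinate index into polygons and rings using the two break values and looks up the node coordinates. The function name parse_multipolygon and the tiny x/y arrays are hypothetical; the defaults of -1 and -2 mirror the multipart_break_value and hole_break_value attributes in the example, and the sketch assumes the first ring of each part is the exterior.

```python
from typing import List, Sequence, Tuple

Ring = List[Tuple[float, float]]
Polygon = List[Ring]  # first ring is the exterior, any further rings are holes


def parse_multipolygon(
    index: Sequence[int],
    x: Sequence[float],
    y: Sequence[float],
    multipart_break: int = -1,
    hole_break: int = -2,
) -> List[Polygon]:
    """Turn one instance's break-value-encoded index into polygons of rings."""
    polygons: List[Polygon] = [[[]]]  # start with one polygon holding one empty ring
    for i in index:
        if i == multipart_break:
            polygons.append([[]])      # a new polygon part begins
        elif i == hole_break:
            polygons[-1].append([])    # a new hole ring in the current polygon
        else:
            polygons[-1][-1].append((x[i], y[i]))  # look up the node coordinates
    return polygons


# Hypothetical nodes: a square with one triangular hole, plus a second triangle.
x = [0, 4, 4, 0, 0, 1, 2, 1, 1, 10, 12, 10, 10]
y = [0, 0, 4, 4, 0, 1, 1, 2, 1, 10, 10, 12, 10]
index = [0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -1, 9, 10, 11, 12]

polys = parse_multipolygon(index, x, y)
```

Here polys holds two polygons: the first has an exterior ring and one hole, the second has only an exterior ring.
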
>>> >
>>> > ------------------------------
>>> >
>>> > Subject: Digest Footer
>>> >
>>> > _______________________________________________
>>> > CF-metadata mailing list
>>> > [email protected]
>>> > http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>>> >
>>> >
>>> > ------------------------------
>>> >
>>> > End of CF-metadata Digest, Vol 166, Issue 3
>>> > *******************************************
>>> >
>>> >
>>> >
>>> > --
>>> > Sincerely,
>>> >
>>> > Bob Simons
>>> > IT Specialist
>>> > Environmental Research Division
>>> > NOAA Southwest Fisheries Science Center
>>> > 99 Pacific St., Suite 255A      (New!)
>>> > Monterey, CA 93940               (New!)
>>> > Phone: (831)333-9878            (New!)
>>> > Fax:   (831)648-8440
>>> > Email: [email protected]
>>> >
>>> > The contents of this message are mine personally and
>>> > do not necessarily reflect any position of the
>>> > Government or the National Oceanic and Atmospheric Administration.
>>> > <>< <>< <>< <>< <>< <>< <>< <>< <><
>>> >
>>> -- 
>>> 
>>> Christopher Barker, Ph.D.
>>> Oceanographer
>>> 
>>> Emergency Response Division
>>> NOAA/NOS/OR&R            (206) 526-6959   voice
>>> 7600 Sand Point Way NE   (206) 526-6329   fax
>>> Seattle, WA  98115       (206) 526-6317   main reception
>>> 
>>> [email protected]
>> 
>

_______________________________________________
CF-metadata mailing list
[email protected]
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
