Re: [CF-metadata] Extension of Discrete Sampling Geometries for Simple Features

David Blodgett Fri, 17 Feb 2017 12:09:05 -0800

My apologies, I forgot to turn on time zone support in the poll below. Please 
use this one instead. http://doodle.com/poll/eikarnt35tdm7igd 


> On Feb 17, 2017, at 1:22 PM, David Blodgett <[email protected]> wrote:
> 
> All,
> 
> I haven’t heard much follow-up, but here’s a Doodle poll to coordinate a phone 
> conversation about this. I think we have US West Coast participants and EU 
> participants, so I chose mid- to late-morning times for me (US Midwest).
> 
> http://doodle.com/poll/eikarnt35tdm7igd 
> 
> I will schedule the call once a few people have expressed interest and we have a 
> clear day and time.
> 
> Regards,
> 
> - Dave
> 
>> On Feb 6, 2017, at 11:29 AM, David Blodgett <[email protected]> wrote:
>> 
>> Dear CF, 
>> 
>> I want to follow up on the conversation here with an alternative approach 
>> suggested off list, primarily in discussion between Jonathan and me. Here I’m going to 
>> focus on the use cases that would be satisfied and on the simplifications to the 
>> proposal that become possible by not supporting certain others. The changes below 
>> are largely driven by a desire to better align this proposal with the technical 
>> details of the prior art that is CF. 
>> 
>> If we: 
>> 1) don’t support node sharing, we can remove the complication of 
>> node-coordinate indexing (indirection), simplifying the proposal 
>> significantly.
>> 2) don’t use “break values” to indicate the separation between multi-part 
>> geometries and polygon holes, we end up with a data model with an extra 
>> dimension, but the NetCDF dimensions align with the natural dimensions of 
>> the data.
>> 3) use “count” instead of a “start pointer” approach, we are better aligned 
>> with the existing DSG contiguous ragged array approach.
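Under simplifications 2 and 3, a reader can recover every geometry from two count variables, with no break values and no index indirection. The following is a minimal Python sketch, not part of the proposal: plain lists stand in for the `node_count`, `part_node_count`, `x` and `y` variables, and the function names and toy coordinates are hypothetical.

```python
from itertools import accumulate


def split_by_counts(values, counts):
    """Split a flat sequence into consecutive runs whose lengths are given by counts."""
    if sum(counts) != len(values):
        raise ValueError("counts do not sum to the number of values")
    ends = list(accumulate(counts))
    starts = [0] + ends[:-1]
    return [values[s:e] for s, e in zip(starts, ends)]


def decode_geometries(x, y, node_count, part_node_count):
    """Return one list of parts per instance; each part is a list of (x, y) nodes."""
    if sum(node_count) != len(x):
        raise ValueError("node_count does not sum to the number of nodes")
    parts = split_by_counts(list(zip(x, y)), part_node_count)
    geometries, next_part = [], 0
    for n in node_count:
        geometry, used = [], 0
        # Consume whole parts until this instance's node count is reached.
        while used < n:
            part = parts[next_part]
            geometry.append(part)
            used += len(part)
            next_part += 1
        if used != n:
            raise ValueError("a part straddles an instance boundary")
        geometries.append(geometry)
    return geometries


# Toy data: a square with one triangular hole, then a lone triangle.
x = [0, 20, 20, 0, 0, 1, 10, 19, 1, 30, 45, 10, 30]
y = [0, 0, 20, 20, 0, 1, 5, 1, 1, 20, 40, 40, 20]
geometries = decode_geometries(x, y, node_count=[9, 4], part_node_count=[5, 4, 4])
```

Splitting the nodes does not need the outer/hole distinction at all; in this layout it would travel separately, one value per part, in a `part_type` variable.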
>> 
>> Coming back to the three directions for this proposal that I outlined in my 
>> cover letter of February 2nd:
>>> 1) Direct use of Well-Known Text (WKT). In this approach, WKT 
>>> strings would be encoded using character arrays following a contiguous 
>>> ragged array approach to index the character array by geometry (or instance 
>>> in DSG parlance).
>>> 2) Implement the WKT approach using a NetCDF binary array. In this approach, 
>>> WKT separators (brackets, commas and spaces) for multipoint, 
>>> multiline, multipolygon, and polygon holes would be encoded as break-type 
>>> separator values, like -1 for multiparts and -2 for holes.
>>> 3) Implement the fundamental dimensions of geometry data in NetCDF. In this 
>>> approach, additional dimensions and variables along those dimensions would 
>>> be introduced to represent geometries, geometry parts, geometry nodes, and 
>>> unique (potentially shared) coordinate locations for nodes to reference.
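For comparison with direction 3, this is roughly what reading direction 2 involves. The cover letter fixes only the sentinel values; the minimal Python sketch below assumes a stream of node indices in which -1 starts a new polygon part and -2 starts a hole in the current part. The function name and the stream are hypothetical.

```python
MULTIPART_BREAK = -1  # assumed meaning: the next ring starts a new polygon part
HOLE_BREAK = -2       # assumed meaning: the next ring is a hole in the current part


def split_break_values(stream):
    """Split a flat node-index stream into (kind, ring) pairs, kind being 'outer' or 'hole'."""
    rings = []
    current, kind = [], "outer"
    for value in stream:
        if value in (MULTIPART_BREAK, HOLE_BREAK):
            rings.append((kind, current))
            current = []
            kind = "outer" if value == MULTIPART_BREAK else "hole"
        else:
            current.append(value)
    rings.append((kind, current))
    return rings


stream = [0, 1, 2, 3, 0, -2, 4, 5, 6, 4, -1, 7, 8, 9, 7]
rings = split_break_values(stream)
```

Every value must be scanned to find the breaks; direction 3 trades that scan for an extra part dimension and explicit counts.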
>> The alternative I’m outlining here moves in the direction of 3. We had 
>> originally discounted it because it becomes very verbose and seems overly 
>> complicated if support for coordinate sharing is a requirement. If the three 
>> simplifications described above are used, then the third approach seems more 
>> tenable. 
>> 
>> Jonathan has also suggested the following (in reaction to the CDL in my 
>> letter of February 2nd):
>> 1) Rename geom_coordinates as node_coordinates, for consistency with UGRID.
>> 2) Omit node_dimension. This is redundant, since the dimension can be found 
>> by examining the node coordinate variables.
>> 3) Prescribe numerous “codes” and assumptions in the specification instead 
>> of letting them be described with attribute values.
>> 4) It would be more consistent with CF and UGRID to use a single container 
>> variable to hang all the topology/geometry information from.
>> 
>> I am personally happy to accept these if others don’t object.
>> 
>> A couple of other suggestions from Jonathan that I want to discuss a bit more:
>> 1) Rename geometry as topology and geom_type as topology_type.
>>      While I’d be open to something other than geom, topology is odd. If 
>> this is really “node_collection_topology_type” I guess I could be convinced, 
>> but would be curious how people react to this. (Especially in relation to 
>> UGRID)
>> 2) This extension is more appropriate as an extension to the concept of cell 
>> bounds than as the addition of a complex, time-invariant type of discrete 
>> sampling geometry. 
>>      Having just re-read the cell bounds chapter, I think it would 
>> overcomplicate cell bounds to include this material. My basic issue here is 
>> that these geometries do not necessarily have a reference location. They 
>> are, rather, first-order entities that need to be treated as such. That 
>> said, it is fair to say that these geometries are not necessarily a good fit 
>> for the original intent of Discrete Sampling Geometries. Jonathan suggested 
>> they may belong in their own chapter, which may be a good alternative? My 
>> suggested CDL below might lead us in the direction of this being a special 
>> type of auxiliary coordinate variable. 
>> 
>> This alternative starts to look like the CDL pasted below.
>> 
>> Note that the issue of coordinates is sticking out like a sore thumb. Below, 
>> I’ve attempted to reconcile Jonathan’s ideas regarding coordinates with my 
>> thoughts about how these geometries are “first order entities” that don’t 
>> have a single representative x and y. The spatial coordinates can be said to 
>> reside in the system of geometries described in the “sf” container variable? 
>> I realize this goes against the idea of coordinates a bit, but I think it is 
>> in keeping with the spirit of the attribute?
>> 
>> Finally, I’m glad to continue answering questions and debating things via 
>> the list to a point, but I think it would be in our interest to arrange a 
>> teleconference to discuss this further with a list of interested parties. 
>> Feel free to follow up on list, but for decision making, let’s not let this 
>> rabbit hole go too deep. I’ll plan on letting this and the other recent 
>> action on this proposal settle with people for a week or two then start to 
>> bring together a conference call (or calls depending on time zones). Please 
>> respond to me off list if you are interested in being part of a call to 
>> discuss.
>> 
>> Regards,
>> 
>> - Dave 
>> 
>> netcdf multipolygon_example {
>> dimensions:
>>  node = 47 ;
>>  part = 11 ;
>>  instance = 3 ;
>>  time = 5 ;
>>  strlen = 5 ;
>> variables:
>>  char instance_name(instance, strlen) ;
>>    instance_name:cf_role = "timeseries_id" ;
>>  double someVariable(instance) ;
>>    someVariable:long_name = "a variable describing a single-valued attribute of a polygon" ;
>>    someVariable:coordinates = "sf" ; // or "instance_name"?
>>  int time(time) ;
>>    time:units = "days since 2000-01-01" ;
>>  double someData(instance, time) ;
>>    someData:coordinates = "time sf" ; // or "time instance_name"?
>>    someData:geometry = "sf" ;
>>  int sf ; // container variable -- datatype irrelevant because it holds no data
>>    sf:geom_type = "multipolygon" ; // could be node_topology_type?
>>    sf:node_count_variable = "node_count" ;
>>    sf:node_coordinates = "x y" ;
>>    sf:part_count = "part_node_count" ;
>>    sf:part_type = "part_type" ; // not required unless polygons with holes are present
>>    sf:outer_ring_order = "anticlockwise" ; // not required if written in spec?
>>    sf:closure_convention = "last_node_equals_first" ; // not required if written in spec?
>>    sf:outer_type_code = 0 ; // not required if written in spec?
>>    sf:inner_type_code = 1 ; // not required if written in spec?
>>  int node_count(instance) ;
>>    node_count:long_name = "count of coordinates in each instance geometry" ;
>>  int part_node_count(part) ;
>>    part_node_count:long_name = "count of coordinates in each geometry part" ;
>>  int part_type(part) ;
>>    part_type:long_name = "type of each geometry part" ;
>>  double x(node) ;
>>    x:units = "degrees_east" ;
>>    x:standard_name = "longitude" ; // or projection_x_coordinate
>>    x:cf_role = "geometry_x_node" ;
>>  double y(node) ;
>>    y:units = "degrees_north" ;
>>    y:standard_name = "latitude" ; // or projection_y_coordinate
>>    y:cf_role = "geometry_y_node" ;
>> 
>> // global attributes:
>>     :Conventions = "CF-1.8" ;
>>     :featureType = "timeSeries" ; // featureType is a global attribute in CF
>> 
>> data:
>> 
>>  instance_name =
>>   "flash",
>>   "bang",
>>   "pow" ;
>> 
>>  someVariable = 1, 2, 3 ;
>> 
>>  time = 1, 2, 3, 4, 5 ;
>> 
>>  someData =
>>   1, 2, 3, 4, 5,
>>   1, 2, 3, 4, 5,
>>   1, 2, 3, 4, 5 ;
>> 
>>  // parts: instance 1 = 5, 4, 4, 4, 4, 4; instance 2 = 4, 6, 4; instance 3 = 4, 4
>>  node_count = 25, 14, 8 ;
>> 
>>  part_node_count = 5, 4, 4, 4, 4, 4, 4, 6, 4, 4, 4 ;
>> 
>>  part_type = 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0 ;
>> 
>>  x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7,
>>     5, 11, 15, 13, 11, -40, -20, -45, -40, -20, -10, -10, -30, -45, -20,
>>     -30, -20, -20, -30, 30, 45, 10, 30, 25, 50, 30, 25 ;
>> 
>>  y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25, 29,
>>     25, 25, 25, 29, 25, -40, -45, -30, -40, -35, -30, -10, -5, -20, -35,
>>     -20, -15, -25, -20, 20, 40, 40, 20, 5, 10, 15, 5 ;
>> }
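A reader or writer could verify the ring conventions described by the `outer_ring_order`, `closure_convention`, `outer_type_code` and `inner_type_code` attributes. A minimal Python sketch using the shoelace formula follows. It assumes axes on which an anticlockwise ring has a positive signed area, and it assumes holes run clockwise (opposite to the outer ring), which matches the example data but is not stated in the CDL. The function names are hypothetical.

```python
OUTER_TYPE_CODE = 0  # sf:outer_type_code in the example
INNER_TYPE_CODE = 1  # sf:inner_type_code in the example


def signed_area(ring):
    """Shoelace formula over a closed ring: positive if anticlockwise, negative if clockwise."""
    return 0.5 * sum(x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in zip(ring, ring[1:]))


def check_part(ring, part_type):
    """Return a list of convention violations for one geometry part (empty if none)."""
    problems = []
    if ring[0] != ring[-1]:
        problems.append("not closed: last node differs from first")
    area = signed_area(ring)
    if part_type == OUTER_TYPE_CODE and area <= 0:
        problems.append("outer ring is not anticlockwise")
    if part_type == INNER_TYPE_CODE and area >= 0:
        # Assumption: holes wind opposite to the outer ring.
        problems.append("hole is not clockwise")
    return problems


# First two parts of the first instance in the example above.
outer = [(0, 0), (20, 0), (20, 20), (0, 20), (0, 0)]
hole = [(1, 1), (10, 5), (19, 1), (1, 1)]
```

Checks like this are one argument for writing the conventions into the specification: a reader can then validate files without consulting per-file attributes.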
>> 
>> 
>> 
>>> On Feb 4, 2017, at 8:07 AM, David Blodgett <[email protected]> wrote:
>>> 
>>> Dear Chris, 
>>> 
>>> Thanks for your thorough treatment of these issues. We have gone through a 
>>> similar thought process to arrive at the proposal we came up with. I’ll 
>>> answer as briefly as I can.
>>> 
>>> 1) How would you translate between netCDF geometries and, say, GeoJSON?
>>> 
>>> The thinking is that node coordinate sharing is optional. If the writer 
>>> already knows, or wants to check, that nodes share coordinates, then sharing 
>>> is possible. Otherwise, it doesn’t have to be used. I’ve always felt that this 
>>> was important, but maybe not critical for a core NetCDF-CF data model. Some 
>>> offline conversation has led to an example that does not use it and may be 
>>> a good alternative; more on that later.
>>> 
>>> 2) Break Values
>>> 
>>> You really do have to hold your nose on the break values. The issue is that 
>>> you have to store that information somehow, and it is almost worse to create 
>>> new variables to store the multi-part and hole/not-hole information. The 
>>> alternative approach now taking shape, mentioned above, does break the 
>>> information out into additional variables but simplifies things otherwise. 
>>> In that case it doesn’t feel overly complex to me, so stay tuned for more 
>>> on this front.
>>> 
>>> 3) Ragged Indexing
>>> 
>>> Your thought process follows ours exactly. The key is that you either have 
>>> to create the “pointer” array as a first order of business or loop over the 
>>> counts ad nauseam. I’m actually leaning toward the counts for two reasons. 
>>> First, the counts approach is already in CF so is a natural fit and will be 
>>> familiar to developers in this space. Second, the issue of 0- vs 1-based indexing 
>>> is annoying. In our proposal, we settled on 0-based indexing because it aligns 
>>> with the idea of an offset, but it is still annoying and some applications 
>>> would always have to adjust that pointer array as a first order of 
>>> business. 
>>> 
>>> On to Bob’s comments.
>>> 
>>> Regarding aligning with other data models / encodings, I guess this needs 
>>> to be unpacked a bit. 
>>> 
>>> 1) In this setting, simple features is a data model, not an encoding. An 
>>> encoding can implement part or all of a data model as is needed by the use 
>>> case(s) at hand. There is no problem with partial implementations; you still 
>>> get interoperability for the intended use cases.
>>> 2) Attempting to align with other encoding standards (UGRID and NetCDF-CF 
>>> are the primary ones here) is simply to keep the implementation patterns 
>>> similar and familiar. This may be a fool’s errand, but it is presumably good 
>>> for adoptability and consistency. 
>>> So, I don’t see a problem with implementing important simple features types 
>>> in a way that aligns with the way the existing community standards work.
>>> 
>>> I don’t see this as ignoring existing standards at all. There is no open 
>>> community standard for binary encoding of geometries and related data that 
>>> passes the CF requirements of human readability and self-description. We 
>>> are adopting the appropriate data model and suggesting a new encoding that 
>>> will solve a lot of problems in the environmental modeling space. 
>>> 
>>> As we’ve discussed before, your “different approach” sounds great, but 
>>> seems like an exercise for a future effort that doesn’t attempt to align 
>>> with CF 1.7. Maybe what you suggest is a path forward for variable length 
>>> arrays in the CF 2.0 “vision in the mist”, but I don’t see it as a tenable 
>>> solution for CF 1.*.
>>> 
>>> Best Regards,
>>> 
>>> - Dave
>>> 
>>> 
>>>> On Feb 3, 2017, at 3:31 PM, Chris Barker <[email protected]> wrote:
>>>> 
>>>> A few thoughts. First, I think there are three core "issues" that need to 
>>>> be resolved:
>>>> 
>>>> 1) Coordinate indexing (indirection)
>>>> 
>>>> The question of whether you have an array of "vertices" that the geometry 
>>>> types index into to get their data:
>>>> 
>>>> Advantages:
>>>>  - if a number of geometries share a lot of vertices, it can be more 
>>>> efficient
>>>>  - the relationship between geometries that share vertices (i.e. polygons 
>>>> that share a boundary) etc. is well defined. You don't need to check for 
>>>> closeness, and maybe have a tolerance, etc.
>>>> 
>>>> These were absolutely critical for UGRID, for example -- a UGRID mesh is a 
>>>> "single thing", NOT a collection of polygons that happen to share some 
>>>> vertices.
>>>> 
>>>> Disadvantages:
>>>>  -  if the geometries do not share many vertices, it is less efficient.
>>>>  -  there are additional code complications in "getting" the vertices of 
>>>> the given geometry
>>>>  - it does not match the OGC data model.
>>>> 
>>>> My 0.02 -- given my use cases, I tend to want the advantages -- but I 
>>>> don't know that that's a typical use case. And I think it's a really good 
>>>> idea to keep with the OGC data model where possible -- i.e. be able to 
>>>> translate from netCDF to, say, GeoJSON as losslessly as possible. Given 
>>>> that, I think it's probably a better idea not to have the indirection.
>>>> 
>>>> However (to equivocate) perhaps the types of information people are likely 
>>>> to want to store in netcdf are a subset of what the OGC standards are 
>>>> designed for -- and for those use-cases, maybe shared vertices are 
>>>> critical.
>>>> 
>>>> One way to think about it -- how would you translate between netCDF 
>>>> geometries and, say, GeoJSON:
>>>>   - nc => GeoJSON would lose the shared index info.
>>>>   - GeoJSON => nc -- would you try to reconstruct the shared vertices?? 
>>>> I'm thinking that would be a bit dangerous in the general case, because 
>>>> you are adding information that you don't know is true -- is this a 
>>>> shared vertex, or two vertices that just happen to be at the same location?
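Both translation directions can be made concrete. The minimal Python sketch below assumes a hypothetical shared-vertex layout (a list of unique coordinates plus per-polygon node indices). It writes GeoJSON-style dictionaries, in which shared vertices become repeated coordinate pairs, and then rebuilds shared indices by exact coordinate matching, which is precisely the guess described above as dangerous. The function names are hypothetical.

```python
def to_geojson(coords, polygons):
    """Dereference node indices into a GeoJSON-style FeatureCollection of Polygons."""
    features = [
        {
            "type": "Feature",
            "properties": {},
            "geometry": {"type": "Polygon", "coordinates": [[list(coords[i]) for i in ring]]},
        }
        for ring in polygons
    ]
    return {"type": "FeatureCollection", "features": features}


def from_geojson(collection):
    """Rebuild shared nodes, assuming that equal coordinates mean the same node."""
    coords, index, polygons = [], {}, []
    for feature in collection["features"]:
        ring = []
        for xy in feature["geometry"]["coordinates"][0]:  # exterior ring only
            key = tuple(xy)
            if key not in index:
                index[key] = len(coords)
                coords.append(key)
            ring.append(index[key])
        polygons.append(ring)
    return coords, polygons


# Two unit squares sharing the edge between nodes 1 and 2.
coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (2.0, 0.0), (2.0, 1.0)]
polygons = [[0, 1, 2, 3, 0], [1, 4, 5, 2, 1]]
collection = to_geojson(coords, polygons)
```

The round trip works here only because the shared coordinates are exactly equal; two distinct vertices that merely coincide would be silently merged into one node.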
>>>> 
>>>> 2) Break values
>>>> 
>>>> I don't really like break values as an approach, but with netCDF any 
>>>> option will be ugly one way or another. So keeping with the WKT approach 
>>>> makes sense to me. Either way you'll need custom code to unpack it. (BTW 
>>>> -- what does Well-Known Binary do?)
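On the WKB question: the OGC Simple Features Well-Known Binary encoding uses counts rather than break values. A 2D Polygon is a byte-order flag, a uint32 geometry type code (3 for Polygon), a uint32 ring count, then, for each ring, a uint32 point count followed by x/y doubles. The minimal Python sketch below handles only little-endian 2D Polygons; the function names are hypothetical.

```python
import struct

WKB_LITTLE_ENDIAN = 1
WKB_POLYGON = 3  # 2D Polygon type code


def polygon_to_wkb(rings):
    """Encode a 2D polygon (exterior ring first, then holes) as little-endian WKB."""
    out = bytearray(struct.pack("<BII", WKB_LITTLE_ENDIAN, WKB_POLYGON, len(rings)))
    for ring in rings:
        out += struct.pack("<I", len(ring))  # point count, not a break value
        for x, y in ring:
            out += struct.pack("<dd", x, y)
    return bytes(out)


def wkb_to_polygon(data):
    """Decode the output of polygon_to_wkb back into a list of rings of (x, y) tuples."""
    order, geometry_type, n_rings = struct.unpack_from("<BII", data, 0)
    if order != WKB_LITTLE_ENDIAN or geometry_type != WKB_POLYGON:
        raise ValueError("this sketch only reads little-endian 2D Polygons")
    offset, rings = struct.calcsize("<BII"), []
    for _ in range(n_rings):
        (n_points,) = struct.unpack_from("<I", data, offset)
        offset += 4
        ring = [struct.unpack_from("<dd", data, offset + 16 * k) for k in range(n_points)]
        offset += 16 * n_points
        rings.append(ring)
    return rings


square_with_hole = [
    [(0, 0), (20, 0), (20, 20), (0, 20), (0, 0)],
    [(1, 1), (10, 5), (19, 1), (1, 1)],
]
wkb = polygon_to_wkb(square_with_hole)
```

Like the count-based netCDF layout discussed in this thread, WKB has to be walked sequentially, because each ring's offset depends on the point counts before it.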
>>>> 
>>>> 3) Ragged indexing
>>>> 
>>>> There are two "natural" ways to represent a ragged array:
>>>> 
>>>> (a) store the length of each "row"
>>>> (b) store the index to the beginning (or end) of each "row"
>>>> 
>>>> CF already uses (a). However, working with it, I'm pretty convinced that 
>>>> it's the "wrong" choice:
>>>> 
>>>> If you want to know how long a given row is, that is really easy with (a), 
>>>> and almost as easy with (b) (involves two indexes and a subtraction)
>>>> 
>>>> However, if you want to extract a particular row, (b) makes this really 
>>>> easy -- you simply access the slice of the array you want. With (a) you 
>>>> need to loop through the entire "length_of_rows" array (up to the row of 
>>>> interest) and add up the values to find the slice you need. Not a huge 
>>>> issue, but it is an issue. In fact, in my code to read ragged arrays in 
>>>> netCDF, the first thing I do is pre-compute the index to each row, so I 
>>>> can then use that to access individual rows later -- if you 
>>>> are accessing via OPeNDAP, that's particularly helpful.
>>>> 
>>>> So -- (b) is clearly (to me) the "best" way to do it -- but is it worth 
>>>> introducing a second way to handle ragged arrays in CF? I would think yes, 
>>>> but that would be offset if:
>>>> 
>>>>  - There is a bunch of existing library code that transparently handles 
>>>> ragged arrays in netCDF (does netCDF-Java have something? I'm pretty sure 
>>>> Python doesn't -- certainly not in netCDF4) 
>>>> 
>>>>  - That existing lib code would be advantageous to leverage for code 
>>>> reading features: I suspect that there will have to be enough custom code 
>>>> that the ragged array bits are going to be the least of it.
>>>> 
>>>> So I'm for the "new" way of representing ragged arrays.
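The trade-off between (a) and (b) fits in a few lines of Python. This is a hypothetical sketch, not existing library code: converting counts to 0-based start offsets once (a cumulative sum) turns every later row access into a single slice, which is the precomputation described above. A file storing 1-based starts would need 1 subtracted from each start first.

```python
from itertools import accumulate


def row_by_counts(data, counts, i):
    """Option (a): sum the lengths of all earlier rows on every access."""
    start = sum(counts[:i])
    return data[start:start + counts[i]]


def offsets_from_counts(counts):
    """Convert (a) to (b): 0-based start of each row, plus one past the end of the last row."""
    return [0] + list(accumulate(counts))


def row_by_offsets(data, offsets, i):
    """Option (b): one slice, no scan over earlier rows."""
    return data[offsets[i]:offsets[i + 1]]


data = list(range(10))
counts = [3, 0, 4, 3]  # CF-style row lengths; an empty row is allowed
offsets = offsets_from_counts(counts)  # computed once, reused for every access
```

Because the conversion is a single pass, a file can store counts (staying consistent with existing CF) while readers still get slice-based access after one cheap precomputation.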
>>>> 
>>>> -CHB
>>>> 
>>>> 
>>>> On Fri, Feb 3, 2017 at 11:41 AM, Bob Simons - NOAA Federal <[email protected]> wrote:
>>>> Then, isn't this proposal just the first step in the creation of a new 
>>>> model and a new encoding of Simple Features, one that is "align[ed] ... 
>>>> with as many other encoding standards in this space as is practical"? In 
>>>> other words, yet another standard for Simple Features?
>>>> 
>>>> If so, it seems risky to me to take just the first (easy?) step "to 
>>>> support the use cases that have a compelling need today" and not solve the 
>>>> entire problem. I know the CF way is to just solve real, current needs, 
>>>> but in this case it seems to risk a head slap moment in the future when we 
>>>> realize that, in order to deal with some new simple feature variant, we 
>>>> should have done things differently from the beginning?
>>>> 
>>>> And it seems odd to reject existing standards that have been so 
>>>> painstakingly hammered out, in favor of starting the process all over 
>>>> again.  We follow existing standards for other things (e.g., IEEE-754 for 
>>>> representing floating point numbers in binary files), why can't we follow 
>>>> an existing Simple Features standard?
>>>> 
>>>> ---
>>>> Rather than just be a naysayer, let me suggest a very different 
>>>> alternative:
>>>> 
>>>> There are several projects in the CF realm (e.g., this Simple Features 
>>>> project, Discrete Sampling Geometry (DSG), true variable-length Strings, 
>>>> ugrid(?)) which share a common underlying problem: how to deal with 
>>>> variable-length multidimensional arrays: a[b][c], where the length of the 
>>>> c dimension may be different for different b indices.
>>>> DSG solved this (5 different ways!), but only for DSG.
>>>> The Simple Features proposal seeks to solve the problem for Simple 
>>>> Features.
>>>> We still have no support for Unicode variable-length Strings.
>>>> 
>>>> Instead of continuing to solve the variable-length problem a different way 
>>>> every time we confront it, shouldn't we solve it once, with one small 
>>>> addition to the standard, and then use that solution repeatedly?
>>>> The solution could be a simple variant of one of the DSG solutions, but 
>>>> generalized so that it could be used in different situations.
>>>> An encoding standard and built-in support for variable-length data arrays 
>>>> in netcdf-java/c would solve a lot of problems, now and in the future.
>>>> Some work on this is already done: I think the netcdf-java API already 
>>>> supports variable-length arrays when reading netcdf-4 files.
>>>> For Simple Features, the problem would reduce to: store the feature (using 
>>>> some specified existing standard like WKT or WKB) in a variable-length 
>>>> array. 
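Storing each feature as WKT in a variable-length array can be emulated without netCDF-4 variable-length types, which is essentially direction 1 of the cover letter: concatenate the strings into one character array and record a per-instance count, as DSG contiguous ragged arrays already do. The pure-Python sketch below is hypothetical and does not use any netCDF library API.

```python
from itertools import accumulate


def pack_wkt(geometries):
    """Concatenate WKT strings into one character buffer plus per-instance counts."""
    return "".join(geometries), [len(g) for g in geometries]


def unpack_wkt(buffer, counts):
    """Recover the per-instance WKT strings from the buffer and counts."""
    ends = list(accumulate(counts))
    starts = [0] + ends[:-1]
    return [buffer[s:e] for s, e in zip(starts, ends)]


wkts = [
    "POINT (30 10)",
    "LINESTRING (30 10, 10 30, 40 40)",
    "POLYGON ((0 0, 20 0, 20 20, 0 20, 0 0), (1 1, 10 5, 19 1, 1 1))",
]
buffer, counts = pack_wkt(wkts)
```

Counts here are in characters; since WKT is plain ASCII, they equal byte counts in a netCDF char array.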
>>>> 
>>>> 
>>>> On Fri, Feb 3, 2017 at 9:07 AM, <[email protected]> wrote:
>>>> Date: Fri, 3 Feb 2017 11:07:00 -0600
>>>> From: David Blodgett <[email protected]>
>>>> To: Bob Simons - NOAA Federal <[email protected]>
>>>> Cc: CF Metadata <[email protected]>
>>>> Subject: Re: [CF-metadata] Extension of Discrete Sampling Geometries for Simple Features
>>>> 
>>>> Dear Bob,
>>>> 
>>>> I’ll just take these inline.
>>>> 
>>>> 1) Noted. We have been trying to figure out what to do with the point 
>>>> featureType and I think leaving it more or less alone is a viable path 
>>>> forward.
>>>> 
>>>> 2) This is not an exact replica of WKT, but rather an approach similar to 
>>>> WKT. As I stated, we have followed the ISO simple features data model and 
>>>> well known text feature types in concept, but have not used the same 
>>>> standardization formalisms. We aren’t advocating for supporting “all of” 
>>>> any standard but are rather attempting to support the use cases that have 
>>>> a compelling need today while aligning this with as many other encoding 
>>>> standards in this space as is practical. Hopefully that answers your 
>>>> question; sorry if it’s vague.
>>>> 
>>>> 3) The Google Doc linked in my response contains the encoding we are 
>>>> proposing as a starting point for conversation: http://goo.gl/Kq9ASq 
>>>> I want to stress: as a starting point for discussion. I expect that this 
>>>> proposal will change drastically before we’re done.
>>>> 
>>>> 4) We absolutely envision tools doing what you describe: converting to/from 
>>>> standard spatial formats and NetCDF-CF geometries. We intend to introduce an R 
>>>> and a Python implementation that does exactly that, along with whatever 
>>>> form this standard takes in the end. R and Python were chosen because the team 
>>>> that brought this together is familiar with those two languages; 
>>>> additional implementations would be more than welcome.
>>>> 
>>>> 5) We do include a “geometry” featureType similar to the “point” 
>>>> featureType, hence our difficulty with what to do with the “point” 
>>>> featureType. You are correct, there are lots of non-timeSeries 
>>>> applications to be solved and this proposal does intend to support them 
>>>> (within the existing DSG constructs).
>>>> 
>>>> Thanks for your questions, hopefully my answers close some gaps for you.
>>>> 
>>>> - Dave
>>>> 
>>>> > On Feb 3, 2017, at 10:47 AM, Bob Simons - NOAA Federal <[email protected]> wrote:
>>>> >
>>>> > 1) There is a vague comment in the proposal about possibly changing the 
>>>> > point featureType. Please don't, unless the changes don't affect current 
>>>> > uses of Point. There are already thousands of files that use it. If this 
>>>> > new system offers an alternative, then fine, it's an alternative. One of 
>>>> > the most important and useful features of a good standard is backwards 
>>>> > compatibility.
>>>> >
>>>> > 2) You advocate "Implement the WKT approach using a NetCDF binary 
>>>> > array." Is this system then an exact encoding of WKT, neither a subset 
>>>> > nor a superset?  "Simple Features" are often not simple.
>>>> > If it is WKT (or something else), what is the standard you are following 
>>>> > to describe the Simple Features (e.g., ISO/IEC 13249-3:2016 and ISO 
>>>> > 19162:2015)?
>>>> > Does your proposal deviate in any way from the standard's capabilities?
>>>> > Do you advocate following the entire WKT standard, e.g., supporting all 
>>>> > the feature types that WKT supports?
>>>> >
>>>> > 3) Since you are not using the WKT encoding, but creating your own, 
>>>> > where is the definition of the encoding system you are using?
>>>> >
>>>> > 4) This is a little out of CF scope, but:
>>>> > Do you envision tools, notably, netcdf-c/java, having a writer function 
>>>> > that takes in WKT and encodes the information in a file, and having a 
>>>> > reader function that reads the file and returns WKT? Or is it your plan 
>>>> > that the encoding/decoding is left to the user?
>>>> >
>>>> > 5) This proposal is for "Simple Features plus Time Series" (my phrase 
>>>> > not yours). But aren't there lots of other uses of Simple Features? Will 
>>>> > there be other proposals in the future for "Simple Features plus X" and 
>>>> > "Simple Features plus Y"? If so, will CF eventually become a massive 
>>>> > document where Simple Features are defined over and over again, but in 
>>>> > different contexts? If so, wouldn't a better solution be to deal with 
>>>> > Simple Features separately (as Postgres does by making a geometric data 
>>>> > type?), and then add "Simple Features plus Time Series" as the first use 
>>>> > of it?
>>>> >
>>>> > Thanks for answering these questions.
>>>> > Please forgive me if I missed parts of your proposal that answer these 
>>>> > questions.
>>>> >
>>>> >
>>>> > On Thu, Feb 2, 2017 at 5:57 AM, <[email protected]> wrote:
>>>> > Date: Thu, 2 Feb 2017 07:57:36 -0600
>>>> > From: David Blodgett <[email protected]>
>>>> > To: <[email protected]>
>>>> > Subject: [CF-metadata] Extension of Discrete Sampling Geometries for Simple Features
>>>> >
>>>> > Dear CF Community,
>>>> >
>>>> > We are pleased to submit this proposal for your consideration and 
>>>> > review. The cover letter we've prepared below provides some background 
>>>> > and explanation for the proposed approach. The Google Doc at 
>>>> > http://goo.gl/Kq9ASq is an excerpt of the CF specification with 
>>>> > track changes turned on. Permissions for the document allow any Google 
>>>> > user to comment, so feel free to comment and ask questions inline.
>>>> >
>>>> > Note that I’m sharing this with you with one issue unresolved: what to 
>>>> > do with the point featureType. Our draft suggests that it become part of a 
>>>> > new geometry featureType, but we could instead leave it alone and 
>>>> > introduce a separate geometry featureType. This may be a minor point of 
>>>> > discussion, but we need to be clear that this is an issue that still 
>>>> > needs to be resolved in the proposal.
>>>> >
>>>> > Thank you for your time and consideration.
>>>> >
>>>> > Best Regards,
>>>> >
>>>> > David Blodgett, Tim Whiteaker, and Ben Koziol
>>>> >
>>>> > Proposed Extension to NetCDF-CF for Simple Geometries
>>>> >
>>>> > Preface
>>>> >
>>>> > The proposed addition to NetCDF-CF introduced below is inspired by a 
>>>> > pre-existing data model governed by OGC and ISO as ISO 19125-1. More 
>>>> > information on Simple Features may be found at 
>>>> > https://en.wikipedia.org/wiki/Simple_Features. To the knowledge of 
>>>> > the authors, it is consistent with ISO 19125-1 but has not been 
>>>> > specified using the formalisms of OGC or ISO. The language used attempts to 
>>>> > hold true to NetCDF-CF semantics while not conflicting with the existing 
>>>> > standards baseline. While this proposal does not support the entire 
>>>> > scope of the simple features ecosystem, it does support the core 
>>>> > data types in most common use around the community.
>>>> >
>>>> > The other existing standard to mention is the UGRID convention 
>>>> > (http://ugrid-conventions.github.io/ugrid-conventions/). The authors 
>>>> > have experience reading and writing UGRID and have designed the proposed 
>>>> > structure in a way that is inspired by and consistent with it.
>>>> >
>>>> > Terms and Definitions
>>>> >
>>>> > (Taken from OGC 06-103r4 OpenGIS Implementation Specification for 
>>>> > Geographic information - Simple feature access - Part 1: Common 
>>>> > architecture, http://www.opengeospatial.org/standards/sfa.)
>>>> >
>>>> > Feature: Abstraction of real world phenomena - typically a geospatial 
>>>> > abstraction with associated descriptive attributes.
>>>> > Simple Feature: A feature with all geometric attributes described 
>>>> > piecewise by straight line or planar interpolation between point sets.
>>>> > Geometry (geometric complex): A set of disjoint geometric primitives - 
>>>> > one or more points, lines, or polygons that form the spatial 
>>>> > representation of a feature.
>>>> >
>>>> > Introduction
>>>> >
>>>> > Discrete Sampling Geometries (DSGs) handle data from one (or a 
>>>> > collection of) timeSeries (point), Trajectory, Profile, 
>>>> > TrajectoryProfile or timeSeriesProfile geometries. Measurements are from 
>>>> > a point (timeSeries and Profile) or points along a trajectory. In this 
>>>> > proposal, we reuse the core DSG timeSeries type which provides support 
>>>> > for basic time series use cases, e.g., a timeSeries which is measured (or 
>>>> > modeled) at a given point.
>>>> >
>>>> > Changes to Existing CF Specification
>>>> >
>>>> > In NetCDF-CF 1.7, Discrete Sampling Geometries separate dimensions and 
>>>> > variables into two types, instance and element 
>>>> > (http://cfconventions.org/cf-conventions/cf-conventions.html#_collections_instances_and_elements). 
>>>> > Instance refers to individual points, trajectories, profiles, etc. 
>>>> > These would sometimes be referred to as features given that they are 
>>>> > identified entities that can have associated attributes and be related 
>>>> > to other entities. Element dimensions are temporal or other 
>>>> > dimensions that describe data on a per-instance basis. This proposal 
>>>> > extends the DSG timeSeries featureType 
>>>> > (http://cfconventions.org/cf-conventions/cf-conventions.html#_features_and_feature_types) 
>>>> > such that the geospatial coordinates of the instances can be point, 
>>>> > multi-point, line, multi-line, polygon, or multi-polygon 
>>>> > geometries. Rather than overload the DSG contiguous ragged array 
>>>> > encoding, designed with timeseries in mind, a geometry ragged array 
>>>> > encoding is introduced in a new section 9.3.5. See this Google Doc 
>>>> > for specific proposed changes: http://goo.gl/Kq9ASq
>>>> >
>>>> > Motivation
>>>> >
>>>> > DSGs have no system to define a geometry (polyline, polygon, etc., other
>>>> > than point) and associate it with a time series that applies over that
>>>> > entire geometry, e.g., "the expected rainfall in this watershed polygon
>>>> > for some period of time is 10 mm." As suggested in the last paragraph of
>>>> > section 9.1, current practice is to assign a representative point or 
>>>> > just use an ID and forgo spatial information within a NetCDF-CF file. In 
>>>> > order to satisfy a number of environmental modeling use cases, we need a 
>>>> > way to encode a geometry (point, line, polygon, multi-point, multi-line, 
>>>> > or multi-polygon) that is the static spatial feature representation to 
>>>> > which one or more timeSeries can be associated. In this proposal, we 
>>>> > provide an encoding to define collections of simple feature geometries. 
>>>> > It interfaces cleanly with the existing DSG specification, enabling DSGs 
>>>> > and Simple Geometries to be used concurrently.
>>>> >
>>>> > Looking Forward
>>>> >
>>>> > This proposal is a compromise solution that attempts to stay consistent
>>>> > with CF ideals and fit within the structure of the existing specification
>>>> > with minimal disruption. Line and polygon data types often require 
>>>> > variable length arrays. Development of this proposal has brought to 
>>>> > light the need for a general abstraction for variable length arrays in 
>>>> > NetCDF-CF. Such a general abstraction would necessarily be reusable for 
>>>> > character arrays, ragged arrays of time series, and ragged arrays of 
>>>> > geometry nodes, as well as any other ragged data structures that may 
>>>> > come up in the future. This proposal does not introduce such a general 
>>>> > ragged array abstraction but does not preclude such a development in the 
>>>> > future.
>>>> >
>>>> > Three Alternative Approaches
>>>> >
>>>> > Respecting the human readability ideal of NetCDF-CF, the development of
>>>> > this proposal started from a human-readable format for geometries known
>>>> > as Well-Known Text <https://en.wikipedia.org/wiki/Well-known_text>. We
>>>> > considered three high-level design approaches while developing this
>>>> > proposal.
>>>> >
>>>> > 1. Direct use of Well-Known Text (WKT). In this approach, WKT strings
>>>> >    would be encoded using character arrays, following a contiguous ragged
>>>> >    array approach to index the character array by geometry (or instance
>>>> >    in DSG parlance).
>>>> > 2. Implement the WKT approach using a NetCDF binary array. In this
>>>> >    approach, the WKT separators (brackets, commas and spaces) for
>>>> >    multipoints, multilines, multipolygons, and polygon holes would be
>>>> >    encoded as break values, e.g. -1 between the parts of a multipart
>>>> >    geometry and -2 before each polygon hole.
>>>> > 3. Implement the fundamental dimensions of geometry data in NetCDF. In
>>>> >    this approach, additional dimensions and variables along those
>>>> >    dimensions would be introduced to represent geometries, geometry
>>>> >    parts, geometry nodes, and unique (potentially shared) coordinate
>>>> >    locations for nodes to reference.
>>>> >
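To make the break-value approach concrete, here is a minimal Python sketch of the encoding direction. It assumes a multipolygon is supplied as a list of polygons, each polygon a list of rings (outer ring first, then any holes), and each ring a list of 0-indexed node positions. The function and type names are hypothetical, and -1/-2 are simply the example break values mentioned above, not fixed constants of any library.

```python
from typing import List

MULTIPART_BREAK = -1  # example value: separates the parts of a multi-part geometry
HOLE_BREAK = -2       # example value: precedes each interior ring (hole)

Ring = List[int]      # node indices of one ring
Polygon = List[Ring]  # outer ring first, then zero or more holes


def encode_multipolygon(polygons: List[Polygon]) -> List[int]:
    """Flatten one multipolygon into a single index array with break values (hypothetical helper)."""
    out: List[int] = []
    for part_no, rings in enumerate(polygons):
        if part_no > 0:
            out.append(MULTIPART_BREAK)  # a new polygon part begins
        for ring_no, ring in enumerate(rings):
            if ring_no > 0:
                out.append(HOLE_BREAK)  # every ring after the outer one is a hole
            out.extend(ring)
    return out


# First instance ("flash") of the multipolygon CDL example in this message:
# one polygon with three holes, followed by two single-ring polygons.
flash = [
    [[0, 1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
    [[17, 18, 19, 20]],
    [[21, 22, 23, 24]],
]
print(encode_multipolygon(flash))
```

The printed list equals the first 30 values of coordinate_index in the CDL example, i.e. the run that starts at position 0.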
>>>> > Selected Approach
>>>> >
>>>> > The first approach was seen as too opaque to stay true to the CF ideal 
>>>> > of complete self-description. The third approach seemed needlessly 
>>>> > verbose and difficult to implement. The second approach was selected for 
>>>> > the following reasons:
>>>> >
>>>> > - It is at least as human-readable as the third approach.
>>>> > - Use of break values keeps geometries relatively atomic.
>>>> > - It will be familiar to developers who know the WKT geometry format.
>>>> > - Character arrays, which are needed for options one and three, are
>>>> >   cumbersome to use in some programming languages in common use with
>>>> >   NetCDF.
>>>> > - Break values replace the need for extraneous variables related to
>>>> >   multi-part geometries and polygon holes (interiors). Multi-part
>>>> >   geometries are generally an exception, and excessive machinery to
>>>> >   support them should be avoided.
>>>> >
>>>> > Example: Representation of WKT-Style Polygons in a NetCDF-3
>>>> > timeSeries featureType
>>>> >
>>>> > Below is sample CDL demonstrating how polygons are encoded in NetCDF-3
>>>> > using a contiguous ragged array-like encoding. There are three details
>>>> > to note in the example below:
>>>> >
>>>> > 1. The attribute contiguous_ragged_dimension, whose value is the name of
>>>> >    a dimension in the file.
>>>> > 2. The geom_coordinates attribute, whose value is a space-separated
>>>> >    string of variable names.
>>>> > 3. The cf_role values geometry_x_node and geometry_y_node.
>>>> >
>>>> > These three attributes form a system to fully describe collections of
>>>> > multi-polygon feature geometries. Any variable that has the
>>>> > contiguous_ragged_dimension attribute contains one integer per instance
>>>> > giving the 0-indexed starting position of that instance's geometry
>>>> > along the dimension the attribute names (indices in the example below).
>>>> > Any variable that uses the dimension referenced in the
>>>> > contiguous_ragged_dimension attribute can be interpreted using the
>>>> > values in the variable containing that attribute. The variables
>>>> > referenced in the geom_coordinates attribute describe the spatial
>>>> > coordinates of the geometries; these variables can also be identified
>>>> > by the cf_roles geometry_x_node and geometry_y_node. Note that the
>>>> > example below also includes a mechanism to handle multi-polygon
>>>> > features that contain holes.
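The relationship between a start variable such as coordinate_index_start and the flat index array can be sketched in Python. This assumes one start per instance and that each instance's run ends where the next one starts (the last run ends at the end of the array); the function names are hypothetical.

```python
from typing import List, Sequence


def start_positions(runs: Sequence[Sequence[int]]) -> List[int]:
    """0-indexed start of each instance's run along the shared ragged dimension."""
    starts: List[int] = []
    position = 0
    for run in runs:
        starts.append(position)
        position += len(run)  # the next run begins right after this one
    return starts


def split_runs(flat: Sequence[int], starts: Sequence[int]) -> List[List[int]]:
    """Recover each instance's run from the flat array and its start positions."""
    ends = list(starts[1:]) + [len(flat)]
    return [list(flat[s:e]) for s, e in zip(starts, ends)]


# coordinate_index and coordinate_index_start values from the CDL example
coordinate_index = [0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2, 13, 14, 15, 16,
                    -1, 17, 18, 19, 20, -1, 21, 22, 23, 24,
                    25, 26, 27, 28, -1, 29, 30, 31, 32, 33, 34, -2, 35, 36, 37, 38,
                    39, 40, 41, 42, -1, 43, 44, 45, 46]
runs = split_runs(coordinate_index, [0, 30, 46])
print([len(r) for r in runs])  # -> [30, 16, 9]
print(start_positions(runs))   # -> [0, 30, 46]
```

Because the start values are derived from run lengths, a writer only needs to know how many index entries (nodes plus break values) each instance uses.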
>>>> >
>>>> > netcdf multipolygon_example {
>>>> > dimensions:
>>>> >   node = 47 ;
>>>> >   indices = 55 ;
>>>> >   instance = 3 ;
>>>> >   time = 5 ;
>>>> >   strlen = 5 ;
>>>> > variables:
>>>> >   char instance_name(instance, strlen) ;
>>>> >     instance_name:cf_role = "timeseries_id" ;
>>>> >   int coordinate_index(indices) ;
>>>> >     coordinate_index:geom_type = "multipolygon" ;
>>>> >     coordinate_index:geom_coordinates = "x y" ;
>>>> >     coordinate_index:multipart_break_value = -1 ;
>>>> >     coordinate_index:hole_break_value = -2 ;
>>>> >     coordinate_index:outer_ring_order = "anticlockwise" ;
>>>> >     coordinate_index:closure_convention = "last_node_equals_first" ;
>>>> >   int coordinate_index_start(instance) ;
>>>> >     coordinate_index_start:long_name = "index of first coordinate in each instance geometry" ;
>>>> >     coordinate_index_start:contiguous_ragged_dimension = "indices" ;
>>>> >   double x(node) ;
>>>> >     x:units = "degrees_east" ;
>>>> >     x:standard_name = "longitude" ; // or projection_x_coordinate
>>>> >     x:cf_role = "geometry_x_node" ;
>>>> >   double y(node) ;
>>>> >     y:units = "degrees_north" ;
>>>> >     y:standard_name = "latitude" ; // or projection_y_coordinate
>>>> >     y:cf_role = "geometry_y_node" ;
>>>> >   double someVariable(instance) ;
>>>> >     someVariable:long_name = "a variable describing a single-valued attribute of a polygon" ;
>>>> >   int time(time) ;
>>>> >     time:units = "days since 2000-01-01" ;
>>>> >   double someData(instance, time) ;
>>>> >     someData:coordinates = "time x y" ;
>>>> > // global attributes:
>>>> >     :Conventions = "CF-1.8" ;
>>>> >     :featureType = "timeSeries" ;
>>>> >
>>>> > data:
>>>> >
>>>> >  instance_name =
>>>> >   "flash",
>>>> >   "bang",
>>>> >   "pow" ;
>>>> >
>>>> >  coordinate_index =
>>>> >     // instance 0 ("flash"), start 0
>>>> >     0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2, 13, 14, 15, 16,
>>>> >     -1, 17, 18, 19, 20, -1, 21, 22, 23, 24,
>>>> >     // instance 1 ("bang"), start 30
>>>> >     25, 26, 27, 28, -1, 29, 30, 31, 32, 33, 34, -2, 35, 36, 37, 38,
>>>> >     // instance 2 ("pow"), start 46
>>>> >     39, 40, 41, 42, -1, 43, 44, 45, 46 ;
>>>> >
>>>> >  coordinate_index_start = 0, 30, 46 ;
>>>> >
>>>> >  x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7,
>>>> >     5, 11, 15, 13, 11, -40, -20, -45, -40, -20, -10, -10, -30, -45, -20,
>>>> >     -30, -20, -20, -30, 30, 45, 10, 30, 25, 50, 30, 25 ;
>>>> >
>>>> >  y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25, 29,
>>>> >     25, 25, 25, 29, 25, -40, -45, -30, -40, -35, -30, -10, -5, -20, -35,
>>>> >     -20, -15, -25, -20, 20, 40, 40, 20, 5, 10, 15, 5 ;
>>>> >
>>>> >  someVariable = 1, 2, 3 ;
>>>> >
>>>> >  time = 1, 2, 3, 4, 5 ;
>>>> >
>>>> >  someData =
>>>> >   1, 2, 3, 4, 5,
>>>> >   1, 2, 3, 4, 5,
>>>> >   1, 2, 3, 4, 5 ;
>>>> > }
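The coordinate_index attributes outer_ring_order = "anticlockwise" and closure_convention = "last_node_equals_first" can be checked mechanically. Below is a sketch using the shoelace formula. It assumes x/y are treated as planar coordinates (x increasing east, y increasing north), so a positive signed area means an anticlockwise ring; for longitude/latitude this is only an approximation and ignores rings that cross the antimeridian. The helper names are hypothetical.

```python
from typing import Sequence


def is_closed(xs: Sequence[float], ys: Sequence[float]) -> bool:
    """closure_convention = "last_node_equals_first" (and at least a triangle)."""
    return len(xs) >= 4 and xs[0] == xs[-1] and ys[0] == ys[-1]


def signed_area(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Shoelace formula over a closed ring; > 0 means anticlockwise in an x-east, y-north plane."""
    total = 0.0
    for i in range(len(xs) - 1):  # closed ring: edge i runs from node i to node i + 1
        total += xs[i] * ys[i + 1] - xs[i + 1] * ys[i]
    return total / 2.0


# Outer ring of instance "flash": nodes 0-4 of x and y in the CDL example.
outer_x = [0, 20, 20, 0, 0]
outer_y = [0, 0, 20, 20, 0]
print(is_closed(outer_x, outer_y), signed_area(outer_x, outer_y))  # -> True 400.0
```

In the example data every outer ring has a positive signed area, and the hole rings happen to have negative areas, although the attributes above only constrain the outer rings.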
>>>> >
>>>> > How To Interpret
>>>> >
>>>> > Starting from the timeSeries variables:
>>>> >
>>>> > 1. See the CF-1.8 Conventions.
>>>> > 2. See the timeSeries featureType.
>>>> > 3. Find the variable with cf_role timeseries_id.
>>>> > 4. Find the coordinates attribute of the data variables.
>>>> > 5. See that the variables named in the coordinates attribute have
>>>> >    cf_role geometry_x_node and geometry_y_node, which identifies them as
>>>> >    geometries according to this new specification.
>>>> > 6. Find the coordinate index variable whose geom_coordinates attribute
>>>> >    names those node variables.
>>>> > 7. Find the variable whose contiguous_ragged_dimension attribute names
>>>> >    the dimension of the coordinate index variable; it determines how to
>>>> >    index into the coordinate index.
>>>> > 8. Iterate over the polygons, parsing out geometries using the
>>>> >    contiguous ragged start variable and the coordinate index variable to
>>>> >    interpret the coordinate data variables.
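The variable-discovery steps above (from the data variable's coordinates attribute to the variable carrying contiguous_ragged_dimension) can be sketched as follows. This assumes the file header has already been read into a plain dictionary mapping each variable name to its dimensions and attributes; that structure and the function name are hypothetical stand-ins for whatever NetCDF library is in use.

```python
from typing import Dict, Tuple

# Hypothetical in-memory header: variable name -> (dimension names, attributes).
Header = Dict[str, Tuple[Tuple[str, ...], Dict[str, object]]]


def find_geometry_variables(header: Header, data_var: str) -> Dict[str, str]:
    """Locate node, coordinate index and start variables for one data variable."""
    coord_names = str(header[data_var][1]["coordinates"]).split()
    # Which of the data variable's coordinates are geometry node variables?
    by_role = {header[n][1].get("cf_role"): n for n in coord_names if n in header}
    x_name, y_name = by_role["geometry_x_node"], by_role["geometry_y_node"]
    # The coordinate index variable names those node variables in geom_coordinates.
    index_name = next(
        n for n, (_, attrs) in header.items()
        if set(str(attrs.get("geom_coordinates", "")).split()) == {x_name, y_name}
    )
    index_dim = header[index_name][0][0]
    # The start variable names the coordinate index variable's dimension.
    start_name = next(
        n for n, (_, attrs) in header.items()
        if attrs.get("contiguous_ragged_dimension") == index_dim
    )
    return {"x": x_name, "y": y_name, "index": index_name, "start": start_name}


# Header of the multipolygon CDL example, reduced to the attributes used here.
header: Header = {
    "instance_name": (("instance", "strlen"), {"cf_role": "timeseries_id"}),
    "coordinate_index": (("indices",), {"geom_type": "multipolygon", "geom_coordinates": "x y"}),
    "coordinate_index_start": (("instance",), {"contiguous_ragged_dimension": "indices"}),
    "x": (("node",), {"cf_role": "geometry_x_node"}),
    "y": (("node",), {"cf_role": "geometry_y_node"}),
    "time": (("time",), {"units": "days since 2000-01-01"}),
    "someData": (("instance", "time"), {"coordinates": "time x y"}),
}
print(find_geometry_variables(header, "someData"))
```

Matching geom_coordinates as a set rather than an ordered list is a design choice of this sketch; the proposal text does not say whether the order of names is significant.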
>>>> >
>>>> > Or, without reference to timeSeries:
>>>> >
>>>> > 1. See the CF-1.8 Conventions.
>>>> > 2. See the geom_type of multipolygon on the coordinate index variable.
>>>> > 3. Find the variable with a contiguous_ragged_dimension matching the
>>>> >    coordinate index variable's dimension.
>>>> > 4. See the geom_coordinates of "x y".
>>>> > 5. Using the contiguous ragged start variable found in step 3 and the
>>>> >    coordinate index variable found in step 2, parse geometries out of
>>>> >    the coordinate index variable, splitting them at its hole and
>>>> >    multipart break values.
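The final parsing step above can be sketched in Python. This assumes the coordinate index, start, x and y arrays have already been read into Python sequences and that the break values are the -1 (multipart) and -2 (hole) used in the CDL example; the function name is hypothetical and the sketch does not validate ring closure or orientation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Ring = List[Point]
Polygon = List[Ring]  # outer ring first, then holes


def decode_multipolygons(index: Sequence[int], starts: Sequence[int],
                         x: Sequence[float], y: Sequence[float],
                         multipart_break: int = -1,
                         hole_break: int = -2) -> List[List[Polygon]]:
    """Rebuild instance -> polygon -> ring -> (x, y) from a break-value index array."""
    ends = list(starts[1:]) + [len(index)]
    geometries: List[List[Polygon]] = []
    for s, e in zip(starts, ends):
        polygons: List[Polygon] = [[[]]]  # one polygon with an empty outer ring
        for i in index[s:e]:
            if i == multipart_break:
                polygons.append([[]])     # start the next polygon part
            elif i == hole_break:
                polygons[-1].append([])   # start a hole in the current polygon
            else:
                polygons[-1][-1].append((x[i], y[i]))
        geometries.append(polygons)
    return geometries


# Data from the CDL example
coordinate_index = [0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2, 13, 14, 15, 16,
                    -1, 17, 18, 19, 20, -1, 21, 22, 23, 24,
                    25, 26, 27, 28, -1, 29, 30, 31, 32, 33, 34, -2, 35, 36, 37, 38,
                    39, 40, 41, 42, -1, 43, 44, 45, 46]
x = [0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7,
     5, 11, 15, 13, 11, -40, -20, -45, -40, -20, -10, -10, -30, -45, -20,
     -30, -20, -20, -30, 30, 45, 10, 30, 25, 50, 30, 25]
y = [0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25, 29,
     25, 25, 25, 29, 25, -40, -45, -30, -40, -35, -30, -10, -5, -20, -35,
     -20, -15, -25, -20, 20, 40, 40, 20, 5, 10, 15, 5]
geoms = decode_multipolygons(coordinate_index, [0, 30, 46], x, y)
for name, g in zip(["flash", "bang", "pow"], geoms):
    print(name, [len(p) for p in g])  # rings per polygon: [4, 1, 1], [1, 2], [1, 1]
```

Each instance decodes to a list of polygons whose first ring is the outer boundary, so "flash" is a polygon with three holes plus two simple polygons.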
>>>> >
>>>> > -------------- next part --------------
>>>> > An HTML attachment was scrubbed...
>>>> > URL: <http://mailman.cgd.ucar.edu/pipermail/cf-metadata/attachments/20170202/4ce5b42f/attachment.html>
>>>> >
>>>> > ------------------------------
>>>> >
>>>> > Subject: Digest Footer
>>>> >
>>>> > _______________________________________________
>>>> > CF-metadata mailing list
>>>> > [email protected]
>>>> > http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>>>> >
>>>> >
>>>> > ------------------------------
>>>> >
>>>> > End of CF-metadata Digest, Vol 166, Issue 3
>>>> > *******************************************
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Sincerely,
>>>> >
>>>> > Bob Simons
>>>> > IT Specialist
>>>> > Environmental Research Division
>>>> > NOAA Southwest Fisheries Science Center
>>>> > 99 Pacific St., Suite 255A      (New!)
>>>> > Monterey, CA 93940               (New!)
>>>> > Phone: (831)333-9878            (New!)
>>>> > Fax:   (831)648-8440
>>>> > Email: [email protected]
>>>> >
>>>> > The contents of this message are mine personally and
>>>> > do not necessarily reflect any position of the
>>>> > Government or the National Oceanic and Atmospheric Administration.
>>>> > <>< <>< <>< <>< <>< <>< <>< <>< <><
>>>> >
>>>> 
>>>> -------------- next part --------------
>>>> An HTML attachment was scrubbed...
>>>> URL: <http://mailman.cgd.ucar.edu/pipermail/cf-metadata/attachments/20170203/4ff55def/attachment.html>
>>>> 
>>>> ------------------------------
>>>> 
>>>> End of CF-metadata Digest, Vol 166, Issue 5
>>>> *******************************************
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Sincerely,
>>>> 
>>>> Bob Simons
>>>> IT Specialist
>>>> Environmental Research Division
>>>> NOAA Southwest Fisheries Science Center 
>>>> 99 Pacific St., Suite 255A      (New!)
>>>> Monterey, CA 93940               (New!) 
>>>> Phone: (831)333-9878            (New!)
>>>> Fax:   (831)648-8440
>>>> Email: [email protected]
>>>> 
>>>> The contents of this message are mine personally and 
>>>> do not necessarily reflect any position of the 
>>>> Government or the National Oceanic and Atmospheric Administration.
>>>> <>< <>< <>< <>< <>< <>< <>< <>< <>< 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> 
>>>> Christopher Barker, Ph.D.
>>>> Oceanographer
>>>> 
>>>> Emergency Response Division
>>>> NOAA/NOS/OR&R            (206) 526-6959   voice
>>>> 7600 Sand Point Way NE   (206) 526-6329   fax
>>>> Seattle, WA  98115       (206) 526-6317   main reception
>>>> 
>>>> [email protected]
>>> 
>> 
>

_______________________________________________
CF-metadata mailing list
[email protected]
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
