A few comments, though you all seem to have this in hand :-)
> I was asking whether this means that for each *collection* (of points,
> lines or polygons) there is a *single* timeseries.

I don't get why this matters -- any number of time series could be associated with a single "entity", just like any number of timeseries can be associated with given coordinates in regular old CF.

> For instance, in your example of a single geometry composed of several
> polygons, there is a single number for each time. But that is not the case
> for weather stations; for each weather station there is a timeseries, and
> at each time there is a different number (value of temperature,
> precipitation or whatever) for each weather station.

I think it may be helpful to borrow terminology (and the data model) from the GIS world here. In this case, I am referencing the geoJSON spec, as I happen to be working with that at the moment, but the basic data model is pretty consistent.

http://geojson.org/geojson-spec.html

Note that they have "geometries", which can be things like points, polygons, polylines. IIUC (and I'm no osgeo maven) geometries represent a "single" entity.

Then there are "Features": a Feature is essentially data associated with a particular geometry.

But note: there are "Collections" -- both Geometry and Feature Collections -- that is what you use to "bundle" various data together.

I think we may be well served by thinking in terms of mapping the GIS data model to CF/netcdf -- for instance it would be great to be able to write a netcdf<->geoJSON converter that was lossless, AND would be fairly "native" in both cases.

> You also write, "The US National Weather Service’s National Water Model
> (NWM) ... forecasts streamflow rates in about 2.7 million stream segments
> averaging 2km." The stream network is a MultiLineString geometry, but I
> don't think there is just one value of streamflow applying to the entire
> network at any given time;

no -- of course not.
So that network (if I understand the GIS data model) should be a Feature Collection, not all one Feature. So a whole collection of geometries as well.

The "trick" with this data model is that it "de-vectorizes" the data. Those of us used to working with netcdf, CF, gridded data, etc., tend to think that you'd want to have, for instance, a vector of geometries, and then various vectors of data associated with those geometries, whereas the GIS data model associates data with a given geometry, and then creates collections of those.

This is kind of like the old C conundrum: do I use a struct of arrays, or an array of structs? netcdf is very much about the struct of arrays approach.

(though I'm still confused, maybe you can have an "array" of data associated with a GeometryCollection?)

As for MultiLineString -- you could associate an array of data with the MultiLineString -- so one value per segment. But I think that violates the intent of the data model -- you should have a GeometryCollection of LineStrings instead, and then each segment has its own geometry and you can associate an array of data with that. (Or should it be a FeatureCollection? I'm getting confused now!)

> I guess there is a different timeseries for each stream segment. But in my
> example above, the Atlantic Ocean is a single polygon with a single
> timeseries for its average temperature, not a different timeseries for
> each node.

right, so that Polygon would be a single Feature.

> Thus I am unclear about the dimensions of the data. In terms of your
> original example, does the data have dimensions (time,geometry, where
> geometry=1) or (time,node)?

(time,geometry, where geometry=1)

time,node would be for data associated with a FeatureCollection of Points (or a MultiPoint).

Does anyone "get" the GIS data model?
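To make the struct-of-arrays vs. array-of-structs contrast concrete, here is a rough Python sketch, not a proposal for the convention. It holds stream-segment data the netcdf way, as a (time, geometry) array next to a list of geometries, and then regroups it the GIS way, as a FeatureCollection in which each Feature carries its own LineString and its own time series. All variable and function names are made up for illustration, and the "streamflow" property name is an assumption. The sketch only round-trips coordinates and values; the time axis itself is not written out.

```python
import json

# Struct of arrays, netcdf/CF style (hypothetical toy data).
times = [0, 1, 2]                      # shared time axis, kept for reference only
segment_coords = [                     # one LineString per stream segment
    [[0.0, 0.0], [1.0, 0.5]],
    [[1.0, 0.5], [2.0, 1.5]],
]
streamflow = [                         # dimensions (time, geometry)
    [10.0, 4.0],
    [11.0, 4.5],
    [12.0, 5.0],
]


def to_feature_collection(coords, data):
    """Regroup as an array of structs: one Feature per segment."""
    features = []
    for g, line in enumerate(coords):
        features.append({
            "type": "Feature",
            "geometry": {"type": "LineString", "coordinates": line},
            # Column g of the (time, geometry) array is this segment's series.
            "properties": {"streamflow": [row[g] for row in data]},
        })
    return {"type": "FeatureCollection", "features": features}


def from_feature_collection(fc):
    """Rebuild the struct-of-arrays layout from the Features."""
    coords = [f["geometry"]["coordinates"] for f in fc["features"]]
    series = [f["properties"]["streamflow"] for f in fc["features"]]
    # Transpose (geometry, time) back to (time, geometry).
    data = [list(row) for row in zip(*series)]
    return coords, data


fc = to_feature_collection(segment_coords, streamflow)
# Go through actual JSON text to check nothing is lost on the way.
coords_back, data_back = from_feature_collection(json.loads(json.dumps(fc)))
```

The conversion is lossless for this toy case, but note that each time series is split up and stored per Feature, which is the "de-vectorizing" described above.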
I'm quite confused as to when you would use:

MultiPolygon
vs
GeometryCollection of Polygons
vs
FeatureCollection of Features with Polygon Geometries

But I'm going to take a stab at it:

MultiPolygon (and MultiLineString, and MultiPoint) is used when you have more than one of a particular type of geometry that are logically one thing -- maybe an archipelago, for instance. A Polygon geometry can represent a simple polygon, or a polygon with holes in it -- but cannot represent two separate polygons. So if you have multiple polygons that are geometrically distinct, but logically connected, you use a MultiPolygon.

I'm on shakier ground about when you want to use a GeometryCollection vs a FeatureCollection, but I _think_ that the point of a GeometryCollection is that you can group different types of geometry -- but still want them to be treated as a single entity.

I've dealt with all this trying to jam data that fits well into netcdf into geoJSON, or GIS-oriented systems -- it's quite hard to be efficient about it :-) -- i.e. there is really no way to associate an array of data with an array of geometries -- it sure looks like you could do it with GeometryCollections, but the systems aren't expecting that.

Of course, CF doesn't need to follow this data model, but it's a good idea to be informed by it.

> Nonetheless in both cases the geometries have to be described. I think the
> difference is how we attach this description to the data or coordinates,
> rather than how the description is constructed.

indeed.

> You propose the index variable in order for the convention to be like
> ugrid. However this still seems to me to be an unnecessary complexity and
> use of space if you aren't going to have many shared nodes.

In the GIS data model, nodes are not shared between geometries, and you are quite right that keeping nodes separate, with geometries indexing into them, is an added complication and would not be space-efficient.
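For what it's worth, the space argument can be checked on a toy case. The sketch below is only an illustration with made-up names, and it counts stored numbers rather than bytes. It stores two triangles that share an edge both ways: GIS-style, with every geometry carrying its own coordinates, and indexed, with one node table plus per-geometry index lists.

```python
# Two triangles sharing an edge (hypothetical toy data).

# (a) GIS-style: each geometry stores its own copy of every coordinate.
polygons_inline = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
]

# (b) Indexed: a single node table, geometries refer to nodes by index.
nodes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
polygons_indexed = [[0, 1, 2], [1, 3, 2]]


def stored_values_inline(polys):
    """Two numbers (x, y) per vertex, repeated in every geometry."""
    return sum(2 * len(p) for p in polys)


def stored_values_indexed(node_table, polys):
    """Two numbers per unique node, plus one index per vertex."""
    return 2 * len(node_table) + sum(len(p) for p in polys)


def expand(node_table, polys):
    """Turn the indexed layout back into per-geometry coordinates."""
    return [[node_table[i] for i in p] for p in polys]


n_inline = stored_values_inline(polygons_inline)
n_indexed = stored_values_indexed(nodes, polygons_indexed)
same_geometry = expand(nodes, polygons_indexed) == polygons_inline
```

Here the inline layout stores 12 numbers and the indexed one stores 14, even though two of the four nodes are shared, so the indexing only pays for itself in space when a larger fraction of nodes is shared.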
However, there is another reason to do it -- it makes it definitive that two (or more) geometries share the exact same node, rather than them being distinct points that happened to be at the same location (or worse, with FP error and all, two points that are very close).

This is actually a major limitation in the standard GIS model.

> I think the case for having another convention, distinct from ugrid, is
> stronger if it is *unlike* ugrid in this respect, and therefore simpler
> as well.

I still think that it should be separate from UGRID -- it really is a different use case, though they should still share whatever they can, and it could turn out that UGRID is a special case of geometries?

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[email protected]
_______________________________________________
CF-metadata mailing list
[email protected]
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
