Dear Jonathan,

I can’t speak to the technical details, but can mention some motivation for simple geometries. Among other applications, NetCDF-CF is now being used as an intermediate and output data format in the US National Weather Service’s National Water Model (NWM). This forecasts streamflow rates in about 2.7 million stream segments averaging 2 km, throughout the continental US, at multiple time horizons (3 hr, 18 hr, 10 days) every hour, and an ensemble 30-day forecast less frequently. There are many applications which can benefit from detailed polyline and polygon geometries. While ugrid could also be used, the simple geometries approach presented is simpler to implement.

Regards,
David Arctur

On Sep 22, 2016, at 5:40 AM, Jonathan Gregory <j.m.greg...@reading.ac.uk> wrote:

Dear Ben

Thank you for your thoughtful and interesting proposal. I have quite a lot of questions and comments about it.

* You explain that the need is to specify spatial coordinates with a simple geometry for a timeSeries variable. For example, this could be for the discharge as a function of time across some line in a river (your example), or I suppose it could be an average temperature as a function of time for the Atlantic Ocean, where you wanted to supply the polygon which drew the outline of the basin. Have I got the idea? Timeseries like this can be stored in CF, but their geographical extent is usually described only in words e.g. a region name of atlantic_ocean, and this is fine for applications like CMIP where you want to compare data from different data sources in which the Atlantic Ocean may have different exact shapes (different AOGCMs, in particular). An array of region names is also possible, so I don't think we need a new convention to contain your dwarf planet example.

* Sect 9.1 on discrete sampling geometries says it cannot yet be used for cases "where geo-positioning cannot be described as a discrete point location. Problematic examples include time series that refer to a geographical region (e.g. the northern hemisphere) ...". Actually I think that's not quite right. The existing convention *can* describe regions which are contiguous, and rectangular or polygonal, using its usual bounds convention (Sect 7.1). I think we should consider changing this text, because it seems unnecessarily restrictive.
For example, a timeSeries for the average temperature in the Northern Hemisphere can be stored like this:

  dimensions:
    region=1;
    nv=2;
    time=UNLIMITED;
  variables:
    float temperature(region,time);
      temperature:standard_name="surface_temperature";
      temperature:units="K";
      temperature:coordinates="lat lon";
      temperature:cell_methods="time: mean area: mean";
    float lat(region);
      lat:standard_name="latitude";
      lat:units="degrees_north";
      lat:bounds="lat_bounds";
    float lat_bounds(region,nv);
    float lon(region);
      lon:standard_name="longitude";
      lon:units="degrees_east";
      lon:bounds="lon_bounds";
    float lon_bounds(region,nv);
  data:
    lat_bounds=0,90;
    lon_bounds=0,360;

which means the region is 0-90N and 0-360E. If the regions were irregular polygons in latitude and longitude, nv would be the number of vertices and the lat and lon bounds would trace the outline of the polygon e.g. nv=3, lat=0,90,0 and lon=0,0,90 describes the eighth of the sphere which is bounded by the meridians at 0E and 90E and the Equator.

I think, therefore, we do not need an additional convention for points or polygonal regions. However, we would need new conventions for a timeseries where each value applies to a set of discontiguous regions or regions with holes in them, a set of points, a line or a set of lines. I guess that these are included in the geometry types you list (LineString, Multipoint, MultiLineString, and MultiPolygon). Do you have definite use-cases for all of these? (I ask this because we don't add new functionality to CF until there is a definite and common need for it in practice.)

* I suspect that geometries of this kind can be described by the ugrid convention http://ugrid-conventions.github.io/ugrid-conventions, which is compliant with CF. Their purpose is to describe a set of connected points, edges or faces at which values are given, whereas in your case you'd give a single value for the whole set, but the description of the geometry itself might be similar.
Have you had a look at whether ugrid could meet your needs? If it almost does so, perhaps a better thing to do would be to propose additions to ugrid. We would like to avoid having more than one way to describe such geometries. If you decide to make use of ugrid instead, the rest of my comments may not be relevant!

* So far CF does not say anything about the use of netCDF-4 features (i.e. not the classic model). We have often discussed allowing them but the general argument is also made that there has to be a compelling case for providing a new way to do something which can already be done. (Steve Hankin often made this argument, but since he's mostly retired I'll make it now in his name :-) ) If there are two ways to do something, software has to support both of them. We already have ways to encode ragged arrays, so is there a compelling case for needing the netCDF-4 vlen array as well? We already have a way to encode strings too, as character arrays. I think this is probably a discussion we should have again in a different thread, so I'll just talk about your classic encoding. The same points apply to both encodings.

* Your approach uses a coordinate_index variable to identify indices of geometry coordinates e.g.

  dimensions:
    indices = 30;
    node = 25 ;
    geom = 1 ;
  variables:
    int coordinate_index(indices) ;
      coordinate_index:coordinates = "x y" ;
    double x(node) ;
    double y(node) ;
  data:
    coordinate_index = 0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2, 13, 14, 15, 16, -1, 17, 18, 19, 20, -1, 21, 22, 23, 24 ;
    x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7, 5, 11, 15, 13, 11 ;
    y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25, 29, 25, 25, 25, 29, 25 ;

where the -1 and -2 indices indicate where exterior and interior polygons begin, and the first polygon has an implied -1 at the start. Is that right?
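
That reading of the index array can be checked with a short Python sketch. It is only an illustration of the interpretation described above, not code from the proposal: the function name split_rings and the two constants are hypothetical, and it assumes that a break value describes the ring that follows it and that the first ring is an exterior.

```python
# Hypothetical names; -1 and -2 are the break values used in the example above.
MULTIPART_BREAK = -1  # the ring that follows starts a new exterior polygon
HOLE_BREAK = -2       # the ring that follows is a hole (interior ring)


def split_rings(coordinate_index, x, y):
    """Split the flat index array into (kind, [(x, y), ...]) rings."""
    rings = []
    kind = "exterior"  # implied -1 before the first ring
    current = []
    for idx in coordinate_index:
        if idx in (MULTIPART_BREAK, HOLE_BREAK):
            # A break value closes the current ring and labels the next one.
            rings.append((kind, current))
            kind = "exterior" if idx == MULTIPART_BREAK else "interior"
            current = []
        else:
            current.append((x[idx], y[idx]))
    rings.append((kind, current))  # the last ring has no trailing break
    return rings


coordinate_index = [0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2,
                    13, 14, 15, 16, -1, 17, 18, 19, 20, -1, 21, 22, 23, 24]
x = [0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5,
     11, 13, 15, 11, 5, 9, 7, 5, 11, 15, 13, 11]
y = [0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15,
     15, 19, 15, 15, 25, 25, 29, 25, 25, 25, 29, 25]

rings = split_rings(coordinate_index, x, y)
for kind, points in rings:
    print(kind, len(points))
```

Under those assumptions the example data splits into six closed rings: one five-node exterior square, three four-node interior triangles inside it, and two further four-node exterior triangles.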

Given this example, I wonder why you need the index array, because none of the coordinate indices (values >=0) is repeated, so no space is saved in the x and y arrays. I guess this would be the usual case. If polygons did touch or lines crossed, a few points would be in common, but not so many that it seems to need the complication of the index array. A simpler way to do it would be

  int outside_inside(node); // -1 for exterior, -2 for interior
  double x(node) ;
  double y(node) ;
  outside_inside=-1,-1,-1,-1,-1, -2,-2,-2,-2, -2,-2,-2,-2, -2,-2,-2,-2, -1,-1,-1,-1, -1,-1,-1,-1;
  x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7, 5, 11, 15, 13, 11 ;
  y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25, 29, 25, 25, 25, 29, 25 ;

which needs only one dimension, or you could use the CF ragged array convention (Sect 9.3.3):

  segment=6;
  node=25;
  int count(segment);
    count:sample_dimension="node";
  int outside_inside(segment); // -1 for exterior, -2 for interior
  double x(node) ;
  double y(node) ;
  outside_inside=-1,-2,-2,-2,-1,-1;
  count=5,4,4,4,4,4;
  x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7, 5, 11, 15, 13, 11 ;
  y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25, 29, 25, 25, 25, 29, 25 ;

* You provide the attributes multipart_break_value and hole_break_value to specify the values (-1 and -2 above) for the outside vs inside distinction. Do you need the generality of being able to choose these values? It would seem simpler to use a character array and specify in the convention which letters should be used e.g.

  char outside_inside(segment)
  outside_inside="OIIIOO";

That makes it more readable, perhaps.

* Similarly, you propose attributes for clockwise/anticlockwise node order and for the polygon closure convention. Do these need to be freely choosable? You could specify clockwise, like the existing CF bounds convention, and that the polygons are closed.
In the latter case, you could omit the last vertex of each polygon since it must be the same as the first, and that would save a bit of space. If you specify these choices, the attributes aren't needed.

* If this convention is going to be used for discrete sampling geometries, an additional dimension is needed, because in a single data variable you might have data for several of these geometries. That is, you need an array of ragged arrays. Again, I wonder whether this suggests trying to use ugrid. It might be that you could name each one as a mesh, and specify the geometry for the set of timeSeries as an array of mesh names. That would be a very easy change to the existing Sect 9.

Best wishes

Jonathan
_______________________________________________
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata