desruisseaux commented on code in PR #240:
URL: https://github.com/apache/parquet-format/pull/240#discussion_r1755860787


##########
src/main/thrift/parquet.thrift:
##########
@@ -237,6 +237,98 @@ struct SizeStatistics {
    3: optional list<i64> definition_level_histogram;
 }
 
+/**
+ * Interpretation for edges of GEOMETRY logical type, i.e. whether the edge
+ * between points represent a straight cartesian line or the shortest line on
+ * the sphere. It applies to all non-point geometry objects.
+ */
+enum Edges {
+  PLANAR = 0;
+  SPHERICAL = 1;
+}
+
+/**
+ * A custom binary-encoded polygon or multi-polygon to represent a covering of
+ * geometries. For example, it may be a bounding box or an envelope of geometries
+ * when a bounding box cannot be built (e.g. a geometry has spherical edges, or if
+ * an edge of geographic coordinates crosses the antimeridian). In addition, it can
+ * also be used to provide vendor-agnostic coverings like S2 or H3 grids.
+ */
+struct Covering {
+  /**
+   * A type of covering. Currently accepted values: "WKB".
+   */
+  1: required string kind;
+  /**
+   * A payload specific to kind. Below are the supported values:
+   * - WKB: well-known binary of a POLYGON or MULTI-POLYGON that completely
+   *   covers the contents. This will be interpreted according to the same CRS
+   *   and edges defined by the logical type.
+   */
+  2: required binary value;
+}
+
+/**
+ * Bounding box of geometries in the representation of min/max value pair of
+ * coordinates from each axis. Values of Z and M are omitted for 2D geometries.
+ * Filter pushdown on geometries are only safe for planar spatial predicate
+ * but it is recommended that the writer always generates bounding box statistics,
+ * regardless of whether the geometries are planar or spherical.
+ */
+struct BoundingBox {

Review Comment:
   This is not a geometry-specific concept. The idea is to reinterpret the 
"min" and "max" values so that they are no longer min/max, but "start value" 
and "end value". Then add the following rules:
   
   * The interior is everything from the start value to the end value, _going 
in the direction of increasing values_.
   * If _start_ > _end_, then we go from _start_ value to positive infinity, 
wrap around to negative infinity and continue until the _end_ value.
   
   That's all. No geometric concept, no need to know that the wraparound 
happens at ±180°. It works if all coordinates are inside a consistent range of 
validity. It may be [−180 … +180]° of longitude, or [0 … 360]° or anything else 
at the user's choice (actually defined by the CRS). As long as all coordinates 
are guaranteed to be inside the same range, the above algorithm does not need 
to know that range.
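   
   As a rough illustration of the two rules above, here is a minimal Python 
sketch. The function names `contains` and `intersects` are hypothetical and 
are not part of the Parquet specification or of this PR. It assumes that both 
endpoints are inclusive and that all values lie in the same range of validity; 
the code never needs to know what that range is.

```python
def contains(start: float, end: float, value: float) -> bool:
    """Return True if value lies in the interval from start to end,
    going in the direction of increasing values (hypothetical helper)."""
    if start <= end:
        # Ordinary interval: no wraparound.
        return start <= value <= end
    # start > end: from start to +infinity, then from -infinity to end.
    return value >= start or value <= end


def intersects(start1: float, end1: float,
               start2: float, end2: float) -> bool:
    """Return True if the two intervals overlap (hypothetical helper).

    Two such intervals overlap exactly when one of them contains
    the start value of the other.
    """
    return contains(start1, end1, start2) or contains(start2, end2, start1)


# Longitudes crossing the antimeridian, written as start=170, end=-170.
print(contains(170, -170, 175))   # inside the wrapped interval
print(contains(170, -170, 0))     # outside the wrapped interval

# Climatological months: November (11) to February (2).
print(contains(11, 2, 1))         # January is inside
print(contains(11, 2, 6))         # June is outside
```

   The same two functions serve both the longitude case and the month-of-year 
case, which is the point of the proposal: nothing in them is specific to 
geometry.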
   
   This proposal works for all kinds of wraparound: not only longitudes, but 
also climatological data (e.g. average temperatures of January, February … 
December, then back to January with no particular year associated with those 
months), radar, _etc._ By contrast, if we choose to add a `Geography` type for 
handling longitude wraparound, then we would also need a `Radar` type for data 
in a polar coordinate system, a `ClimateCalendar` type for a temporal 
coordinate system, _etc._ This is not scalable.
   
   The above needs are not hypothetical. By coincidence, this week I'm facing 
the problem of using climatological data in GIS systems that do not understand 
temporal wraparound. Such data are common at WMO, NATO and other 
organizations. The "min" > "max" proposal addresses the general case. The 
`Geography` type does not.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

