desruisseaux commented on code in PR #240:
URL: https://github.com/apache/parquet-format/pull/240#discussion_r1752523823


##########
src/main/thrift/parquet.thrift:
##########
@@ -237,6 +237,98 @@ struct SizeStatistics {
    3: optional list<i64> definition_level_histogram;
 }
 
+/**
+ * Interpretation for edges of GEOMETRY logical type, i.e. whether the edge
+ * between points represent a straight cartesian line or the shortest line on
+ * the sphere. It applies to all non-point geometry objects.
+ */
+enum Edges {
+  PLANAR = 0;
+  SPHERICAL = 1;
+}
+
+/**
+ * A custom binary-encoded polygon or multi-polygon to represent a covering of
+ * geometries. For example, it may be a bounding box or an envelope of geometries
+ * when a bounding box cannot be built (e.g. a geometry has spherical edges, or if
+ * an edge of geographic coordinates crosses the antimeridian). In addition, it can
+ * also be used to provide vendor-agnostic coverings like S2 or H3 grids.
+ */
+struct Covering {
+  /**
+   * A type of covering. Currently accepted values: "WKB".
+   */
+  1: required string kind;
+  /**
+   * A payload specific to kind. Below are the supported values:
+   * - WKB: well-known binary of a POLYGON or MULTI-POLYGON that completely
+   *   covers the contents. This will be interpreted according to the same CRS
+   *   and edges defined by the logical type.
+   */
+  2: required binary value;
+}
+
+/**
+ * Bounding box of geometries in the representation of min/max value pair of
+ * coordinates from each axis. Values of Z and M are omitted for 2D geometries.
+ * Filter pushdown on geometries are only safe for planar spatial predicate
+ * but it is recommended that the writer always generates bounding box statistics,
+ * regardless of whether the geometries are planar or spherical.
+ */
+struct BoundingBox {

Review Comment:
   > dataset authors should split anything that crosses the antimeridian into two polygons
   
   Not necessarily. Splitting introduces problems of its own (e.g. it creates an artificial edge), and allowing "min" > "max" is a good alternative that avoids them. Some dataset authors already use the latter (e.g. EPSG) and may not agree that they should change. This approach is also already used by some OGC/ISO standards.
   
   > which would allow Parquet implementations to do filter pushdown without ever having to inspect the crs key
   
   It is not necessary to inspect the CRS key if the specification states that every "min" > "max" case is a wraparound. It is not even necessary to know whether the wraparound period is 360° or any other value; it works in a truly generic way. The only requirement is that the `contains` and `intersects` methods be more advanced than the naive implementations. This is only mathematics, could be documented in the specification, and would be truly generic (not specific to geographic data).
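
A minimal sketch of what such wraparound-aware checks could look like on a single axis, in Python. The names `Interval`, `wraps`, `intersects` and `contains` are hypothetical and do not come from the PR or from any Parquet API. The sketch assumes closed intervals and reads any `lo > hi` interval as "everything from `lo` upward plus everything up to `hi`", so the wraparound period is never needed:

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Closed 1-D range on one axis (hypothetical helper, not a Parquet type)."""
    lo: float
    hi: float

    @property
    def wraps(self) -> bool:
        # Assumed convention: lo > hi means the range crosses the wraparound seam.
        return self.lo > self.hi


def intersects(a: Interval, b: Interval) -> bool:
    if a.wraps and b.wraps:
        # Both ranges extend across the seam, so they always share it.
        return True
    if a.wraps or b.wraps:
        w, p = (a, b) if a.wraps else (b, a)  # w wraps, p is a plain range
        # p must touch either the upper part [w.lo, ...) or the lower part (..., w.hi].
        return p.hi >= w.lo or p.lo <= w.hi
    # Naive case: two ordinary ranges.
    return a.lo <= b.hi and b.lo <= a.hi


def contains(outer: Interval, inner: Interval) -> bool:
    if outer.wraps == inner.wraps:
        # Same kind of range on both sides: the usual comparison applies.
        return outer.lo <= inner.lo and inner.hi <= outer.hi
    if outer.wraps:
        # A plain inner range must fit entirely in one of the two parts of outer.
        return inner.lo >= outer.lo or inner.hi <= outer.hi
    # A plain outer range could only contain a wrapping one if it spanned the
    # whole axis, which cannot be decided without the period: answer False.
    return False


box = Interval(170.0, -170.0)  # a longitude range crossing the antimeridian
print(intersects(box, Interval(175.0, 180.0)))  # overlaps the upper part
print(intersects(box, Interval(-10.0, 10.0)))   # far from the seam
```

A 2-D bounding-box test would apply the same functions to each axis and combine the results with a logical AND. The `False` answer in the last branch of `contains` is a conservative choice made for this sketch, not something stated in the discussion.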



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

