desruisseaux commented on code in PR #240:
URL: https://github.com/apache/parquet-format/pull/240#discussion_r1731852567
##########
src/main/thrift/parquet.thrift:
##########
@@ -237,6 +237,98 @@ struct SizeStatistics {
3: optional list<i64> definition_level_histogram;
}
+/**
+ * Interpretation for edges of GEOMETRY logical type, i.e. whether the edge
+ * between points represent a straight cartesian line or the shortest line on
+ * the sphere. It applies to all non-point geometry objects.
+ */
+enum Edges {
+ PLANAR = 0;
+ SPHERICAL = 1;
+}
+
+/**
+ * A custom binary-encoded polygon or multi-polygon to represent a covering of
+ * geometries. For example, it may be a bounding box or an envelope of geometries
+ * when a bounding box cannot be built (e.g. a geometry has spherical edges, or if
+ * an edge of geographic coordinates crosses the antimeridian). In addition, it can
+ * also be used to provide vendor-agnostic coverings like S2 or H3 grids.
+ */
+struct Covering {
+ /**
+ * A type of covering. Currently accepted values: "WKB".
+ */
+ 1: required string kind;
+ /**
+ * A payload specific to kind. Below are the supported values:
+ * - WKB: well-known binary of a POLYGON or MULTI-POLYGON that completely
+ * covers the contents. This will be interpreted according to the same CRS
+ * and edges defined by the logical type.
+ */
+ 2: required binary value;
+}
+
+/**
+ * Bounding box of geometries in the representation of min/max value pair of
+ * coordinates from each axis. Values of Z and M are omitted for 2D geometries.
+ * Filter pushdown on geometries are only safe for planar spatial predicate
+ * but it is recommended that the writer always generates bounding box statistics,
+ * regardless of whether the geometries are planar or spherical.
Review Comment:
I cannot speak for the author's intent, but three difficulties that I see
with bounding boxes on a sphere are:
* Boxes crossing the anti-meridian (e.g., from 170° to −170° of longitude).
Pretty much everything (union, intersection, adding points) becomes more
complicated. There is no easy fix; switching to e.g. the 0…360° convention
works only in special cases.
* As the box's top/bottom border gets closer to the north/south pole, the
box becomes wider in longitude even if the real-world feature is not that
large. If the box includes the pole, it becomes 360° wide even if the feature
is very small. This makes the box quite ineffective as the "smallest"
enclosing bounding box. In particular, it can ruin the performance that we
would expect from tiling as soon as a box's border gets close enough to a pole.
* When testing whether a point is inside the box, we can get false positives
or false negatives near the top and bottom borders. This can happen if the box
describes the minimum and maximum coordinate values of all control points (or
nodes) of geometries, and if the line segments between those points are
interpreted as the shortest paths. Because geodesics appear as curves on a
(_latitude_, _longitude_) map, these curves may cross (exit then reenter) a
bounding box border.
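
To make the three points above concrete, here is a minimal Python sketch. It is only an illustration under simplifying assumptions (a perfect sphere of radius 6371 km, coordinates in degrees). All function names are hypothetical and are not part of the Parquet specification or of any library.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius


def naive_lon_range(lons):
    """Plain per-axis min/max, ignoring any wrap at the anti-meridian."""
    return min(lons), max(lons)


def lon_in_range(lon, west, east):
    """True if lon is in [west, east]; west > east means the range wraps
    across the anti-meridian (e.g. west=170, east=-170)."""
    if west <= east:
        return west <= lon <= east
    return lon >= west or lon <= east


def lon_span_deg(width_km, lat_deg):
    """Approximate longitude span (degrees) of a feature that is width_km
    wide along the parallel at latitude lat_deg."""
    parallel_radius = EARTH_RADIUS_KM * math.cos(math.radians(lat_deg))
    return math.degrees(width_km / parallel_radius)


def to_xyz(lat_deg, lon_deg):
    """Unit vector on the sphere for a (latitude, longitude) pair."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def great_circle_midpoint_lat(lat1, lon1, lat2, lon2):
    """Latitude (degrees) of the midpoint of the shortest great-circle arc
    between two non-antipodal points."""
    a, b = to_xyz(lat1, lon1), to_xyz(lat2, lon2)
    m = [a[i] + b[i] for i in range(3)]  # the normalized sum bisects the arc
    norm = math.sqrt(sum(c * c for c in m))
    return math.degrees(math.asin(m[2] / norm))


# 1. Anti-meridian: vertices clustered around 180 degrees of longitude.
lons = [170.0, 175.0, -175.0, -170.0]
naive_west, naive_east = naive_lon_range(lons)         # (-175.0, 175.0)
hit_naive = lon_in_range(0.0, naive_west, naive_east)  # True: false positive
hit_wrapped = lon_in_range(0.0, 170.0, -170.0)         # False, as expected

# 2. Near a pole: the same 100 km feature spans far more longitude.
span_equator = lon_span_deg(100.0, 0.0)
span_near_pole = lon_span_deg(100.0, 89.0)

# 3. Geodesic bulge: both endpoints are at 50 degrees of latitude, but the
#    shortest path between them rises poleward of the vertex-based maximum.
mid_lat = great_circle_midpoint_lat(50.0, -30.0, 50.0, 30.0)
```

In this sketch the naive min/max range covers 350° of longitude and wrongly contains longitude 0°, the 100 km feature spans more than 50 times as much longitude at 89° of latitude as at the equator, and the midpoint of the geodesic lies roughly 4° north of both endpoints, outside a box computed from the vertices alone.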
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.