paleolimbot commented on code in PR #240:
URL: https://github.com/apache/parquet-format/pull/240#discussion_r1752273308
##########
src/main/thrift/parquet.thrift:
##########
@@ -237,6 +237,98 @@ struct SizeStatistics {
3: optional list<i64> definition_level_histogram;
}
+/**
+ * Interpretation for edges of GEOMETRY logical type, i.e. whether the edge
+ * between points represent a straight cartesian line or the shortest line on
+ * the sphere. It applies to all non-point geometry objects.
+ */
+enum Edges {
+ PLANAR = 0;
+ SPHERICAL = 1;
+}
+
+/**
+ * A custom binary-encoded polygon or multi-polygon to represent a covering of
+ * geometries. For example, it may be a bounding box or an envelope of geometries
+ * when a bounding box cannot be built (e.g. a geometry has spherical edges, or if
+ * an edge of geographic coordinates crosses the antimeridian). In addition, it can
+ * also be used to provide vendor-agnostic coverings like S2 or H3 grids.
+ */
+struct Covering {
+ /**
+ * A type of covering. Currently accepted values: "WKB".
+ */
+ 1: required string kind;
+ /**
+ * A payload specific to kind. Below are the supported values:
+ * - WKB: well-known binary of a POLYGON or MULTI-POLYGON that completely
+ * covers the contents. This will be interpreted according to the same CRS
+ * and edges defined by the logical type.
+ */
+ 2: required binary value;
+}
+
+/**
+ * Bounding box of geometries in the representation of min/max value pair of
+ * coordinates from each axis. Values of Z and M are omitted for 2D geometries.
+ * Filter pushdown on geometries are only safe for planar spatial predicate
+ * but it is recommended that the writer always generates bounding box statistics,
+ * regardless of whether the geometries are planar or spherical.
+ */
+struct BoundingBox {
Review Comment:
> While PostGIS uses a distinct type for geography, this is not necessarily
> a good idea that other formats should reproduce (I think it was due to
> historical constraints)

I think consolidating GEOGRAPHY and GEOMETRY into the same type simplifies
things and describes the difference between the two more succinctly.
Considering them as separate types is helpful in type systems where
parameterized types are difficult or impossible to express (e.g., Postgres).

> I'm not saying that the specification should support "min" > "max" now,
> just suggesting to keep this door open.

I don't think this is a good idea. Dataset authors should split anything
that crosses the antimeridian into two polygons to satisfy the min <= max
constraint (which would allow Parquet implementations to do filter pushdown
without ever having to inspect the `crs` key). The "door open" is the
"covering", whose payload is arbitrary key/value and could be used to include
this type of specification in the future if it can be demonstrated to be
useful in this context.

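To illustrate the point about pushdown, here is a minimal sketch, not part of the PR. The `BBox` tuple and both helper names are hypothetical, and the split helper assumes longitude in degrees within [-180, 180], which is exactly the kind of CRS knowledge the dataset author has and the reader should not need. Once every box satisfies min <= max, the reader-side test is a plain interval overlap:

```python
from typing import List, NamedTuple


class BBox(NamedTuple):
    """Hypothetical 2D box for illustration; not the Thrift BoundingBox struct."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: BBox, b: BBox) -> bool:
    """Interval-overlap test; only valid when min <= max holds on both axes."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def split_at_antimeridian(xmin: float, ymin: float,
                          xmax: float, ymax: float) -> List[BBox]:
    """Writer-side step: turn a wrapped longitude range into min <= max boxes.

    Assumes longitude degrees in [-180, 180] (an assumption of this sketch).
    """
    if xmin <= xmax:
        return [BBox(xmin, ymin, xmax, ymax)]
    return [BBox(xmin, ymin, 180.0, ymax), BBox(-180.0, ymin, xmax, ymax)]


# A feature spanning 170E to 170W, written as xmin=170, xmax=-170.
parts = split_at_antimeridian(170.0, -10.0, -170.0, 10.0)
query = BBox(175.0, 0.0, 178.0, 5.0)
hit = any(intersects(p, query) for p in parts)
```

The reader only ever calls `intersects`, which never looks at the CRS; all antimeridian handling happens once, at write time.
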
> I suggest to remove the sentence saying "it is recommended that the writer
> always generates bounding box statistics, regardless of whether the geometries
> are planar or spherical".

I agree with this. I think it's fine if the statistics exist, but saying that
writers should or must generate them is confusing (since they shouldn't be used).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]