wgtmac commented on code in PR #240:
URL: https://github.com/apache/parquet-format/pull/240#discussion_r1606208597
##########
src/main/thrift/parquet.thrift:
##########
@@ -982,6 +1054,23 @@ union ColumnOrder {
* `-0.0` should be written into the min statistics field.
*/
1: TypeDefinedOrder TYPE_ORDER;
+
+ /**
+ * The order only applies to GEOMETRY logical type.
+ *
+ * Please note that geometry objects cannot be compared directly. This order aims to
+ * provide an approach to build a bounding box for geometry objects in the same page
+ * or column chunk.
+ *
+ * In this order, all 2D geometries are regarded as a collection of coordinate (x, y).
+ * For example, POINT has one coordinate, LINESTRING has two coordinates, and POLYGON
+ * might have three or more coordinates. A bounding box is the combination of x_min,
+ * x_max, y_min, and y_max of all coordinates from all geometry values. For simplicity,
+ * min_value field in the Statistics/ColumnIndex is encoded as the concatenation of
+ * PLAIN-encoded DOUBLE-typed x_min and y_min values. Similarly, max_value field is
+ * encoded as the concatenation of PLAIN-encoded DOUBLE-typed x_max and y_max values.
Review Comment:
Here the column statistics act as an aggregate index over a group of geometry
values. For example, column statistics are usually collected from a data page
of 10,000 values, and at that granularity a bbox is more suitable. Correct me
if I'm wrong, but an S2/H3 covering might be more efficient for spatial joins
or record-level filtering.
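
To make the encoding in the diff above concrete, here is a minimal sketch in
Python. It is not taken from the PR. It assumes each geometry has already been
flattened to a list of (x, y) pairs, and that a PLAIN-encoded DOUBLE is an
8-byte little-endian IEEE 754 value, as in the Parquet encoding spec. The
function names are hypothetical.

```python
import struct
from typing import Iterable, List, Tuple

Coord = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def bounding_box(geometries: Iterable[List[Coord]]) -> BBox:
    """Bounding box over all coordinates of all geometries in a page/chunk."""
    xs: List[float] = []
    ys: List[float] = []
    for coords in geometries:
        for x, y in coords:
            xs.append(x)
            ys.append(y)
    if not xs:
        raise ValueError("no coordinates to build a bounding box from")
    return min(xs), min(ys), max(xs), max(ys)


def encode_bbox_stats(geometries: Iterable[List[Coord]]) -> Tuple[bytes, bytes]:
    """Return (min_value, max_value) as described in the proposed comment.

    min_value = PLAIN(x_min) + PLAIN(y_min), max_value = PLAIN(x_max) + PLAIN(y_max).
    Assumption: PLAIN DOUBLE is 8 bytes, little-endian.
    """
    x_min, y_min, x_max, y_max = bounding_box(geometries)
    min_value = struct.pack("<dd", x_min, y_min)
    max_value = struct.pack("<dd", x_max, y_max)
    return min_value, max_value


def page_may_intersect(min_value: bytes, max_value: bytes, query: BBox) -> bool:
    """Hypothetical reader-side check: can this page contain anything in `query`?"""
    x_min, y_min = struct.unpack("<dd", min_value)
    x_max, y_max = struct.unpack("<dd", max_value)
    qx_min, qy_min, qx_max, qy_max = query
    # Two axis-aligned boxes are disjoint if one lies entirely to one side.
    return not (qx_max < x_min or qx_min > x_max or
                qy_max < y_min or qy_min > y_max)


# One POINT and one two-vertex LINESTRING in the same page.
page = [[(1.0, 2.0)], [(0.0, 5.0), (3.0, -1.0)]]
min_value, max_value = encode_bbox_stats(page)
```

A reader could then skip the page whenever page_may_intersect returns False
for the query window. That is a coarse page-level filter, which is the case
where the bbox fits; it says nothing about individual records.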
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]