wgtmac commented on code in PR #240:
URL: https://github.com/apache/parquet-format/pull/240#discussion_r1606208597


##########
src/main/thrift/parquet.thrift:
##########
@@ -982,6 +1054,23 @@ union ColumnOrder {
    *       `-0.0` should be written into the min statistics field.
    */
   1: TypeDefinedOrder TYPE_ORDER;
+
+  /**
+   * The order only applies to GEOMETRY logical type.
+   *
+   * Please note that geometry objects cannot be compared directly. This order aims to
+   * provide an approach to build a bounding box for geometry objects in the same page
+   * or column chunk.
+   *
+   * In this order, all 2D geometries are regarded as a collection of coordinate (x, y).
+   * For example, POINT has one coordinate, LINESTRING has two coordinates, and POLYGON
+   * might have three or more coordinates. A bounding box is the combination of x_min,
+   * x_max, y_min, and y_max of all coordinates from all geometry values. For simplexty,
+   * min_value field in the Statistics/ColumnIndex is encoded as the concatenation of
+   * PLAIN-encoded DOUBLE-typed x_min and y_min values. Similarly, max_value field is
+   * encoded as the concatenation of PLAIN-encoded DOUBLE-typed x_max and y_max values.

Review Comment:
   Here the column statistics serve as an aggregate index over a group of geometry values. For example, column statistics are usually collected from a data page of 10,000 values, where a bbox is more suitable. Correct me if I'm wrong, but an S2/H3 covering might be more efficient for spatial joins or record-level filtering.
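
   To make the encoding in the quoted doc comment concrete, here is a minimal Python sketch of how a writer might build the page-level bounding box and pack it into `min_value` / `max_value`. The function names and the representation of a geometry as a list of `(x, y)` tuples are assumptions for illustration only and are not part of the PR. The byte layout relies on PLAIN-encoded DOUBLE being an 8-byte little-endian IEEE 754 value. Empty pages and NaN coordinates are not handled.

```python
import struct

def bbox_statistics(geometries):
    """Return (min_value, max_value) bytes for a group of 2D geometries.

    `geometries` is an iterable of coordinate lists [(x, y), ...]
    (hypothetical in-memory form; real values would be parsed from WKB).
    """
    xs = [x for geom in geometries for x, _ in geom]
    ys = [y for geom in geometries for _, y in geom]
    # min_value = x_min followed by y_min; max_value = x_max followed by y_max.
    min_value = struct.pack("<dd", min(xs), min(ys))
    max_value = struct.pack("<dd", max(xs), max(ys))
    return min_value, max_value

def decode_corner(value):
    """Decode a 16-byte min_value or max_value back into an (x, y) pair."""
    return struct.unpack("<dd", value)

# One POINT, one LINESTRING, and one POLYGON ring in the same page.
page = [
    [(1.0, 2.0)],
    [(0.0, 5.0), (3.0, -1.0)],
    [(2.0, 2.0), (4.0, 2.0), (4.0, 6.0), (2.0, 2.0)],
]
min_value, max_value = bbox_statistics(page)
lower = decode_corner(min_value)   # (x_min, y_min)
upper = decode_corner(max_value)   # (x_max, y_max)
```

   Each field is 16 bytes, so a reader can prune a page by checking whether a query rectangle overlaps the decoded corners without touching any geometry value.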



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

