jiayuasu commented on code in PR #240:
URL: https://github.com/apache/parquet-format/pull/240#discussion_r1618187171
##########
src/main/thrift/parquet.thrift:
##########
@@ -373,6 +408,74 @@ struct JsonType {
struct BsonType {
}
+/**
+ * Physical type and encoding for the geometry type.
+ */
+enum GeometryEncoding {
+ /**
+ * Allowed for physical type: BYTE_ARRAY.
+ *
+ * Well-known binary (WKB) representations of geometries. It supports 2D or
+ * 3D geometries of the standard geometry types (Point, LineString, Polygon,
+ * MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection). This
+ * is the preferred option for maximum portability.
+ *
+ * This encoding enables GeometryStatistics to be set in the column chunk
+ * and page index.
+ */
+ WKB = 0;
+
+ /**
+ * Encodings from POINT to MULTIPOLYGON below are specialized for a single
+ * geometry type and inspired by GeoArrow (https://geoarrow.org/format.html)
+ * native encodings. They use the separated (struct) representation of
+ * coordinates for single-geometry type encodings because this encoding
+ * results in useful column statistics when row groups and/or files contain
+ * related features.
+ *
+ * WARNING: GeometryStatistics cannot be enabled for these encodings because
+ * only leaf columns can have column statistics and page index.
Review Comment:
Yes, as we discussed before, the current native encoding in GeoParquet
(borrowed from GeoArrow) does not allow mixed geometry types in the same
column. In addition, its current design makes it hard for Iceberg to adopt. But
I chatted with @paleolimbot last week, and the GeoArrow community is willing to
make the necessary changes to make it work.
However, the main reason for having the native encoding in GeoParquet is to
compute min/max statistics. If Parquet allows bbox / s2 / h3 as native
statistics, the native encoding does not seem to be necessary. This will
greatly increase the adoption of Parquet / GeoParquet.
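
To illustrate the statistics argument, here is a minimal sketch, assuming a column chunk that holds only 2D WKB points. The function names (`encode_wkb_point`, `bbox_statistics`, `may_intersect`) are hypothetical and are not part of Parquet, GeoParquet, or the PR. It shows that a bbox can be derived directly from WKB values and used for pruning, which is the job the min/max statistics of the native encoding would otherwise do.

```python
import struct


def encode_wkb_point(x, y):
    """Encode a 2D point as little-endian WKB (byte order 1, geometry type 1)."""
    return struct.pack("<BIdd", 1, 1, x, y)


def decode_wkb_point(buf):
    """Decode a 2D WKB point; other geometry types are out of scope here."""
    fmt = "<" if buf[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(fmt + "I", buf, 1)
    if geom_type != 1:
        raise ValueError("this sketch only handles 2D WKB points")
    return struct.unpack_from(fmt + "dd", buf, 5)


def bbox_statistics(wkb_values):
    """Compute a bbox over the WKB values of one (hypothetical) column chunk."""
    points = [decode_wkb_point(v) for v in wkb_values]
    if not points:
        raise ValueError("cannot compute a bbox for an empty chunk")
    xs, ys = zip(*points)
    return {"xmin": min(xs), "xmax": max(xs), "ymin": min(ys), "ymax": max(ys)}


def may_intersect(stats, query):
    """Return False when the chunk can be skipped for a (xmin, ymin, xmax, ymax) query."""
    qxmin, qymin, qxmax, qymax = query
    return not (
        stats["xmax"] < qxmin
        or stats["xmin"] > qxmax
        or stats["ymax"] < qymin
        or stats["ymin"] > qymax
    )


chunk = [
    encode_wkb_point(1.0, 2.0),
    encode_wkb_point(-3.5, 4.0),
    encode_wkb_point(0.0, -1.0),
]
stats = bbox_statistics(chunk)
```

A reader would call `may_intersect(stats, query_bbox)` per column chunk or page and skip those that return `False`, without the geometry needing to be stored as separate x/y leaf columns.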
So, given the uncertainty around this topic, maybe we can remove this from the
final PR.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]