wgtmac commented on code in PR #240:
URL: https://github.com/apache/parquet-format/pull/240#discussion_r1616485093
##########
src/main/thrift/parquet.thrift:
##########
@@ -373,6 +408,74 @@ struct JsonType {
struct BsonType {
}
+/**
+ * Physical type and encoding for the geometry type.
+ */
+enum GeometryEncoding {
+ /**
+ * Allowed for physical type: BYTE_ARRAY.
+ *
+ * Well-known binary (WKB) representations of geometries. It supports 2D or
+ * 3D geometries of the standard geometry types (Point, LineString, Polygon,
+ * MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection). This
+ * is the preferred option for maximum portability.
+ *
+ * This encoding enables GeometryStatistics to be set in the column chunk
+ * and page index.
+ */
+ WKB = 0;
+
+ /**
+ * Encodings from POINT to MULTIPOLYGON below are specialized for single
+ * geometry type and inspired by GeoArrow (https://geoarrow.org/format.html)
+ * native encodings. It uses the separated (struct) representation of
+ * coordinates for single-geometry type encodings because this encoding
+ * results in useful column statistics when row groups and/or files contain
+ * related features.
+ *
+ * WARNING: GeometryStatistics cannot be enabled for these encodings because
+ * only leaf columns can have column statistics and page index.
+ *
+ * The actual coordinates of the geometries MUST be stored as native numbers,
+ * i.e. using the DOUBLE type in a (repeated) group of fields (exact
+ * repetition depending on the geometry type).
+ *
+ * For the POINT encoding, this results in a struct of two fields for x and y
+ * coordinates (in case of 2D geometries):
+ * optional group geometry {
+ * required double x;
+ * required double y;
+ * }
+ *
+ * For more detail, please refer to link below:
+ * https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md#encoding
+ */
+ POINT = 1;
+ LINESTRING = 2;
+ POLYGON = 3;
+ MULTIPOINT = 4;
+ MULTILINESTRING = 5;
+ MULTIPOLYGON = 6;
+}
+
+/**
+ * Geometry logical type annotation (added in 2.11.0)
+ */
+struct GeometryType {
+ /**
+ * Physical type and encoding for the geometry type. Please refer to the
+ * definition of GeometryEncoding for more detail.
+ */
+ 1: required GeometryEncoding encoding;
+ /**
+ * Additional informative metadata.
+ * It can be used by GeoParquet to offload some of the column metadata.
+ */
+ 2: optional string metadata;
+ /** File-level statistics for geometries */
+ 3: optional GeometryStatistics statistics;
Review Comment:
Introducing file-level statistics for all types is a separate topic (perhaps
together with the current v3 discussion). If missing file-level statistics for
the geometry type is not a big issue, I would prefer to remove this field from
the logical type. In most cases, a Parquet file is unlikely to contain many row
groups, so row-group-level statistics are good enough.
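
To illustrate why row-group-level statistics may be enough: a reader that wants
a file-level bounding box can fold the per-row-group boxes together. Below is a
minimal Python sketch, assuming each row group exposes a planar 2D bounding
box. `BoundingBox` and `merge_bounding_boxes` are hypothetical names used only
for illustration; they are not part of the proposed Thrift definitions or of
any Parquet library.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical stand-in for the bounding box of one row group."""
    xmin: float
    xmax: float
    ymin: float
    ymax: float


def merge_bounding_boxes(
    boxes: Iterable[Optional[BoundingBox]],
) -> Optional[BoundingBox]:
    """Fold row-group-level boxes into a single file-level box.

    Returns None when there are no row groups or when any row group lacks
    statistics, because the file-level bounds are then unknown.
    """
    merged: Optional[BoundingBox] = None
    for box in boxes:
        if box is None:
            # One row group without statistics makes the result unknown.
            return None
        if merged is None:
            merged = box
        else:
            merged = BoundingBox(
                xmin=min(merged.xmin, box.xmin),
                xmax=max(merged.xmax, box.xmax),
                ymin=min(merged.ymin, box.ymin),
                ymax=max(merged.ymax, box.ymax),
            )
    return merged
```

The cost is linear in the number of row groups, which is small for typical
files, so a separate file-level field would mostly duplicate information
already available.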
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]