wgtmac commented on code in PR #240:
URL: https://github.com/apache/parquet-format/pull/240#discussion_r1722617837


##########
src/main/thrift/parquet.thrift:
##########
@@ -373,6 +462,56 @@ struct JsonType {
 struct BsonType {
 }
 
+/**
+ * Physical type and encoding for the geometry type.
+ */
+enum GeometryEncoding {
+  /**
+   * Allowed for physical type: BYTE_ARRAY.
+   *
+   * Well-known binary (WKB) representations of geometries. It supports 2D or
+   * 3D geometries of the standard geometry types (Point, LineString, Polygon,
+   * MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection). This
+   * is the preferred option for maximum portability.
+   *
+   * This encoding enables GeometryStatistics to be set in the column chunk
+   * and page index.
+   */
+  WKB = 0;
+
+  // TODO: add native encoding from GeoParquet/GeoArrow
+}
+
+/**
+ * Geometry logical type annotation (added in 2.11.0)
+ */
+struct GeometryType {
+  /**
+   * Physical type and encoding for the geometry type. Please refer to the
+   * definition of GeometryEncoding for more detail.
+   */
+  1: required GeometryEncoding encoding;
+  /**
+   * Edges of polygon.
+   */
+  2: required Edges edges;
+  /**
+   * Coordinate Reference System, i.e. mapping of how coordinates refer to
+   * precise locations on earth.
+   */
+  3: optional string crs;
+  /**
+   * Encoding used in the above crs field.
+   * Currently the only allowed value is "PROJJSON".
+   */
+  4: optional string crs_encoding;
+  /**
+   * Additional informative metadata.
+   * It can be used by GeoParquet to offload some of the column metadata.
+   */
+  5: optional binary metadata;

Review Comment:
   I found an issue with the type of this `metadata` field while working on the PoC: 
https://github.com/apache/arrow/pull/43196#discussion_r1720791233
   
   If we make it a `binary` type, a writer implementation is free to write 
whatever it likes, be it a JSON string (as GeoParquet does) or some other 
serialized form (e.g. ProtoBuf or Base64). A reader implementation, however, 
has a hard time because it cannot tell whether the payload is a JSON string or 
something else.
   
   Should we change it to a `string` type and follow the practice of the 
GeoParquet spec, at least stating that it must be a valid JSON string?
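   
   To illustrate the reader-side problem, here is a minimal Python sketch of 
what a reader might be forced to do with an opaque `binary` payload. The 
function name `interpret_metadata` and the "absent"/"json"/"opaque" labels are 
hypothetical and not part of any Parquet or Arrow API; this only shows the 
guessing involved, not a proposed implementation.

```python
import json
from typing import Any, Optional, Tuple


def interpret_metadata(raw: Optional[bytes]) -> Tuple[str, Any]:
    """Guess how to interpret an opaque `binary` metadata payload.

    Hypothetical helper: returns a (kind, value) pair where kind is
    "absent", "json", or "opaque".
    """
    if raw is None:
        # The field is optional, so it may not be set at all.
        return ("absent", None)
    try:
        # A JSON payload must at least be valid UTF-8 text.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not text: could be ProtoBuf or anything else the writer chose.
        return ("opaque", raw)
    try:
        return ("json", json.loads(text))
    except json.JSONDecodeError:
        # Valid text but not JSON (e.g. Base64): still unknown to the reader.
        return ("opaque", raw)
```

   Even this heuristic can misclassify a payload, since a non-JSON 
serialization may happen to decode as valid UTF-8 JSON. With a `string` field 
that is specified to hold JSON, the reader could parse it directly and treat a 
parse failure as an invalid file.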
   
   @paleolimbot @jiayuasu @zhangfengcdt @jorisvandenbossche 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

