paleolimbot commented on code in PR #240:
URL: https://github.com/apache/parquet-format/pull/240#discussion_r1700518484


##########
src/main/thrift/parquet.thrift:
##########
@@ -373,6 +453,51 @@ struct JsonType {
 struct BsonType {
 }
 
+/**
+ * Physical type and encoding for the geometry type.
+ */
+enum GeometryEncoding {
+  /**
+   * Allowed for physical type: BYTE_ARRAY.
+   *
+   * Well-known binary (WKB) representations of geometries. It supports 2D or
+   * 3D geometries of the standard geometry types (Point, LineString, Polygon,
+   * MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection). This
+   * is the preferred option for maximum portability.
+   *
+   * This encoding enables GeometryStatistics to be set in the column chunk
+   * and page index.
+   */
+  WKB = 0;
+
+  // TODO: add native encoding from GeoParquet/GeoArrow
+}
+
+/**
+ * Geometry logical type annotation (added in 2.11.0)
+ */
+struct GeometryType {
+  /**
+   * Physical type and encoding for the geometry type. Please refer to the
+   * definition of GeometryEncoding for more detail.
+   */
+  1: required GeometryEncoding encoding;
+  /**
+   * Edges of polygon.
+   */
+  2: required Edges edges;
+  /**
+   * Coordinate Reference System, i.e. mapping of how coordinates refer to
+   * precise locations on earth, e.g. OGC:CRS84
+   */
+  3: optional string crs;

Review Comment:
   > Looks like strong opinions on all fronts.
   
   Definitely 🙂 
   
   > I suggest we also add a string field namely `crs_kind` in addition to the crs field.
   
   I think this is a great idea. The scope of this PR should be to add the
   Geometry type in a way that lets the geospatial community have these
   discussions without forcing Thrift changes or changes in Parquet
   implementations themselves. We can continue to debate the allowed values of
   `crs_kind` (and I imagine we will for some time as we accumulate use cases).
   
   > Isn't a bit weird that the only allowed value is the non-standard encoding
   
   This was discussed in a few threads in the GeoParquet repo. I think the idea
   was that PROJJSON is structurally identical to WKT2 2019 but can be
   inspected with nothing more than a JSON parser (which exists almost
   everywhere). This would allow (for example) a library implementing a
   computation to error for a Geographic CRS if the computation didn't apply,
   or to extract the authority and code without a WKT parser (which, as far as
   I know, does not exist outside the CRS world). This is a very good fit for
   something like (Geo)Parquet, where we are trying to ensure that those who
   care can express complex geospatial concepts without forcing Parquet
   implementations or related code to link to geo-specific libraries.
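   
   A minimal sketch of what I mean, assuming a trimmed-down PROJJSON document
   for OGC:CRS84 (real documents carry more members). The `describe_crs` and
   `require_planar` helpers are hypothetical names for illustration only, not
   part of this PR or of any Parquet implementation:

```python
import json

# Trimmed-down PROJJSON for OGC:CRS84 (illustrative only; a real document
# also carries datum and coordinate system members).
CRS84_PROJJSON = """
{
  "type": "GeographicCRS",
  "name": "WGS 84 (CRS84)",
  "id": {"authority": "OGC", "code": "CRS84"}
}
"""


def describe_crs(projjson_text):
    """Return (is_geographic, authority, code) using only a JSON parser."""
    crs = json.loads(projjson_text)
    is_geographic = crs.get("type") == "GeographicCRS"
    crs_id = crs.get("id") or {}  # the "id" member is optional
    return is_geographic, crs_id.get("authority"), crs_id.get("code")


def require_planar(projjson_text):
    """Hypothetical guard for a computation that assumes planar coordinates."""
    is_geographic, authority, code = describe_crs(projjson_text)
    if is_geographic:
        raise ValueError(
            f"planar computation is not valid for geographic CRS {authority}:{code}"
        )
```

   Neither helper needs a WKT parser or a geo-specific library, which is the
   property I think the GeoParquet discussions were after.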
   
   > This is not needed as latest WKT 2 is backward compatible with the previous version
   
   In the case that it is allowed, it is probably a good idea to communicate
   that with a reference to both standards 🙂 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

