szehon-ho commented on code in PR #240:
URL: https://github.com/apache/parquet-format/pull/240#discussion_r1600952671


##########
src/main/thrift/parquet.thrift:
##########
@@ -373,6 +376,69 @@ struct JsonType {
 struct BsonType {
 }
 
+/**
+ * A geometry can be any of the following subtypes.
+ * The list of geospatial subtypes is taken from the OGC (Open Geospatial Consortium)
+ * SFA (Simple Feature Access) Part 1: Common Architecture.
+ */
+enum GeometrySubType {

Review Comment:
   Hm, I don't see how this is that useful if we define Geometry to be a sort of
variant type, so personally I vote to remove it.
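For context on this point: in ISO WKB, every value already records its own geometry type in a 5-byte header (a byte-order flag followed by a 4-byte type code), so a column-level subtype mostly repeats what each value carries. Below is a minimal Python sketch of reading that header; `wkb_subtype` is a hypothetical helper, not part of any Parquet API. Note that WKB type codes run from 1 (Point) to 7 (GeometryCollection), while the proposed `GeometrySubType` enum starts at 0.

```python
import struct

# ISO/OGC WKB base type codes (the proposed enum numbers these 0-6 instead).
WKB_TYPE_NAMES = {
    1: "POINT",
    2: "LINESTRING",
    3: "POLYGON",
    4: "MULTIPOINT",
    5: "MULTILINESTRING",
    6: "MULTIPOLYGON",
    7: "GEOMETRY_COLLECTION",
}


def wkb_subtype(wkb: bytes) -> str:
    """Return the subtype recorded in the header of one ISO WKB value (hypothetical helper)."""
    if len(wkb) < 5:
        raise ValueError("WKB value is shorter than its 5-byte header")
    if wkb[0] not in (0, 1):
        raise ValueError(f"invalid byte-order flag {wkb[0]}")
    # Byte 0: 0 = big endian (XDR), 1 = little endian (NDR); bytes 1-4: uint32 type code.
    (type_code,) = struct.unpack_from("<I" if wkb[0] == 1 else ">I", wkb, 1)
    # ISO WKB adds 1000/2000/3000 for Z/M/ZM variants; strip that to get the base subtype.
    base = type_code % 1000
    if base not in WKB_TYPE_NAMES:
        raise ValueError(f"unknown WKB type code {type_code}")
    return WKB_TYPE_NAMES[base]


# A "mixed" column: each value still reports its own subtype.
point = struct.pack("<BIdd", 1, 1, 1.0, 2.0)               # POINT (1 2)
line = struct.pack("<BII4d", 1, 2, 2, 0.0, 0.0, 1.0, 1.0)  # LINESTRING (0 0, 1 1)
print([wkb_subtype(v) for v in (point, line)])  # ['POINT', 'LINESTRING']
```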



##########
src/main/thrift/parquet.thrift:
##########
@@ -270,8 +270,11 @@ struct Statistics {
     * may set min_value="B", max_value="C". Such more compact values must still be
     * valid values within the column's logical type.
     *
-    * Values are encoded using PLAIN encoding, except that variable-length byte
-    * arrays do not include a length prefix.
+    * Values are encoded using PLAIN encoding, except that:
+    * 1) variable-length byte arrays do not include a length prefix.
+    * 2) geometry logical type with BoundingBoxOrder uses max_value/min_value pair

Review Comment:
   Yeah, I guess option 1 makes sense here. For this use case I need to think it
through, but we may not even need advanced 'native' encoding options as long as
we can save the bounding box in the Parquet stats? cc @jiayuasu

   I guess the Parquet Java writer/reader may need to depend on JTS Geometry,
though, since that is the usual in-memory representation, unless we come up with
something else here.
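To make the "bounding box in Parquet stats" idea concrete, here is a minimal Python sketch that walks the coordinates of 2D ISO WKB values, computes (xmin, ymin, xmax, ymax), and packs the two corners into a min_value/max_value pair. The stats byte layout (two little-endian doubles per corner) and the helper names `bounding_box` and `to_min_max_stats` are assumptions for illustration only; the quoted PR text says just that the BoundingBoxOrder case uses the max_value/min_value pair.

```python
import math
import struct
from typing import Iterable, List, Tuple

Coord = Tuple[float, float]


def _parse(buf: bytes, off: int) -> Tuple[List[Coord], int]:
    """Parse one 2D ISO WKB geometry at `off`; return its (x, y) pairs and the end offset."""
    order = "<" if buf[off] == 1 else ">"  # byte-order flag: 1 = little endian, 0 = big endian
    (code,) = struct.unpack_from(order + "I", buf, off + 1)
    off += 5
    coords: List[Coord] = []

    def read_point_list(off: int) -> int:
        # uint32 point count followed by that many (x, y) double pairs
        (n,) = struct.unpack_from(order + "I", buf, off)
        off += 4
        for _ in range(n):
            coords.append(struct.unpack_from(order + "dd", buf, off))
            off += 16
        return off

    if code == 1:  # Point
        coords.append(struct.unpack_from(order + "dd", buf, off))
        off += 16
    elif code == 2:  # LineString
        off = read_point_list(off)
    elif code == 3:  # Polygon: uint32 ring count, then one point list per ring
        (rings,) = struct.unpack_from(order + "I", buf, off)
        off += 4
        for _ in range(rings):
            off = read_point_list(off)
    elif code in (4, 5, 6, 7):  # Multi* and GeometryCollection hold complete nested WKB values
        (n,) = struct.unpack_from(order + "I", buf, off)
        off += 4
        for _ in range(n):
            child, off = _parse(buf, off)
            coords.extend(child)
    else:
        raise ValueError(f"WKB type code {code} is not handled by this 2D-only sketch")
    return coords, off


def bounding_box(values: Iterable[bytes]) -> Tuple[float, float, float, float]:
    """Return (xmin, ymin, xmax, ymax) over every coordinate of every WKB value."""
    xs: List[float] = []
    ys: List[float] = []
    for value in values:
        for x, y in _parse(value, 0)[0]:
            if not (math.isnan(x) or math.isnan(y)):  # an empty POINT is commonly written as (NaN, NaN)
                xs.append(x)
                ys.append(y)
    if not xs:
        raise ValueError("no non-empty coordinates in column chunk")
    return min(xs), min(ys), max(xs), max(ys)


def to_min_max_stats(bbox: Tuple[float, float, float, float]) -> Tuple[bytes, bytes]:
    """Hypothetical layout: min_value = (xmin, ymin), max_value = (xmax, ymax) as little-endian doubles."""
    xmin, ymin, xmax, ymax = bbox
    return struct.pack("<dd", xmin, ymin), struct.pack("<dd", xmax, ymax)


point = struct.pack("<BIdd", 1, 1, 3.0, -1.0)              # POINT (3 -1)
line = struct.pack("<BII4d", 1, 2, 2, 0.0, 5.0, 2.0, 4.0)  # LINESTRING (0 5, 2 4)
print(bounding_box([point, line]))  # (0.0, -1.0, 3.0, 5.0)
```

Because a per-axis min/max does not depend on value order, a reader could skip a row group whenever its box does not intersect a query window, without decoding any WKB.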



##########
src/main/thrift/parquet.thrift:
##########
@@ -373,6 +376,69 @@ struct JsonType {
 struct BsonType {
 }
 
+/**
+ * A geometry can be any of the following subtypes.
+ * The list of geospatial subtypes is taken from the OGC (Open Geospatial Consortium)
+ * SFA (Simple Feature Access) Part 1: Common Architecture.
+ */
+enum GeometrySubType {
+  POINT = 0;
+  LINESTRING = 1;
+  POLYGON = 2;
+  MULTIPOINT = 3;
+  MULTILINESTRING = 4;
+  MULTIPOLYGON = 5;
+  GEOMETRY_COLLECTION = 6;
+}
+
+/**
+ * Interpretation for edges, i.e. whether the edge between points
+ * represents a straight cartesian line or the shortest line on the sphere
+ */
+enum Edges {
+  PLANAR = 0;
+  // SPHERICAL = 1; // not supported yet
+}
+
+/**
+ * Well-Known Binary. This is a well-known and popular binary representation regulated
+ * by the Open Geospatial Consortium (OGC).
+ */
+struct WKB {}
+/**
+ * Encoding for geospatial data.
+ */
+union GeospatialEncoding {
+  1: WKB WKB
+}
+
+/**
+ * Geometry logical type annotation
+ *
+ * Allowed for physical types: BINARY (added in 2.11.0)
+ */
+struct GeometryType {
+  /**
+   * The subtype of the geometry.
+   * If set, all values in the column must be of the same subtype.
+   * If not set, the column may contain values of any subtype.
+   */
+  1: optional GeometrySubType subtype;
+  /**
+   * The dimension of the geometry.
+   * For now only 2D geometry is supported and the value must be 2 if set.
+   */
+  2: optional byte dimension;

Review Comment:
   Yeah, maybe it is possible here: just the min (x, y, z) and max (x, y, z)
coordinates. cc @jiayuasu to check my understanding.
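A sketch of what "just min (x, y, z) and max (x, y, z)" could look like if `dimension` were allowed to be 3: the statistics reduce to a per-axis minimum and maximum over all coordinates. This assumes the coordinates have already been decoded (shown here only for an ISO WKB Point Z, type code 1001); `point_z` and `per_axis_min_max` are hypothetical helper names, and the quoted PR text still restricts `dimension` to 2.

```python
import struct
from typing import Iterable, List, Optional, Sequence, Tuple


def point_z(wkb: bytes) -> Tuple[float, float, float]:
    """Decode an ISO WKB Point Z (type code 1001) into (x, y, z)."""
    order = "<" if wkb[0] == 1 else ">"  # byte-order flag: 1 = little endian, 0 = big endian
    (code,) = struct.unpack_from(order + "I", wkb, 1)
    if code != 1001:
        raise ValueError(f"expected ISO WKB Point Z (1001), got type code {code}")
    return struct.unpack_from(order + "ddd", wkb, 5)


def per_axis_min_max(
    coords: Iterable[Sequence[float]],
) -> Tuple[Tuple[float, ...], Tuple[float, ...]]:
    """Return ((min_x, min_y, min_z), (max_x, max_y, max_z)) for same-dimension coordinates."""
    lo: Optional[List[float]] = None
    hi: List[float] = []
    for c in coords:
        if lo is None:
            lo, hi = list(c), list(c)  # the first coordinate seeds both corners
        elif len(c) != len(lo):
            raise ValueError("all coordinates must have the same dimension")
        else:
            lo = [min(a, b) for a, b in zip(lo, c)]
            hi = [max(a, b) for a, b in zip(hi, c)]
    if lo is None:
        raise ValueError("no coordinates")
    return tuple(lo), tuple(hi)


values = [
    struct.pack("<BIddd", 1, 1001, 1.0, 2.0, 3.0),   # POINT Z (1 2 3), little endian
    struct.pack(">BIddd", 0, 1001, -1.0, 5.0, 0.5),  # POINT Z (-1 5 0.5), big endian
]
print(per_axis_min_max(point_z(v) for v in values))
# ((-1.0, 2.0, 0.5), (1.0, 5.0, 3.0))
```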


