paleolimbot commented on code in PR #240:
URL: https://github.com/apache/parquet-format/pull/240#discussion_r1766070563
##########
src/main/thrift/parquet.thrift:
##########
@@ -373,6 +467,98 @@ struct JsonType {
struct BsonType {
}
+/**
+ * Physical type and encoding for the geometry type.
+ */
+enum GeometryEncoding {
+ /**
+ * Allowed for physical type: BYTE_ARRAY.
+ *
+ * Well-known binary (WKB) representations of geometries.
+ *
+ * To be clear, we follow the same rule of WKB and coordinate axis order from
+ * GeoParquet [1][2]. It is the ISO WKB supporting XY, XYZ, XYM, XYZM and the
+ * standard geometry types (Point, LineString, Polygon, MultiPoint,
+ * MultiLineString, MultiPolygon, and GeometryCollection).
+ *
+ * This is the preferred encoding for maximum portability. It also supports
+ * GeometryStatistics to be set in the column chunk and page index.
+ *
+ * [1]
https://github.com/opengeospatial/geoparquet/blob/v1.1.0/format-specs/geoparquet.md?plain=1#L92
+ * [2]
https://github.com/opengeospatial/geoparquet/blob/v1.1.0/format-specs/geoparquet.md?plain=1#L155
Review Comment:
```suggestion
* [1]
https://github.com/opengeospatial/geoparquet/blob/v1.1.0/format-specs/geoparquet.md?plain=1#L92
* [2]
https://github.com/opengeospatial/geoparquet/blob/v1.1.0/format-specs/geoparquet.md?plain=1#L155
* [3] https://portal.ogc.org/files/?artifact_id=18241
* [4] https://www.iso.org/standard/60343.html
* [5] https://www.geopackage.org/spec130/#gpb_spec
```
##########
src/main/thrift/parquet.thrift:
##########
@@ -373,6 +467,98 @@ struct JsonType {
struct BsonType {
}
+/**
+ * Physical type and encoding for the geometry type.
+ */
+enum GeometryEncoding {
+ /**
+ * Allowed for physical type: BYTE_ARRAY.
+ *
+ * Well-known binary (WKB) representations of geometries.
+ *
+ * To be clear, we follow the same rule of WKB and coordinate axis order from
+ * GeoParquet [1][2]. It is the ISO WKB supporting XY, XYZ, XYM, XYZM and the
+ * standard geometry types (Point, LineString, Polygon, MultiPoint,
+ * MultiLineString, MultiPolygon, and GeometryCollection).
+ *
+ * This is the preferred encoding for maximum portability. It also supports
+ * GeometryStatistics to be set in the column chunk and page index.
+ *
+ * [1]
https://github.com/opengeospatial/geoparquet/blob/v1.1.0/format-specs/geoparquet.md?plain=1#L92
+ * [2]
https://github.com/opengeospatial/geoparquet/blob/v1.1.0/format-specs/geoparquet.md?plain=1#L155
+ */
+ WKB = 0;
+}
+
+/**
+ * Geometry logical type annotation (added in 2.11.0)
+ */
+struct GeometryType {
+ /**
+ * Physical type and encoding for the geometry type.
+ * Please refer to the definition of GeometryEncoding for more detail.
+ */
+ 1: required GeometryEncoding encoding;
+ /**
+ * Edges of geometry type.
+ * Please refer to the definition of Edges for more detail.
+ */
+ 2: required Edges edges;
+ /**
+ * Coordinate Reference System, i.e. mapping of how coordinates refer to
+ * precise locations on earth. Writers are not required to set this field.
+ * Once crs is set, crs_encoding field below MUST be set together.
+ * For example, "OGC:CRS84" can be set in the form of PROJJSON as below:
+ * {
+ * "$schema": "https://proj.org/schemas/v0.5/projjson.schema.json",
+ * "type": "GeographicCRS",
+ * "name": "WGS 84 longitude-latitude",
+ * "datum": {
+ * "type": "GeodeticReferenceFrame",
+ * "name": "World Geodetic System 1984",
+ * "ellipsoid": {
+ * "name": "WGS 84",
+ * "semi_major_axis": 6378137,
+ * "inverse_flattening": 298.257223563
+ * }
+ * },
+ * "coordinate_system": {
+ * "subtype": "ellipsoidal",
+ * "axis": [
+ * {
+ * "name": "Geodetic longitude",
+ * "abbreviation": "Lon",
+ * "direction": "east",
+ * "unit": "degree"
+ * },
+ * {
+ * "name": "Geodetic latitude",
+ * "abbreviation": "Lat",
+ * "direction": "north",
+ * "unit": "degree"
+ * }
+ * ]
+ * },
+ * "id": {
+ * "authority": "OGC",
+ * "code": "CRS84"
+ * }
+ * }
+ */
+ 3: optional string crs;
+ /**
+ * Encoding used in the above crs field. It MUST be set if crs field is set.
+ * Currently the only allowed value is "PROJJSON".
+ */
+ 4: optional string crs_encoding;
+ /**
+ * Additional informative metadata.
+ * GeoParquet could offload its column metadata in a JSON-encoded UTF-8
string:
+ *
https://github.com/opengeospatial/geoparquet/blob/v1.1.0/format-specs/geoparquet.md?plain=1#L46
Review Comment:
```suggestion
* Additional informative metadata formatted as a JSON-encoded UTF-8
string provided
* for future informative metadata that may be useful to include but is
not strictly
* required by the low-level Parquet implementation to implement filter
pushdown.
```
...or some other clarification to specifically indicate what is or is not
allowed here at the time of this writing?
##########
src/main/thrift/parquet.thrift:
##########
@@ -237,6 +237,97 @@ struct SizeStatistics {
3: optional list<i64> definition_level_histogram;
}
+/**
+ * Interpretation for edges of GEOMETRY logical type, i.e. whether the edge
+ * between points represent a straight cartesian line or the shortest line on
+ * the sphere. It applies to all non-point geometry objects.
+ */
+enum Edges {
+ PLANAR = 0;
+ SPHERICAL = 1;
+}
+
+/**
+ * A custom binary-encoded polygon or multi-polygon to represent a covering of
+ * geometries. For example, it may be a bounding box or an envelope of
geometries
+ * when a bounding box cannot be built (e.g. a geometry has spherical edges,
or if
+ * an edge of geographic coordinates crosses the antimeridian). In addition,
it can
+ * also be used to provide vendor-agnostic coverings like S2 or H3 grids.
Review Comment:
```suggestion
* an edge of geographic coordinates crosses the antimeridian). It may be
* extended in future versions to provide vendor-agnostic coverings like
* vectors of cells on a discrete global grid (e.g., S2 or H3 cells).
```
##########
src/main/thrift/parquet.thrift:
##########
@@ -237,6 +237,97 @@ struct SizeStatistics {
3: optional list<i64> definition_level_histogram;
}
+/**
+ * Interpretation for edges of GEOMETRY logical type, i.e. whether the edge
+ * between points represent a straight cartesian line or the shortest line on
+ * the sphere. It applies to all non-point geometry objects.
Review Comment:
If a more precise wording is useful here, it might be:
```suggestion
* Interpretation for edges of elements of a GEOMETRY logical type. In other
* words, whether a point between two vertices should be interpolated in
* its XY dimensions as if it were a Cartesian line connecting the two
* vertices (planar) or the shortest spherical arc between the longitude
* and latitude represented by the two vertices (spherical). This value
* applies to all non-point geometry objects and is independent of the
* coordinate reference system.
*
* Because most systems currently assume planar edges and do not support
* spherical edges, planar should be used as the default value.
```
##########
src/main/thrift/parquet.thrift:
##########
@@ -373,6 +467,98 @@ struct JsonType {
struct BsonType {
}
+/**
+ * Physical type and encoding for the geometry type.
+ */
+enum GeometryEncoding {
+ /**
+ * Allowed for physical type: BYTE_ARRAY.
+ *
+ * Well-known binary (WKB) representations of geometries.
+ *
+ * To be clear, we follow the same rule of WKB and coordinate axis order from
+ * GeoParquet [1][2]. It is the ISO WKB supporting XY, XYZ, XYM, XYZM and the
+ * standard geometry types (Point, LineString, Polygon, MultiPoint,
+ * MultiLineString, MultiPolygon, and GeometryCollection).
Review Comment:
If it is helpful to make this more self-contained, the specifics are:
```suggestion
* To be clear, we follow the same rule of WKB and coordinate axis order
from
* GeoParquet [1][2]. Geometries SHOULD be encoded as ISO WKB [3][4]
* supporting XY, XYZ, XYM, XYZM and the standard geometry types
* Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon,
* and GeometryCollection). Coordinate order is always (x, y) where x is
* easting or longitude and y is northing or latitude. This ordering
explicitly
* overrides the axis order as specified in the CRS following the
GeoPackage
* specification [5].
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]