wgtmac commented on code in PR #494:
URL: https://github.com/apache/parquet-format/pull/494#discussion_r2059908869
##########
Geospatial.md:
##########
@@ -94,6 +94,39 @@ Bounding box is defined as the thrift struct below in the representation of
min/max value pair of coordinates from each axis. Note that X and Y Values are
always present. Z and M are omitted for 2D geospatial instances.
+Writers should follow the guidelines below when calculating bounding boxes in
+the presence of edge cases.
+
+* `null` instance: Skip it and continue processing the remaining
+ geospatial instances. Do not produce a bounding box if all instances are null.
+* Non-`null` instance with [invalid geospatial values](#invalid-geospatial-values):
+ * X and Y: Skip any invalid X or Y value and continue processing the
+ remaining X or Y values. Do not produce a bounding box if all X or all Y
+ values are invalid.
+
+ * Z: Skip any invalid Z value and continue processing the remaining Z values.
+ Omit Z from the bounding box if all Z values are invalid.
+
+ * M: Skip any invalid M value and continue processing the remaining M values.
+ Omit M from the bounding box if all M values are invalid.
+
+Readers should follow the guidelines below when examining bounding boxes.
+Parquet does not permit `null` or `NaN` values in bounding boxes, whether at
+the overall bounding box level or within individual coordinate fields.
+
+* No bounding box: No assumptions can be made about the presence or validity
+ of coordinate values. Readers may need to load all individual coordinate
+ values for validation.
+
+* A bounding box is present:
+ * X and Y: Both X and Y of the bounding box must be present.
Review Comment:
```suggestion
* X and Y: Both X and Y of the bounding box must be present. If any X or Y
  value is invalid, this bounding box is not reliable and cannot be used.
```
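To illustrate how a reader might apply the rule suggested above, here is a minimal sketch. The `BBox` class is a hypothetical stand-in for the thrift bounding box struct, `usable_axes` is an illustrative helper name, and only `None` and `NaN` are treated as invalid; none of this is taken from the PR itself.

```python
import math
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class BBox:
    """Hypothetical stand-in for the thrift bounding box struct."""
    xmin: Optional[float] = None
    xmax: Optional[float] = None
    ymin: Optional[float] = None
    ymax: Optional[float] = None
    zmin: Optional[float] = None
    zmax: Optional[float] = None
    mmin: Optional[float] = None
    mmax: Optional[float] = None


def _valid(lo: Optional[float], hi: Optional[float]) -> bool:
    # An axis is usable only if both bounds are present and neither is NaN.
    return (lo is not None and hi is not None
            and not math.isnan(lo) and not math.isnan(hi))


def usable_axes(bbox: Optional[BBox]) -> Set[str]:
    """Return the axes a reader may rely on; an empty set means 'ignore the box'."""
    if bbox is None:
        return set()
    # If X or Y is missing or invalid, the whole bounding box is unusable.
    if not (_valid(bbox.xmin, bbox.xmax) and _valid(bbox.ymin, bbox.ymax)):
        return set()
    axes = {"x", "y"}
    # Z and M are optional: a missing or invalid pair only disables that axis.
    if _valid(bbox.zmin, bbox.zmax):
        axes.add("z")
    if _valid(bbox.mmin, bbox.mmax):
        axes.add("m")
    return axes
```

With this shape, a reader falls back to scanning individual coordinates whenever the returned set lacks the axis it wants to filter on.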
##########
Geospatial.md:
##########
@@ -162,3 +195,19 @@ The axis order of the coordinates in WKB and bounding box stored in Parquet
follows the de facto standard for axis order in WKB and is therefore always
(x, y) where x is easting or longitude and y is northing or latitude. This
ordering explicitly overrides the axis order as specified in the CRS.
+
+# Invalid geospatial values
+
+An invalid geospatial value refers to the coordinate values of a non-`null`
+geospatial instance that are encoded in a valid WKB format, but are not
+considered valid values under this specification. While different WKB
+readers may interpret such values differently, the resulting output should
+be treated as invalid.
+
+* `NaN`: Not a Number. For example, `POINT EMPTY` in WKB is represented by a
+ `Point` with each ordinate value set to an IEEE-754 quiet NaN value.
+* `Empty geometries`: Geometries explicitly marked as empty in WKB using
Review Comment:
I suppose `LINESTRING EMPTY` and `POLYGON EMPTY` are WKT notation? Do we have
canonical WKB byte sequences to demonstrate them?
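As a starting point, here is a minimal sketch of what such byte sequences could look like, assuming little-endian WKB with 2D geometry type codes (1 = Point, 2 = LineString, 3 = Polygon). The helper names are hypothetical and only for illustration.

```python
import math
import struct


def wkb_empty_linestring() -> bytes:
    # byte order (1 = little-endian), geometry type 2 (LineString), numPoints = 0
    return struct.pack("<BII", 1, 2, 0)


def wkb_empty_polygon() -> bytes:
    # byte order (1 = little-endian), geometry type 3 (Polygon), numRings = 0
    return struct.pack("<BII", 1, 3, 0)


def wkb_empty_point() -> bytes:
    # Point has no count field, so "empty" is expressed as NaN for both ordinates.
    return struct.pack("<BIdd", 1, 1, math.nan, math.nan)


print(wkb_empty_linestring().hex())  # 010200000000000000
print(wkb_empty_polygon().hex())     # 010300000000000000
```

The exact NaN bit pattern in the point case depends on the writer, so a reader should test the decoded ordinates with an `isnan` check rather than compare raw bytes.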
##########
Geospatial.md:
##########
@@ -94,6 +94,39 @@ Bounding box is defined as the thrift struct below in the representation of
min/max value pair of coordinates from each axis. Note that X and Y Values are
always present. Z and M are omitted for 2D geospatial instances.
+Writers should follow the guidelines below when calculating bounding boxes in
+the presence of edge cases.
+
+* `null` instance: Skip it and continue processing the remaining
+ geospatial instances. Do not produce a bounding box if all instances are null.
+* Non-`null` instance with [invalid geospatial values](#invalid-geospatial-values):
+ * X and Y: Skip any invalid X or Y value and continue processing the
+ remaining X or Y values. Do not produce a bounding box if all X or all Y
+ values are invalid.
+
+ * Z: Skip any invalid Z value and continue processing the remaining Z values.
+ Omit Z from the bounding box if all Z values are invalid.
+
+ * M: Skip any invalid M value and continue processing the remaining M values.
+ Omit M from the bounding box if all M values are invalid.
+
+Readers should follow the guidelines below when examining bounding boxes.
+Parquet does not permit `null` or `NaN` values in bounding boxes, whether at
+the overall bounding box level or within individual coordinate fields.
+
+* No bounding box: No assumptions can be made about the presence or validity
+ of coordinate values. Readers may need to load all individual coordinate
+ values for validation.
+
+* A bounding box is present:
+ * X and Y: Both X and Y of the bounding box must be present.
+ * Z: If Z of the bounding box is missing, readers should not assume
Review Comment:
```suggestion
* Z: If Z of the bounding box is missing or contains any invalid value, readers should not assume
```
##########
Geospatial.md:
##########
@@ -162,3 +195,19 @@ The axis order of the coordinates in WKB and bounding box stored in Parquet
follows the de facto standard for axis order in WKB and is therefore always
(x, y) where x is easting or longitude and y is northing or latitude. This
ordering explicitly overrides the axis order as specified in the CRS.
+
+# Invalid geospatial values
+
+An invalid geospatial value refers to the coordinate values of a non-`null`
+geospatial instance that are encoded in a valid WKB format, but are not
+considered valid values under this specification. While different WKB
+readers may interpret such values differently, the resulting output should
+be treated as invalid.
+
+* `NaN`: Not a Number. For example, `POINT EMPTY` in WKB is represented by a
+ `Point` with each ordinate value set to an IEEE-754 quiet NaN value.
+* `Empty geometries`: Geometries explicitly marked as empty in WKB using
+ indicators such as `numPoints`, `numRings`, or `numGeometries`. Examples
+ include `LINESTRING EMPTY` or `POLYGON EMPTY`.
+* `Out-of-bounds coordinates`: Values that fall outside the valid range
Review Comment:
Should we enumerate every kind of invalid value here so that implementations
do not miss any?
##########
Geospatial.md:
##########
@@ -162,3 +195,19 @@ The axis order of the coordinates in WKB and bounding box stored in Parquet
follows the de facto standard for axis order in WKB and is therefore always
(x, y) where x is easting or longitude and y is northing or latitude. This
ordering explicitly overrides the axis order as specified in the CRS.
+
+# Invalid geospatial values
+
+An invalid geospatial value refers to the coordinate values of a non-`null`
+geospatial instance that are encoded in a valid WKB format, but are not
Review Comment:
Since this mentions `a valid WKB format`, do we also need to provide guidelines
for values in an invalid WKB format?
##########
Geospatial.md:
##########
@@ -94,6 +94,39 @@ Bounding box is defined as the thrift struct below in the representation of
min/max value pair of coordinates from each axis. Note that X and Y Values are
always present. Z and M are omitted for 2D geospatial instances.
+Writers should follow the guidelines below when calculating bounding boxes in
+the presence of edge cases.
+
+* `null` instance: Skip it and continue processing the remaining
+ geospatial instances. Do not produce a bounding box if all instances are null.
+* Non-`null` instance with [invalid geospatial values](#invalid-geospatial-values):
+ * X and Y: Skip any invalid X or Y value and continue processing the
+ remaining X or Y values. Do not produce a bounding box if all X or all Y
+ values are invalid.
+
+ * Z: Skip any invalid Z value and continue processing the remaining Z values.
+ Omit Z from the bounding box if all Z values are invalid.
+
+ * M: Skip any invalid M value and continue processing the remaining M values.
+ Omit M from the bounding box if all M values are invalid.
+
+Readers should follow the guidelines below when examining bounding boxes.
+Parquet does not permit `null` or `NaN` values in bounding boxes, whether at
+the overall bounding box level or within individual coordinate fields.
+
+* No bounding box: No assumptions can be made about the presence or validity
+ of coordinate values. Readers may need to load all individual coordinate
+ values for validation.
+
+* A bounding box is present:
+ * X and Y: Both X and Y of the bounding box must be present.
+ * Z: If Z of the bounding box is missing, readers should not assume
+ anything about the presence or validity of Z values and may need to
+ load individual coordinates for validation.
+ * M: If M of the bounding box is missing, readers should not assume
Review Comment:
```suggestion
* M: If M of the bounding box is missing or contains any invalid value, readers should not assume
```
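For context, the writer-side guidelines quoted in this hunk could be sketched roughly as follows. This is an illustration under stated assumptions, not the reference implementation: each instance is modelled as `None` or a list of `(x, y, z)` tuples with `z` set to `None` for 2D data, only `NaN` is treated as invalid, M is left out for brevity (it would follow the same pattern as Z), and `compute_bbox` is a hypothetical name.

```python
import math
from typing import Dict, Iterable, List, Optional, Tuple

Coord = Tuple[float, float, Optional[float]]  # (x, y, z); z is None for 2D
Range = Optional[Tuple[float, float]]


def _extend(rng: Range, value: Optional[float]) -> Range:
    """Fold one value into a (min, max) pair, skipping missing and NaN values."""
    if value is None or math.isnan(value):
        return rng
    if rng is None:
        return (value, value)
    return (min(rng[0], value), max(rng[1], value))


def compute_bbox(instances: Iterable[Optional[List[Coord]]]) -> Optional[Dict[str, float]]:
    x = y = z = None
    for instance in instances:
        if instance is None:
            continue  # null instance: skip it and keep going
        for cx, cy, cz in instance:
            # Invalid values are skipped per axis, not per coordinate tuple.
            x = _extend(x, cx)
            y = _extend(y, cy)
            z = _extend(z, cz)
    if x is None or y is None:
        return None  # all X or all Y invalid (or all instances null): no bounding box
    bbox = {"xmin": x[0], "xmax": x[1], "ymin": y[0], "ymax": y[1]}
    if z is not None:
        bbox["zmin"], bbox["zmax"] = z  # Z is emitted only if at least one Z was valid
    return bbox
```

Because a writer following this never emits `NaN` bounds, a reader that sees an invalid bound can treat it as a sign that the writer did not follow the guidelines.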
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]