paleolimbot commented on code in PR #494:
URL: https://github.com/apache/parquet-format/pull/494#discussion_r2057465094


##########
Geospatial.md:
##########
@@ -94,6 +94,36 @@ Bounding box is defined as the thrift struct below in the 
representation of
 min/max value pair of coordinates from each axis. Note that X and Y Values are
 always present. Z and M are omitted for 2D geospatial instances.
 
+Writers should follow the guidelines below when calculating bounding boxes in
+the presence of [invalid geospatial values](#invalid-geospatial-values).
+
+* X and Y: Skip any invalid X or Y value and continue processing the remaining 
X or Y 
+  values. Do not produce a bounding box if all X or all Y values are invalid.
+
+* Z: Skip any invalid Z value and continue processing the remaining Z values.
+  Omit Z from the bounding box if all Z values are invalid.
+
+* M: Skip any invalid M value and continue processing the remaining M values.
+  Omit M from the bounding box if all M values are invalid.
+
+Readers should follow the guidelines below when examining bounding boxes. If 
+a bounding box is [invalid](#invalid-geospatial-values), it is treated as a 
`no 
+bounding box` case.
+
+* No bounding box: No assumptions can be made about the presence or validity 
+  of coordinate values. Readers may need to load all individual coordinate 
+  values for validation.
+
+* A bounding box is present:
+    * X and Y: Both X and Y of the bounding box must be present. If either 
+      is missing, the bounding box is invalid.

Review Comment:
   ```suggestion
   * A bounding box is present:
       * Because xmin, ymin, xmax, and ymax are required fields, they are 
always present in the bounding box
   ```
   
   ...maybe?



##########
Geospatial.md:
##########
@@ -94,6 +94,36 @@ Bounding box is defined as the thrift struct below in the 
representation of
 min/max value pair of coordinates from each axis. Note that X and Y Values are
 always present. Z and M are omitted for 2D geospatial instances.
 
+Writers should follow the guidelines below when calculating bounding boxes in
+the presence of [invalid geospatial values](#invalid-geospatial-values).
+
+* X and Y: Skip any invalid X or Y value and continue processing the remaining 
X or Y 
+  values. Do not produce a bounding box if all X or all Y values are invalid.
+
+* Z: Skip any invalid Z value and continue processing the remaining Z values.
+  Omit Z from the bounding box if all Z values are invalid.
+
+* M: Skip any invalid M value and continue processing the remaining M values.
+  Omit M from the bounding box if all M values are invalid.
+
+Readers should follow the guidelines below when examining bounding boxes. If 
+a bounding box is [invalid](#invalid-geospatial-values), it is treated as a 
`no 

Review Comment:
   ```suggestion
   a bounding box is contains `NaN` values, it is treated as a `no 
   ```
   
   (this would be the only invalid case, correct?



##########
Geospatial.md:
##########
@@ -162,3 +192,18 @@ The axis order of the coordinates in WKB and bounding box 
stored in Parquet
 follows the de facto standard for axis order in WKB and is therefore always
 (x, y) where x is easting or longitude and y is northing or latitude. This
 ordering explicitly overrides the axis order as specified in the CRS.
+
+# Invalid geospatial values
+
+An invalid geospatial value refers to any of the following cases:
+
+* `null`: A null value in Parquet.
+* A non-`null` value that are encoded in a valid WKB or bounding box format 

Review Comment:
   I think generally it might be better to inline the relevant pieces above 
(I've made suggestions...it is mostly just `NaN`). This section could perhaps 
be used to make non-spatial implementors aware that EMPTY geometries exist and 
that NaN values in geometries are rare and usually result in errors when an 
operation is performed (or it could be omitted).



##########
Geospatial.md:
##########
@@ -94,6 +94,36 @@ Bounding box is defined as the thrift struct below in the 
representation of
 min/max value pair of coordinates from each axis. Note that X and Y Values are
 always present. Z and M are omitted for 2D geospatial instances.
 
+Writers should follow the guidelines below when calculating bounding boxes in
+the presence of [invalid geospatial values](#invalid-geospatial-values).
+
+* X and Y: Skip any invalid X or Y value and continue processing the remaining 
X or Y 
+  values. Do not produce a bounding box if all X or all Y values are invalid.

Review Comment:
   Can we just say `NaN` instead of `invalid`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to