wgtmac commented on code in PR #494:
URL: https://github.com/apache/parquet-format/pull/494#discussion_r2055264910


##########
Geospatial.md:
##########
@@ -94,6 +94,36 @@ Bounding box is defined as the thrift struct below in the representation of
 min/max value pair of coordinates from each axis. Note that X and Y Values are
 always present. Z and M are omitted for 2D geospatial instances.
 
+Writers should follow the guidelines below when calculating bounding boxes in
+the presence of invalid values. An invalid geospatial value refers to any of 
+the following: `NaN`, `null`, `does not exist` (e.g., LINESTRING EMPTY), or 
+`out of bounds` (e.g., `x < -180` or `x > 180` for `GEOGRAPHY` types):
+
+* X and Y: Skip any invalid X or Y value and processing the remaining X or Y 
+  values. Do not produce a bounding box if all X or all Y values are invalid.
+
+* Z: Skip any invalid Z value and continue processing the remaining Z values.
+  Omit Z from the bounding box if all Z values are invalid.
+
+* M: Skip any invalid M value and continue processing the remaining M values.
+  Omit M from the bounding box if all M values are invalid.
+
+Readers should follow the guidelines below when examining bounding boxes:
+
+* No bounding box: No assumptions can be made about the presence or absence 
+  of invalid values. Readers may need to load all individual coordinate 
+  values for validation.
+
+* A bounding box is present:
+    * X and Y: X and Y of the bounding box must be present. Readers should 

Review Comment:
   If any X or Y value in the bbox is invalid, the bbox is malformed and cannot be used.
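   A minimal Python sketch of that reader-side check, assuming a hypothetical `BoundingBox` dataclass that mirrors the thrift struct (Z and M are `None` when omitted) and treating "invalid" as missing or `NaN`. Following the "ditto for Z and M" comment, it assumes a present-but-invalid Z or M pair also makes the whole bbox malformed; that reading is an assumption, not settled spec text.

   ```python
   import math
   from dataclasses import dataclass
   from typing import Optional


   @dataclass
   class BoundingBox:
       # Hypothetical Python mirror of the thrift BoundingBox struct:
       # X and Y are required, Z and M are optional (None when omitted).
       xmin: float
       xmax: float
       ymin: float
       ymax: float
       zmin: Optional[float] = None
       zmax: Optional[float] = None
       mmin: Optional[float] = None
       mmax: Optional[float] = None


   def _invalid(v: Optional[float]) -> bool:
       # A bound is unusable if it is missing or NaN.
       return v is None or math.isnan(v)


   def is_usable(bbox: BoundingBox) -> bool:
       """Return False if the bbox is malformed and must not be used for pruning."""
       # X and Y must always be present and valid.
       if any(_invalid(v) for v in (bbox.xmin, bbox.xmax, bbox.ymin, bbox.ymax)):
           return False
       # Z and M: an omitted axis (both bounds None) is fine, but a half-present
       # or NaN pair is assumed to make the whole bbox malformed.
       for lo, hi in ((bbox.zmin, bbox.zmax), (bbox.mmin, bbox.mmax)):
           if lo is None and hi is None:
               continue
           if _invalid(lo) or _invalid(hi):
               return False
       return True
   ```

   For example, `is_usable(BoundingBox(float("nan"), 1.0, 0.0, 1.0))` is `False`, so a reader would fall back to reading the values as if no bbox were present.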



##########
Geospatial.md:
##########
@@ -94,6 +94,36 @@ Bounding box is defined as the thrift struct below in the representation of
 min/max value pair of coordinates from each axis. Note that X and Y Values are
 always present. Z and M are omitted for 2D geospatial instances.
 
+Writers should follow the guidelines below when calculating bounding boxes in
+the presence of invalid values. An invalid geospatial value refers to any of 
+the following: `NaN`, `null`, `does not exist` (e.g., LINESTRING EMPTY), or 
+`out of bounds` (e.g., `x < -180` or `x > 180` for `GEOGRAPHY` types):
+
+* X and Y: Skip any invalid X or Y value and processing the remaining X or Y 

Review Comment:
   ```suggestion
   * X and Y: Skip any invalid X or Y value and continue processing the remaining X or Y 
   ```



##########
Geospatial.md:
##########
@@ -94,6 +94,36 @@ Bounding box is defined as the thrift struct below in the representation of
 min/max value pair of coordinates from each axis. Note that X and Y Values are
 always present. Z and M are omitted for 2D geospatial instances.
 
+Writers should follow the guidelines below when calculating bounding boxes in
+the presence of invalid values. An invalid geospatial value refers to any of 
+the following: `NaN`, `null`, `does not exist` (e.g., LINESTRING EMPTY), or 
+`out of bounds` (e.g., `x < -180` or `x > 180` for `GEOGRAPHY` types):
+
+* X and Y: Skip any invalid X or Y value and processing the remaining X or Y 
+  values. Do not produce a bounding box if all X or all Y values are invalid.
+
+* Z: Skip any invalid Z value and continue processing the remaining Z values.
+  Omit Z from the bounding box if all Z values are invalid.
+
+* M: Skip any invalid M value and continue processing the remaining M values.
+  Omit M from the bounding box if all M values are invalid.
+
+Readers should follow the guidelines below when examining bounding boxes:
+
+* No bounding box: No assumptions can be made about the presence or absence 

Review Comment:
   We cannot make assumptions about valid values either. For example, we cannot assume this is an empty bbox.
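   A minimal Python sketch of what this means for pruning, assuming a hypothetical 2D `(xmin, ymin, xmax, ymax)` tuple for GEOMETRY only (no antimeridian wrapping) and a hypothetical `may_contain` helper: an absent bbox must never be read as "empty", so the reader cannot skip the data.

   ```python
   import math
   from typing import Optional, Tuple

   # Hypothetical 2D box (xmin, ymin, xmax, ymax); GEOMETRY only, no wrapping.
   Box = Tuple[float, float, float, float]


   def may_contain(stats_bbox: Optional[Box], query: Box) -> bool:
       """Return False only when the column chunk can safely be skipped."""
       if stats_bbox is None:
           # No bbox: neither "empty" nor "all values valid" can be assumed,
           # so the reader must not prune and has to read the values.
           return True
       if any(math.isnan(v) for v in stats_bbox):
           # Malformed bbox: handled the same way as an absent one.
           return True
       sxmin, symin, sxmax, symax = stats_bbox
       qxmin, qymin, qxmax, qymax = query
       # Skip only if the two boxes are disjoint on X or on Y.
       return not (sxmax < qxmin or qxmax < sxmin or symax < qymin or qymax < symin)
   ```

   Returning `True` for the missing case is the conservative choice: it can cost extra I/O but never drops matching rows.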



##########
Geospatial.md:
##########
@@ -94,6 +94,36 @@ Bounding box is defined as the thrift struct below in the representation of
 min/max value pair of coordinates from each axis. Note that X and Y Values are
 always present. Z and M are omitted for 2D geospatial instances.
 
+Writers should follow the guidelines below when calculating bounding boxes in
+the presence of invalid values. An invalid geospatial value refers to any of 
+the following: `NaN`, `null`, `does not exist` (e.g., LINESTRING EMPTY), or 
+`out of bounds` (e.g., `x < -180` or `x > 180` for `GEOGRAPHY` types):
+
+* X and Y: Skip any invalid X or Y value and processing the remaining X or Y 
+  values. Do not produce a bounding box if all X or all Y values are invalid.
+
+* Z: Skip any invalid Z value and continue processing the remaining Z values.
+  Omit Z from the bounding box if all Z values are invalid.
+
+* M: Skip any invalid M value and continue processing the remaining M values.
+  Omit M from the bounding box if all M values are invalid.
+
+Readers should follow the guidelines below when examining bounding boxes:
+
+* No bounding box: No assumptions can be made about the presence or absence 
+  of invalid values. Readers may need to load all individual coordinate 
+  values for validation.
+
+* A bounding box is present:
+    * X and Y: X and Y of the bounding box must be present. Readers should 

Review Comment:
   ditto for Z and M below



##########
Geospatial.md:
##########
@@ -94,6 +94,36 @@ Bounding box is defined as the thrift struct below in the representation of
 min/max value pair of coordinates from each axis. Note that X and Y Values are
 always present. Z and M are omitted for 2D geospatial instances.
 
+Writers should follow the guidelines below when calculating bounding boxes in
+the presence of invalid values. An invalid geospatial value refers to any of 
+the following: `NaN`, `null`, `does not exist` (e.g., LINESTRING EMPTY), or 

Review Comment:
   It seems worth providing a concrete example for each case. For example, I still don't understand what `null` means here. Is it a null binary value in Parquet, or a null value in WKB? We could add a section below named `Invalid geospatial value` and link to it here.
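   To make the cases concrete, here is a minimal Python sketch of the writer rules, assuming coordinates arrive as hypothetical `(x, y[, z[, m]])` tuples in which `None` stands for a null/absent ordinate (which `null` is meant is exactly the open question above). `NaN` and `None` are skipped per value, an EMPTY geometry simply contributes no coordinates, and for GEOGRAPHY the X range check follows the draft's example while the Y range check of [-90, 90] is an added assumption.

   ```python
   import math
   from typing import Dict, Iterable, Optional, Sequence

   # Hypothetical coordinate tuple (x, y[, z[, m]]); None = null/absent ordinate.
   Coord = Sequence[Optional[float]]
   AXES = ("x", "y", "z", "m")


   def is_valid(value: Optional[float], axis: int, geography: bool) -> bool:
       if value is None or math.isnan(value):  # `null` / `NaN`
           return False
       if geography and axis == 0 and not -180.0 <= value <= 180.0:
           return False  # `out of bounds` longitude, as in the draft
       if geography and axis == 1 and not -90.0 <= value <= 90.0:
           return False  # out-of-bounds latitude (assumed rule)
       return True


   def compute_bbox(coords: Iterable[Coord], geography: bool = False) -> Optional[Dict[str, float]]:
       lo = [math.inf] * 4
       hi = [-math.inf] * 4
       seen = [False] * 4
       for coord in coords:  # an EMPTY geometry yields no coords (`does not exist`)
           for axis in range(4):
               value = coord[axis] if axis < len(coord) else None
               if is_valid(value, axis, geography):  # skip invalid, keep going
                   lo[axis] = min(lo[axis], value)
                   hi[axis] = max(hi[axis], value)
                   seen[axis] = True
       if not (seen[0] and seen[1]):
           return None  # all X or all Y invalid: no bounding box
       bbox: Dict[str, float] = {}
       for axis in range(4):
           if seen[axis]:  # Z/M omitted when all their values were invalid
               bbox[AXES[axis] + "min"] = lo[axis]
               bbox[AXES[axis] + "max"] = hi[axis]
       return bbox
   ```

   For instance, `compute_bbox([(1.0, 2.0), (float("nan"), 5.0), (3.0, None)])` skips the `NaN` X and the `None` Y independently and returns X in [1, 3] and Y in [2, 5]. Plain min/max on X ignores GEOGRAPHY antimeridian wrapping (where xmin may exceed xmax), so this is a sketch of the skipping rules only.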



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.