mkaravel commented on PR #2971:
URL: https://github.com/apache/parquet-java/pull/2971#issuecomment-2780822975

   > I think your strategy is a good one for JTS, but I also think it's OK to 
do anything that won't result in accidentally excluding the entire row group 
(i.e., a writer MAY choose to either include or exclude finite coordinates from 
geometries that contain NaN values when writing statistics, or non-points that 
contain NaN values have undefined behaviour but shouldn't affect valid 
geometries in the same row group).
   
   If you include the coordinate values of geometries that contain 
unexpected/invalid NaN coordinates, the bounding boxes can only get bigger. 
Although it would depend on the engine, I would not expect such a situation to 
affect query results for valid geometries. In general, if geometries with 
invalid coordinates are in the data, the behavior should really be considered 
undefined from the query engine's perspective, and to be honest whatever this 
implementation does is okay as long as:
   * It does not expose these NaN values in the output bounding box at the 
storage level.
   * It does not skip valid geometries in the same row group (a point from one 
of your comments that I totally agree with).
   
   I think what I propose is a simple modification of the existing 
implementation and satisfies these requirements.
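
   To make the two requirements concrete, here is a minimal sketch of the kind 
of accumulator I have in mind. It is not the code in this PR: the class name 
`NanSafeBounds` and its methods are hypothetical, and skipping NaN per axis 
(rather than dropping the whole coordinate) is just one of the acceptable 
choices discussed above.

```java
// Hypothetical sketch, not the parquet-java implementation.
// Accumulates an X/Y bounding box while ignoring NaN ordinates, so that
// NaN never reaches the stored statistics and finite values are never lost.
class NanSafeBounds {
    private double xMin = Double.POSITIVE_INFINITY;
    private double xMax = Double.NEGATIVE_INFINITY;
    private double yMin = Double.POSITIVE_INFINITY;
    private double yMax = Double.NEGATIVE_INFINITY;

    // Math.min/Math.max return NaN if either argument is NaN, so each
    // axis is guarded separately before it is merged into the box.
    void add(double x, double y) {
        if (!Double.isNaN(x)) {
            xMin = Math.min(xMin, x);
            xMax = Math.max(xMax, x);
        }
        if (!Double.isNaN(y)) {
            yMin = Math.min(yMin, y);
            yMax = Math.max(yMax, y);
        }
    }

    // True while at least one axis has seen no finite value yet.
    boolean isEmpty() {
        return xMin > xMax || yMin > yMax;
    }

    double getXMin() { return xMin; }
    double getXMax() { return xMax; }
    double getYMin() { return yMin; }
    double getYMax() { return yMax; }
}
```

   With this approach a geometry with a stray NaN can only widen the box, so 
valid geometries in the same row group are still covered by the statistics.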


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

