mkaravel commented on PR #2971: URL: https://github.com/apache/parquet-java/pull/2971#issuecomment-2764619708
@zhangfengcdt Thank you for addressing the comments! Regarding the bounding boxes, I believe I did not make myself clear enough as to what I was suggesting. I think that using +/- infinity as initialization is a great idea and it helps with the code logic and can simplify it. If I understood the logic correctly, what was strange for me was exposing bounding boxes with +/- infinity values (which in the previous implementation could happen if you had a file with just empty geometries (and possibly NULLs). Roughly speaking this is code flow I had in mind when computing boxes: * To compute the bounding box for a row group initialize this box to `(inf, -inf, inf, -inf)`. * While looking at rows: If the geometry is empty skip it. If the geometry is not empty then merge its bounding box with the current bounding box for the group. This is where the initialization with infinite values is handy **because you do not need to check for infinity explicitly**. For example if the current bbox for the row group is `(inf, -inf, inf, -inf)` and the bbox of the current geometry is `(1, 2, 3, 4)`, you can simply do (I am writing conceptual code here) `rowGroup.xMin = Math.Min(rowGroup.xMin, geo.XMin)`, `rowGroup.xMax = Math.Max(rowGroup.xMax, geo.XMax)` and similarly for the other dimension(s). These statements (and this was my point) work if the initial coordinates of the row group bbox are +/- infinity (because this is how IEEE 754 floating point arithmetic works). * If at the end of processing a row group you still see a bbox of `(inf, -inf, inf, -inf)` then expose it as `(NaN, NaN, NaN, NaN)`. This means you have not found any non-empty geometry... I will go over the code again, probably tomorrow and make specific suggestions, if needed, regarding what I am referring to above. @paleolimbot Could you please share a link to the C++ implementation? Would be very interested in taking a look. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
