paleolimbot commented on issue #46270:
URL: https://github.com/apache/arrow/issues/46270#issuecomment-2914262617
Guaranteeing that statistics are truly empty (rather than missing or invalid) is important for two reasons:
- When pruning row groups for a range query along the lines of
`st_intersects(col_name, st_geomfromtext('POINT (0 1)'))`, truly empty column
statistics indicate a row group that can be pruned, whereas statistics that
were not provided by the Parquet metadata (or were provided but were invalid)
indicate a row group that cannot be pruned.
- After accumulating statistics during writing, `ToThrift()` needs to know
whether the bounds for a given dimension are completely empty, because it has
to serialize an empty dimension in a specific way.
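
The "truly empty" state in both bullets can be made unambiguous with a sentinel. This is a minimal sketch, assuming a hypothetical `BoundsAccumulator` (not Arrow's actual class) that starts every dimension at min = +infinity and max = -infinity, so a dimension that never saw a value is distinguishable from one with real bounds. Missing or invalid statistics would still need a separate flag; the sentinel only encodes "no values seen".

```cpp
#include <array>
#include <cassert>
#include <limits>

// Sentinel for "no values seen": min = +inf, max = -inf.
constexpr double kInf = std::numeric_limits<double>::infinity();

// Hypothetical writer-side accumulator; not Arrow's actual class.
// Dimension indices: 0 = x, 1 = y, 2 = z, 3 = m.
class BoundsAccumulator {
 public:
  BoundsAccumulator() {
    min_.fill(kInf);
    max_.fill(-kInf);
  }

  void Update(int dim, double value) {
    assert(dim >= 0 && dim < 4);
    // NaN compares false against everything, so it never widens the bounds.
    if (value < min_[dim]) min_[dim] = value;
    if (value > max_[dim]) max_[dim] = value;
  }

  // True only if Update() was never called with a non-NaN value for this
  // dimension: real data can never produce min = +inf together with max = -inf.
  bool DimensionEmpty(int dim) const {
    assert(dim >= 0 && dim < 4);
    return min_[dim] == kInf && max_[dim] == -kInf;
  }

  double min(int dim) const { return min_[dim]; }
  double max(int dim) const { return max_[dim]; }

 private:
  std::array<double, 4> min_;
  std::array<double, 4> max_;
};
```

A writer could call `DimensionEmpty()` for each dimension before serializing, which is the kind of check `ToThrift()` needs; how Arrow actually encodes an empty dimension in Thrift is not shown here.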

> The user would rather some property that guarantees that statistics are
> valid, I think.

I think the least confusing interface would be
`GeoStatistics::IntersectsBox(window_xmin, window_ymin, window_xmax, window_ymax)`.
I will take a stab at that and perhaps mark everything else as internal?
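
One way the proposed `IntersectsBox()` could behave is sketched below. It assumes a hypothetical `GeoStatisticsSketch` struct (not Arrow's `GeoStatistics`) with an `is_valid` flag, planar x/y bounds (no antimeridian wraparound), and a +infinity/-infinity sentinel for empty bounds. It returns `false` only when the statistics prove the row group can be pruned.

```cpp
#include <cassert>
#include <limits>

// Sentinel for "no values seen": min = +inf, max = -inf.
constexpr double kInf = std::numeric_limits<double>::infinity();

// Hypothetical stand-in for GeoStatistics; not Arrow's actual class.
// Assumes planar bounds (xmin <= xmax whenever the bounds are non-empty).
struct GeoStatisticsSketch {
  // false if the statistics were absent from the metadata or failed validation
  bool is_valid = false;
  double xmin = kInf;
  double ymin = kInf;
  double xmax = -kInf;
  double ymax = -kInf;

  bool IsEmpty() const {
    return (xmin == kInf && xmax == -kInf) || (ymin == kInf && ymax == -kInf);
  }

  // Returns false only when the statistics prove that no geometry in the
  // row group can intersect the window, i.e. the row group can be pruned.
  bool IntersectsBox(double window_xmin, double window_ymin,
                     double window_xmax, double window_ymax) const {
    assert(window_xmin <= window_xmax && window_ymin <= window_ymax);
    if (!is_valid) return true;   // unknown bounds: the row group must be read
    if (IsEmpty()) return false;  // truly empty: nothing can intersect
    // Closed-interval overlap test on each axis.
    return xmin <= window_xmax && window_xmin <= xmax &&
           ymin <= window_ymax && window_ymin <= ymax;
  }
};
```

For the `POINT (0 1)` query the window is `(0, 1, 0, 1)`: missing or invalid statistics return `true` (read the row group), valid but empty statistics return `false` (prune), and real bounds fall back to an ordinary box-overlap test. Folding all three cases into one call is what lets the other accessors stay internal.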