jiayuasu opened a new issue, #2938:
URL: https://github.com/apache/sedona/issues/2938

   Follow-up to the Phase 1 + Wave 1 Box2D work (#2877, #2925, #2926).
   
   ## Scope
   
   Teach the GeoParquet predicate-pushdown machinery to recognize 
`ST_BoxIntersects(box_col, box_lit)` and `ST_BoxContains(box_col, box_lit)` and 
translate them into the same row-group / partition pruning path that already 
handles `ST_Intersects(geom_col, geom_lit)` against bbox covering columns.
   
   This is the highest-leverage piece of Phase 3 because it works on **existing 
GeoParquet 1.1 files** (which already carry bbox covering columns) without any 
other planner change. Users who pre-compute a `Box2D` column or read 
covering-column Parquet files get pruning for free.
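
For reference, this is roughly what the bbox covering looks like in the file-level `geo` metadata of a GeoParquet 1.1 file. The sample dictionary and the helper name `bbox_covering_paths` are illustrative assumptions, not Sedona code; only the `covering.bbox` layout follows the GeoParquet 1.1 specification.

```python
from typing import Dict, List, Optional

# Hand-written sample of parsed "geo" metadata (illustrative only).
SAMPLE_GEO_METADATA = {
    "version": "1.1.0",
    "primary_column": "geometry",
    "columns": {
        "geometry": {
            "encoding": "WKB",
            "geometry_types": [],
            # Each entry is the path to the struct field holding that bound.
            "covering": {
                "bbox": {
                    "xmin": ["bbox", "xmin"],
                    "ymin": ["bbox", "ymin"],
                    "xmax": ["bbox", "xmax"],
                    "ymax": ["bbox", "ymax"],
                }
            },
        }
    },
}


def bbox_covering_paths(geo_metadata: dict, column: str) -> Optional[Dict[str, List[str]]]:
    """Return the bbox covering paths for `column`, or None if it has none."""
    column_meta = geo_metadata.get("columns", {}).get(column, {})
    return column_meta.get("covering", {}).get("bbox")
```

A file whose column metadata has no `covering` entry yields `None` here, which corresponds to the no-pruning fallback case.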
   
   ## Implementation outline
   
   - Extend `SpatialFilterPushDownForGeoParquet` (or its modern equivalent) to 
recognize the new predicates with a literal Box2D RHS.
   - Convert the recognized predicate into the existing 
`GeoParquetSpatialFilter` shape — same `xmin/ymin/xmax/ymax` ranges that the 
geometry path produces, just sourced from the literal Box2D directly.
   - For `ST_BoxContains(box_col, box_lit)` the pruning is symmetric to 
intersection but tighter. Covering cells that are not fully contained can be 
pruned for join keys but not for filters, so we should probably push down only 
`ST_BoxIntersects` initially and document `ST_BoxContains` as a non-pushdown 
predicate. (Alternative: push down both as conservative `ST_BoxIntersects` 
filters and refine in memory.)
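
The outline above can be sketched as a small model of the pruning decision. This is a minimal sketch under stated assumptions, not the Sedona implementation: `Box2D`, `RowGroupBBoxStats`, `pushdown_box`, and `row_group_may_match` are hypothetical names, and the row-group statistics are assumed to be the min/max of the four covering columns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box2D:
    """Hypothetical stand-in for a literal Box2D value."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


@dataclass(frozen=True)
class RowGroupBBoxStats:
    """Assumed per-row-group statistics of the bbox covering columns."""
    xmin_min: float  # smallest xmin in the row group
    ymin_min: float  # smallest ymin in the row group
    xmax_max: float  # largest xmax in the row group
    ymax_max: float  # largest ymax in the row group


def pushdown_box(predicate: str, literal: Optional[Box2D],
                 conservative_contains: bool = False) -> Optional[Box2D]:
    """Return the box to prune with, or None when nothing is pushed down."""
    if literal is None:
        return None
    if predicate == "ST_BoxIntersects":
        return literal
    # Alternative from the outline: treat contains as a conservative
    # intersects filter; the exact predicate is re-checked in memory.
    if predicate == "ST_BoxContains" and conservative_contains:
        return literal
    return None


def row_group_may_match(stats: RowGroupBBoxStats, query: Box2D) -> bool:
    """False only when no row in the group can intersect the query box."""
    return (stats.xmin_min <= query.xmax and stats.xmax_max >= query.xmin
            and stats.ymin_min <= query.ymax and stats.ymax_max >= query.ymin)
```

Pushing `ST_BoxContains` down as an intersection test is safe for a non-empty literal because any box that contains the literal also intersects it, so the filter can only keep too many row groups, never too few.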
   
   ## Tests
   
   - DataFrame `WHERE ST_BoxIntersects(bbox_col, lit(some_box))` reads only the 
row groups whose bbox metadata overlaps the literal.
   - Same query against a file with no bbox covering metadata falls back 
cleanly (no pruning, but correct results).
   - A NULL bbox literal short-circuits to no rows (or to all rows, whichever 
is consistent with the existing geometry-side behavior).
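
The three expectations above can be modeled in a few lines. This is a hedged sketch, not Sedona test code: `row_groups_to_read` is a hypothetical helper, a row-group envelope of `None` stands for a file without bbox covering metadata, and the NULL literal is assumed to yield no rows, which the issue leaves open.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def overlaps(a: Box, b: Box) -> bool:
    """Closed-interval overlap test on both axes."""
    return a[0] <= b[2] and a[2] >= b[0] and a[1] <= b[3] and a[3] >= b[1]


def row_groups_to_read(envelopes: Sequence[Optional[Box]],
                       query: Optional[Box]) -> List[int]:
    """Return indices of the row groups that must still be read."""
    if query is None:
        # Assumption: a NULL literal matches no rows.
        return []
    # A missing envelope means no metadata, so the group cannot be pruned.
    return [i for i, env in enumerate(envelopes)
            if env is None or overlaps(env, query)]


envelopes = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 30.0, 30.0), None]
selected = row_groups_to_read(envelopes, (5.0, 5.0, 8.0, 8.0))
```

Here `selected` keeps the first group (it overlaps the literal) and the third (no metadata to prune with), and drops the second.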
   
   ## Pairs naturally with
   
   - **Reader auto-materialization** of GeoParquet bbox covering columns as 
`Box2D` (deferred from #2886). That makes `WHERE ST_BoxIntersects(box_col, 
lit(b))` the canonical way to express bbox-pruned reads — the typed column 
comes from disk, the predicate prunes the disk read. Worth scoping these 
together.
   
   ## Out of scope
   
   - Two-sided pushdown (`box_col_a` vs `box_col_b`) — that's the spatial-join 
planner work tracked separately.
   - Pushdown for `ST_BoxIntersects(geom_col, lit(box))` / mixed inputs — 
depends on the implicit cast from #2927 or explicit mixed overloads; revisit 
after.

