jiayuasu opened a new issue, #2939:
URL: https://github.com/apache/sedona/issues/2939

   Follow-up to the Phase 1 + Wave 1 Box2D work (#2877, #2925, #2926).
   
   ## Scope
   
   Extend Sedona's spatial join detection so a `JOIN ... ON 
ST_BoxIntersects(a.bbox, b.bbox)` (or `ST_BoxContains`) gets routed through the 
partitioned spatial join (broadcast index join, range join — whichever the 
existing optimizer picks for `ST_Intersects` on geometry columns).
   
   Today these predicates work as scalar filters but do not trigger any 
partitioning / index-based optimization on join, so two large bbox-bearing 
tables joined on `ST_BoxIntersects` would degrade to an O(N×M) cross product. 
The Box2D type was meant to make these joins cheaper, so this issue is the 
missing planner half.
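
To make the cost gap concrete, here is a minimal pure-Python sketch, not Sedona code. It compares a cross-product join with a join that first buckets boxes into a uniform grid. The grid, the cell size, and all function names are assumptions made for illustration; Sedona's actual partitioners and indexes differ.

```python
from collections import defaultdict
from itertools import product
from math import floor

# A box is (xmin, ymin, xmax, ymax): four doubles, no geometry parsing needed.


def box_intersects(a, b):
    """Closed-interval overlap test on both axes (boundary touch counts)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def naive_join(left, right):
    """Cross product plus filter: always len(left) * len(right) tests."""
    return {
        (i, j)
        for i, a in enumerate(left)
        for j, b in enumerate(right)
        if box_intersects(a, b)
    }


def cells(box, size):
    """Cells of a hypothetical uniform grid that the box overlaps."""
    xs = range(floor(box[0] / size), floor(box[2] / size) + 1)
    ys = range(floor(box[1] / size), floor(box[3] / size) + 1)
    return product(xs, ys)


def grid_join(left, right, size=10.0):
    """Bucket the right side by grid cell, then test only co-located pairs."""
    buckets = defaultdict(list)
    for j, b in enumerate(right):
        for c in cells(b, size):
            buckets[c].append(j)
    out = set()  # a set removes pairs that share more than one cell
    for i, a in enumerate(left):
        for c in cells(a, size):
            for j in buckets.get(c, ()):
                if box_intersects(a, right[j]):
                    out.add((i, j))
    return out
```

Both functions return the same pairs. The second one only evaluates the predicate for boxes that share a grid cell, which is the kind of pruning the planner currently does not apply to `ST_BoxIntersects` joins.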
   
   ## Why this matters
   
   Pre-computed bbox columns are a common pattern: extract a bbox once, then 
repeatedly join multiple datasets against it. Each join should:
   
   1. Skip geometry deserialization on both sides (Box2D = 4 doubles, no JTS 
round-trip).
   2. Use the existing R-tree / partitioner machinery — it already operates on 
bboxes internally; the work is at the predicate-recognition layer, not the 
index layer.
   
   ## Implementation outline
   
   - Find the rule that recognizes spatial join predicates today (likely 
`JoinQueryDetector` or a similar Catalyst rule) and add `ST_BoxIntersects` / 
`ST_BoxContains` to the recognized set.
   - Adapt the input plumbing so the join physical operator can extract `Box2D` 
envelopes directly without a Geometry deserialization step.
   - For `ST_BoxContains` joins, treat them as the asymmetric-containment 
variant of the existing range join (matching the semantics of 
`ST_Contains(geom, geom)` join detection).
   - Mixed Box2D / Geometry join predicates wait on the implicit cast from 
#2927 or explicit overloads.
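
The recognition step in the outline above could look roughly like the following sketch. It uses a toy expression model in plain Python, not Catalyst; `Column`, `Call`, `BoxJoin`, `detect_box_join`, and the predicate labels are all hypothetical names introduced for illustration. The point it shows is that the containment case has to remember which join side was the first argument.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Column:
    table: str  # which join side the column belongs to
    name: str


@dataclass(frozen=True)
class Call:
    func: str
    args: Tuple[Column, ...]


# Hypothetical mapping from function name to the join predicate it implies.
BOX_JOIN_PREDICATES = {
    "ST_BoxIntersects": "INTERSECTS",  # symmetric
    "ST_BoxContains": "CONTAINS",      # asymmetric: first arg contains second
}


@dataclass(frozen=True)
class BoxJoin:
    predicate: str
    left_col: Column
    right_col: Column
    swapped: bool  # True when the first argument came from the right side


def detect_box_join(cond, left: str, right: str) -> Optional[BoxJoin]:
    """Return a BoxJoin if cond is a recognized box predicate across both sides."""
    if not isinstance(cond, Call) or cond.func not in BOX_JOIN_PREDICATES:
        return None
    if len(cond.args) != 2:
        return None
    predicate = BOX_JOIN_PREDICATES[cond.func]
    a, b = cond.args
    if a.table == left and b.table == right:
        return BoxJoin(predicate, a, b, swapped=False)
    if a.table == right and b.table == left:
        # Normalize column order, but keep the flag so CONTAINS is not inverted.
        return BoxJoin(predicate, b, a, swapped=True)
    return None  # both arguments on one side: an ordinary filter, not a join
```

For `ST_BoxIntersects` the `swapped` flag can be ignored. For `ST_BoxContains` it tells the physical operator which side plays the containing role.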
   
   ## Tests
   
   - Joining two `Box2D` columns with `ST_BoxIntersects` produces the correct 
result and uses a partitioned plan (verify via `explain()` that the plan does 
not fall back to BroadcastNestedLoopJoin / SortMergeJoin without an inequality 
condition).
   - Same for `ST_BoxContains`.
   - Compare runtime against the equivalent `ST_Intersects(geom_a, geom_b)` 
join on the same data; the Box2D join should be at least as fast (typically 
faster, because there is no geometry deserialization).
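
The plan check in the first test could be factored into a small helper along these lines. This is a sketch that works on the plan text as a plain string, however the test harness obtains it. The helper names are hypothetical. `BroadcastNestedLoopJoin` comes from the list above, while `CartesianProduct` is added here on the assumption that it is the other operator Spark may choose for an unoptimized cross join.

```python
# Operators that indicate the planner did not pick a spatial join.
# "CartesianProduct" is an assumption added for this sketch.
FALLBACK_OPERATORS = ("BroadcastNestedLoopJoin", "CartesianProduct")


def fallback_operators_in(plan_text: str) -> list:
    """Return the fallback operator names that occur in the plan text."""
    return [op for op in FALLBACK_OPERATORS if op in plan_text]


def assert_no_fallback_join(plan_text: str) -> None:
    """Raise AssertionError if the plan contains a fallback join operator."""
    found = fallback_operators_in(plan_text)
    if found:
        raise AssertionError(
            "join fell back to " + ", ".join(found) + ":\n" + plan_text
        )


if __name__ == "__main__":
    # Hand-written plan fragment for illustration, not real Spark output.
    sample = "BroadcastNestedLoopJoin BuildRight, Inner, st_boxintersects(...)"
    print(fallback_operators_in(sample))
```

A stricter test would also assert that one of the expected spatial join operators is present in the plan, once the operator name used for Box2D joins is settled.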
   
   ## Depends on
   
   - Nothing beyond this issue. (Standalone: works on top of the existing 
Phase 1 + Wave 1 Box2D surface.)
   
   ## Out of scope
   
   - A specialized R-tree index keyed by `Box2D` (skip the JTS Envelope 
round-trip in the index itself). Tracked separately as a perf follow-up — only 
worth doing if profiling shows the round-trip is hot.

