Akanksha-kedia opened a new pull request, #18829:
URL: https://github.com/apache/pinot/pull/18829

   ## Summary
   
   Adds a `joinOptions(join_strategy='broadcast_right')` query hint to the 
multi-stage query engine (MSE). When set, the entire right-side table is 
broadcast to every join worker while the left side is hash/random-distributed. 
This eliminates the right-side network shuffle for star-schema patterns where 
the right table is small enough to fit in memory but is **not** pre-replicated 
as a dimension table.
   
   ```sql
   SELECT /*+ joinOptions(join_strategy='broadcast_right') */
       o.order_id, p.product_name
   FROM orders o
   JOIN products p ON o.product_id = p.id
   ```
   
   ### Why this is useful
   
   | Strategy | When to use |
   |---|---|
   | `lookup` | Right table is a dimension table replicated to all workers |
   | `broadcast_right` (new) | Right table is small but **not** replicated — 
avoids right-side shuffle |
   | default hash | General case |
   
   ## Changes
   
   | File | Change |
   |---|---|
   | `PinotHintOptions` | Add `BROADCAST_RIGHT_JOIN_STRATEGY` constant + 
`useBroadcastRightJoinStrategy()` helper |
   | `PinotJoinExchangeNodeInsertRule` (V1 planner) | Detect hint → BROADCAST 
exchange on right, hash/random on left |
   | `TraitAssignment` (V2 planner) | Detect hint → assign 
`BROADCAST_DISTRIBUTED` trait to right input |
   | `RelToPlanNodeConverter` (V1) | Set `JoinStrategy.BROADCAST_RIGHT` on the 
`JoinNode` |
   | `PRelToPlanNodeConverter` (V2) | Same |
   | `JoinNode` | Add `BROADCAST_RIGHT` to `JoinStrategy` enum |
   | `plan.proto` | Add `BROADCAST_RIGHT = 3` to `JoinStrategy` proto enum |
   | `PlanNodeSerializer` / `PlanNodeDeserializer` | Round-trip 
`BROADCAST_RIGHT` across broker↔server boundary |
   | `DefaultJoinOperatorFactory` | Execute `BROADCAST_RIGHT` as 
`HashJoinOperator` (distribution handled at plan time) |
   | `InStageStatsTreeBuilder` | Replace fragile `assert LOOKUP` with explicit 
switch — handles `BROADCAST_RIGHT` and `ASOF` cleanly |
   
   ## Why RIGHT/FULL OUTER joins are blocked
   
   `HashJoinOperator` tracks unmatched right rows **locally per worker**. 
Broadcasting the right table to N workers means each worker independently emits 
null-extended rows for the same unmatched right row → N duplicate rows in the 
output. The planner rejects `RIGHT`/`FULL OUTER` with a clear error message at 
query compile time.
   
   ## Tests
   
   - `testBroadcastRightJoinHintEquiJoin` — equi-join: verifies BROADCAST 
distribution on right, HASH on left, `BROADCAST_RIGHT` strategy on the 
`JoinNode`
   - `testBroadcastRightJoinHintNonEquiJoin` — non-equi-join: verifies 
BROADCAST on right, RANDOM on left
   
   `mvn test -pl pinot-query-planner -Dtest=QueryCompilationTest` — **201 
tests, 0 failures**
   
   Closes #14518


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to