yashmayya commented on PR #18658: URL: https://github.com/apache/pinot/pull/18658#issuecomment-4651923268
> Does it mean that pipeline breakers may not be applied?

Good catch: yes, for those specific shapes the dynamic-broadcast (pipeline-breaker) path is not applied, and it isn't something we can easily toggle.

**Mechanism.** `PinotJoinToDynamicBroadcastRule` is the rule that inserts the `PIPELINE_BREAKER` exchange on the build side. It only matches `JoinRelType.SEMI` and returns `false` for anything else. There's a standing `TODO #1` in that rule to also handle INNER once the leaf stage supports a regular join operator. So when the subquery comes out as an INNER join instead of a SEMI join, the rule no longer fires. The build side becomes a plain `hash` exchange feeding an inner join, with a distinct `Aggregate` added on the build side. In the snapshot diff you can see the build-side exchange change from `broadcast` + `PIPELINE_BREAKER` to `hash[0]`.

**It's narrow.** This only affects uncorrelated `col IN (SELECT col FROM ...)` subqueries. The sub-query-removal / decorrelation rework in Calcite 1.41 now lowers these as an `Aggregate` (distinct on the key) + INNER join instead of a SEMI join. Once the build side is distinct, the two are result-equivalent (verified against H2). In the regenerated snapshots, that's only the two `SELECT COUNT(*) ... WHERE ... IN (SELECT ...) AND ... IN (SELECT ...)` multi-predicate cases in `JoinPlans.json`. The everyday single `IN (SELECT ...)`, `WHERE EXISTS`, and explicit semi-join shapes still come out as SEMI and keep the dynamic broadcast. For example, the nested `col2 IN (SELECT col1 FROM tmp1)` in that same query stays SEMI.

**Why it changed / controllability.** The choice between SEMI and distinct-`Aggregate` + INNER is made inside Calcite's `SubQueryRemoveRule` / decorrelation, which was reworked in 1.41. It's not behind a Pinot flag, so we can't cleanly force it back to a semi-join from our side.
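The result-equivalence point above can be illustrated with a small, self-contained Java sketch. It compares a SEMI join with a distinct `Aggregate` on the build key followed by an INNER join. This is not Pinot or Calcite code, and it is not the H2 verification mentioned above. The class and method names are hypothetical. The sketch assumes non-null keys and reduces each probe row to its join key.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: each probe "row" is reduced to its non-null join key.
final class SemiVsDistinctInnerJoin {
  private SemiVsDistinctInnerJoin() {}

  // SEMI join: keep each probe row once if its key exists on the build side.
  static List<Integer> semiJoin(List<Integer> probe, List<Integer> build) {
    Set<Integer> buildKeys = new HashSet<>(build);
    List<Integer> out = new ArrayList<>();
    for (Integer key : probe) {
      if (buildKeys.contains(key)) {
        out.add(key);
      }
    }
    return out;
  }

  // INNER join: emit one output row per matching (probe, build) pair.
  static List<Integer> innerJoin(List<Integer> probe, List<Integer> build) {
    List<Integer> out = new ArrayList<>();
    for (Integer p : probe) {
      for (Integer b : build) {
        if (p.equals(b)) {
          out.add(p);
        }
      }
    }
    return out;
  }

  // Distinct Aggregate on the build key, then INNER join. Each probe row can
  // match at most one build row, so the output equals the SEMI join's output.
  static List<Integer> distinctThenInnerJoin(List<Integer> probe, List<Integer> build) {
    List<Integer> distinctBuild = new ArrayList<>(new LinkedHashSet<>(build));
    return innerJoin(probe, distinctBuild);
  }
}
```

Take probe keys `1, 2, 3, 2` and build keys `2, 2, 3, 5`. Both `semiJoin` and `distinctThenInnerJoin` return `[2, 3, 2]`. `innerJoin` without the distinct step returns five rows, so a `COUNT(*)` over it would over-count. That is why the lowering pairs the INNER join with a distinct `Aggregate` on the build side.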
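The matching behavior described under **Mechanism** can be modeled in a few lines. This is a hypothetical, heavily simplified stand-in. The `JoinKind` enum is not Calcite's `JoinRelType`, and the real rule works on Calcite rel nodes rather than a bare join type. The sketch only shows why an INNER join produced by the new lowering falls through to a plain hash exchange.

```java
// Hypothetical, simplified model of the match condition described above.
// NOT the real PinotJoinToDynamicBroadcastRule, and JoinKind is NOT Calcite's JoinRelType.
enum JoinKind { INNER, LEFT, RIGHT, FULL, SEMI, ANTI }

final class DynamicBroadcastMatchSketch {
  private DynamicBroadcastMatchSketch() {}

  // Only SEMI joins qualify; every other join type returns false.
  // (The real rule's TODO #1 would also admit INNER joins with a distinct
  // build side once the leaf stage supports a regular join operator.)
  static boolean matches(JoinKind kind) {
    return kind == JoinKind.SEMI;
  }

  // Simplification: a matching join gets the dynamic-broadcast exchange;
  // anything else keeps a plain hash exchange, labeled as in the
  // INNER-join plan snapshots ("hash[0]").
  static String buildSideExchange(JoinKind kind) {
    return matches(kind) ? "broadcast + PIPELINE_BREAKER" : "hash[0]";
  }
}
```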
The proper fix is the existing `PinotJoinToDynamicBroadcastRule` `TODO #1` (let it also match the INNER + distinct-build shape), but that depends on leaf-stage INNER-join support and is its own change, out of scope for this version bump. Net: results stay correct, but these specific shapes lose the dynamic-broadcast pipeline breaker until we wire that up. Happy to file a follow-up issue to track it.
