avantgardnerio commented on PR #23167:
URL: https://github.com/apache/datafusion/pull/23167#issuecomment-4803348695

   And as promised, answers to individual questions:
   
   > Why not extend the existing partition_statistics method instead of adding 
runtime_row_count?
   I think we should do that. What I have was convenient for fall through to 
children.
     
   > Feels like a parallel optimization framework. There shouldn't be a 
difference between pre-execution and in-flight optimization.
   
    That's exactly the target, and it's why RuntimeRule has the same signature 
as PhysicalOptimizerRule now. The two traits exist today only because 
physical-plan can't depend on  physical-optimizer (the dependency runs the 
other way) . End-state: one trait, with rules optionally reading runtime stats 
from boundaries in the plan tree they receive. Static rules ignore runtime 
state; runtime-aware rules use it. Per-rule migration: any existing 
PhysicalOptimizerRule (including JoinSelection) can be made runtime-aware by 
adding state inspection. Existing rules don't change until they want to.
     
   > Pipeline breakers for HJ and NLJ?
   
   For HJ this PR inserts one: InsertHashJoinBoundaries wraps each HashJoin 
input in a StageBoundaryBuffer. The buffer materializes its input and IS the 
pipeline breaker — we don't need a natural one inside the join.
   
     - Same pattern works for NLJ (no natural breaker, just insert one above 
each input — ~30-line rule).
     - For SMJ either approach works: reuse the existing SortExec (it's already 
a breaker — insert the boundary above it for stats) or insert directly above 
the SMJ inputs. Either way the infrastructure is the same.
     - The contribution is precisely this generalization: any operator can 
become adaptive without depending on a natural breaker existing inside it.
     
   > Joins usually don't want to fully know both sides; when does non-shuffle 
execution benefit?
   
     - The win case is exactly what the SLT shows: one side is derived 
(aggregation, filter, scan-with-projection) whose post-operator size is much 
smaller than the planner's Inexact upper bound. Materializing that side is 
cheap relative to the bad build-side choice it lets us avoid.
     - For sides where materialization is genuinely unsafe (large streaming 
inputs), the stream-through fallback (point 5 of the summary) gates: if memory 
pressure hits the threshold, the boundary releases without finishing 
materialization, the runtime rule sees an unflagged stage and bails, the query 
runs exactly as it would today.
     - Net: adaptive wins when stats are knowable cheaply; otherwise we're no 
worse than the pre-AQE baseline.
     
   > Reuse existing optimizations like JoinSelection.
     - Strongly agree. SwapBuildSideIfInverted already calls 
HashJoinExec::swap_inputs — the same primitive JoinSelection uses. Same logic, 
fresher numbers.
     - With the trait unification above, JoinSelection itself could run at 
runtime against the post-breaker plan, no re-implementation needed.
     
   > RuntimeOptimizerExec feels hacky — reusing an ExecutionPlan for AQE 
plumbing.
   
     - Fair concern. The reason it IS an ExecutionPlan: that's the only point 
in the existing pipeline where we can intercept poll-time decisions without 
changes to every operator or to the session layer. The RTO sits at the root, 
opt-in via an optimizer rule.
     - The wrapper-at-root pattern matches other "coordinator" nodes in DF 
(e.g. how CoalescePartitionsExec orchestrates downstream). The operator 
interface is the natural composition surface.
     - Open to alternatives — a SessionContext-level coordinator would also 
work but requires plumbing into every execute path. The wrapper was the 
surgical "get it done today, show working code" compromise of this PR.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to