geoffreyclaude opened a new issue, #22708: URL: https://github.com/apache/datafusion/issues/22708
## Summary

`ExecutionPlan` metadata currently describes `EvaluationType::Eager` as an operator stream that eagerly generates `RecordBatch` values in one or more spawned Tokio tasks. `BufferExec` and `AnalyzeExec` both appear to match that behavior, but neither reports eager evaluation in its `PlanProperties`.

This makes `EvaluationType` less reliable for optimizers or integrations that need to reason about whether an operator drives child streams from background tasks.

## Current behavior

On current `main`:

- `BufferExec::new` clones the input properties and only changes `SchedulingType` to `Cooperative`.
- `BufferExec::execute` wraps the input in `MemoryBufferedStream::new(...)`.
- `MemoryBufferedStream::new(...)` immediately creates a `SpawnedTask` that polls the input stream into an internal queue.

That behavior looks eager, but the plan retains the input evaluation type.

Similarly:

- `AnalyzeExec::compute_properties(...)` constructs `PlanProperties::new(...)` and leaves `evaluation_type` at the default `Lazy`.
- `AnalyzeExec::execute` creates a `RecordBatchReceiverStream::builder(...)` and calls `builder.run_input(...)` for each input partition.
- The comments describe those futures as running input partitions in parallel on separate Tokio tasks.

That also looks eager, but the plan reports lazy evaluation.

## Expected behavior

If the documented contract for `EvaluationType::Eager` is intended to mean that an operator drives child stream polling in spawned Tokio tasks, then `BufferExec` and `AnalyzeExec` should set `PlanProperties::with_evaluation_type(EvaluationType::Eager)`.

`BufferExec` should probably always be eager because it creates the background buffering task for its input stream.

`AnalyzeExec` should probably be eager when it runs input partitions through `RecordBatchReceiverStream::builder(...).run_input(...)`, similar to other operators that drive input partitions from spawned tasks.
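To make the gap between current and expected behavior concrete, here is a minimal stand-alone sketch. It is only a model: `Props`, `buffer_props_current`, and `buffer_props_proposed` are hypothetical names, and the enums merely mimic the identifiers mentioned in this issue rather than using DataFusion's real types. It assumes the buffering operator derives its properties from its input, as described for `BufferExec::new`.

```rust
// Stand-alone model. These types only mimic names used in this issue;
// they are NOT DataFusion's real definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EvaluationType {
    Lazy,
    Eager,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SchedulingType {
    NonCooperative,
    Cooperative,
}

// Hypothetical, trimmed-down stand-in for plan properties.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Props {
    evaluation_type: EvaluationType,
    scheduling_type: SchedulingType,
}

impl Props {
    fn with_evaluation_type(mut self, t: EvaluationType) -> Self {
        self.evaluation_type = t;
        self
    }

    fn with_scheduling_type(mut self, t: SchedulingType) -> Self {
        self.scheduling_type = t;
        self
    }
}

// Models the described current behavior: copy the input properties and
// change only the scheduling type, so the evaluation type is inherited.
fn buffer_props_current(input: Props) -> Props {
    input.with_scheduling_type(SchedulingType::Cooperative)
}

// Models the proposed behavior: additionally report eager evaluation,
// because the operator polls its input from a spawned task.
fn buffer_props_proposed(input: Props) -> Props {
    input
        .with_scheduling_type(SchedulingType::Cooperative)
        .with_evaluation_type(EvaluationType::Eager)
}

fn main() {
    let lazy_input = Props {
        evaluation_type: EvaluationType::Lazy,
        scheduling_type: SchedulingType::NonCooperative,
    };
    println!("current:  {:?}", buffer_props_current(lazy_input));
    println!("proposed: {:?}", buffer_props_proposed(lazy_input));
}
```

In this model, a lazy input yields a buffering operator that still reports `Lazy` under the current derivation, while the proposed derivation reports `Eager` regardless of the input.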
## Why this matters

DataFusion already exposes `need_data_exchange(plan)` as a helper that checks:

```rust
plan.properties().evaluation_type == EvaluationType::Eager
```

So stale or incomplete `EvaluationType` metadata can make physical-plan analysis miss operators that actually create independent child-polling pipelines.

## Version

Observed on Apache DataFusion `main` on June 2, 2026.

## Possible fix

Set `EvaluationType::Eager` in the `PlanProperties` for these operators, with focused tests asserting their reported evaluation type.
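As a rough illustration of why the reported flag matters to plan analysis, the sketch below walks a toy plan tree and collects the operators that report eager evaluation. Everything here is an assumption made for illustration: `Node`, `reports_eager`, `eager_operators`, `sample_plan`, and the `ScanExec` leaf are hypothetical and are not part of DataFusion. Only the equality check mirrors the shape of the condition quoted in this issue.

```rust
// Stand-alone model, not DataFusion's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EvaluationType {
    Lazy,
    Eager,
}

// Hypothetical plan node: a name, its reported evaluation type, and children.
struct Node {
    name: &'static str,
    evaluation_type: EvaluationType,
    children: Vec<Node>,
}

// Same shape as the check quoted in the issue: compare the reported flag.
fn reports_eager(node: &Node) -> bool {
    node.evaluation_type == EvaluationType::Eager
}

// Pre-order walk collecting the names of operators that report `Eager`.
fn eager_operators(node: &Node, out: &mut Vec<&'static str>) {
    if reports_eager(node) {
        out.push(node.name);
    }
    for child in &node.children {
        eager_operators(child, out);
    }
}

// A buffering operator over a lazy leaf; the buffer's reported type is a
// parameter so the stale and the fixed metadata can be compared.
fn sample_plan(buffer_eval: EvaluationType) -> Node {
    Node {
        name: "BufferExec",
        evaluation_type: buffer_eval,
        children: vec![Node {
            name: "ScanExec", // hypothetical leaf operator
            evaluation_type: EvaluationType::Lazy,
            children: vec![],
        }],
    }
}

fn main() {
    let mut stale = Vec::new();
    eager_operators(&sample_plan(EvaluationType::Lazy), &mut stale);
    println!("stale metadata: {:?}", stale);

    let mut fixed = Vec::new();
    eager_operators(&sample_plan(EvaluationType::Eager), &mut fixed);
    println!("fixed metadata: {:?}", fixed);
}
```

With the stale flag, the walk finds no eager operators even though the modeled buffer would be polling its child in the background. A focused test for the real fix could take the same form: build the operator and assert on its reported evaluation type.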
