geoffreyclaude commented on issue #22708: URL: https://github.com/apache/datafusion/issues/22708#issuecomment-4600294872
Thanks @gabotechs, I agree with both points.

For the first point, the `EvaluationType::Eager` rustdoc was defining the category too tightly by saying evaluation starts on the first `Stream::poll_next`. I think that startup timing is an implementation detail of eager operators. `BufferExec` starts its producer from `execute`, while other eager operators may start producer work on first poll; both are still eager from the caller's perspective because downstream demand can cause the operator to drive child input or produce batches independently of one downstream poll at a time.

For `need_data_exchange`, I agree that using `evaluation_type == Eager` makes the helper answer the wrong question once `BufferExec` and `AnalyzeExec` are classified accurately.

The history is useful here: #4585 proposed `need_data_exchange` for callers that need to identify physical operators requiring exchange-style handling, and listed non-round-robin `RepartitionExec`, `CoalescePartitionsExec`, and `SortPreservingMergeExec`. #4586 implemented that helper around that exchange/gather meaning; it even moved the logic out of the `ExecutionPlan` trait into a free helper. Later, #16398 introduced `EvaluationType` for cooperative scheduling and made `need_data_exchange` delegate to eager evaluation. That was understandable while the eager set effectively matched the exchange/gather set, but it conflates two different properties.

I updated the PR so `EvaluationType` is documented as execution/evaluation behavior, while `need_data_exchange(...)` is restored to the exchange/gather predicate. That lets `BufferExec` and `AnalyzeExec` report eager evaluation without being treated as data-exchange operators.
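To illustrate the separation described above, here is a minimal, self-contained sketch. It is a toy model, not the DataFusion implementation: the `Operator` and `Partitioning` enums and both function bodies are hypothetical stand-ins, and only the operator names and the `Lazy`/`Eager` distinction come from the discussion. Treating every repartition variant as eager and the filter as lazy is an assumption made for the example.

```rust
/// Toy stand-in for the evaluation category discussed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EvaluationType {
    Lazy,
    Eager,
}

/// Hypothetical, simplified partitioning schemes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Partitioning {
    RoundRobin,
    Hash,
}

/// Hypothetical operator model; the real operators are `ExecutionPlan` implementations.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Operator {
    Repartition(Partitioning),
    CoalescePartitions,
    SortPreservingMerge,
    Buffer,
    Analyze,
    Filter,
}

/// Execution behavior: may the operator drive its input independently of
/// individual downstream polls? (Assumed classification for this sketch.)
fn evaluation_type(op: &Operator) -> EvaluationType {
    match op {
        Operator::Filter => EvaluationType::Lazy,
        Operator::Repartition(_)
        | Operator::CoalescePartitions
        | Operator::SortPreservingMerge
        | Operator::Buffer
        | Operator::Analyze => EvaluationType::Eager,
    }
}

/// Exchange/gather predicate: answers a different question than eagerness.
/// Mirrors the list from #4585: non-round-robin repartition, coalesce,
/// and sort-preserving merge.
fn need_data_exchange(op: &Operator) -> bool {
    match op {
        Operator::Repartition(p) => *p != Partitioning::RoundRobin,
        Operator::CoalescePartitions | Operator::SortPreservingMerge => true,
        Operator::Buffer | Operator::Analyze | Operator::Filter => false,
    }
}
```

In this model `Operator::Buffer` and `Operator::Analyze` are eager yet not data-exchange operators, which is exactly the case that a predicate defined as `evaluation_type == Eager` would misclassify.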
