Dandandan commented on code in PR #21351:
URL: https://github.com/apache/datafusion/pull/21351#discussion_r3086222712
##########
datafusion/datasource/src/source.rs:
##########
@@ -471,6 +549,12 @@ impl ExecutionPlan for DataSourceExec {
as Arc<dyn ExecutionPlan>
})
}
+
+ fn reset_state(self: Arc<Self>) -> Result<Arc<dyn ExecutionPlan>> {
Review Comment:
This doesn't hold back this PR, but I don't like the additions in some
recent PRs of `reset_state` on plans / sources etc.
In my opinion we should avoid setting mutable state on `ExecutionPlan`s (and
related data structures) & move that to the inner streams.
In an ideal scenario:
* During any time we can `clone` a plan and share / run it without expecting
any side effects (e.g. partially executed plans)
* Concurrent runs of a query (ideally`execute`s) should result in the same
result, even while a query is running.
In my PR https://github.com/apache/datafusion/pull/20481 I combined the
shared state using `TaskContext` (which, in normal usages, will be created on
each new query run), which I think still not is 100% the desired end state but
avoids a lot of scenario's.
https://github.com/apache/datafusion/pull/20481/files#diff-e3975f93f1d4599fe6f19905e81f02574d65402e604e39c45b87448b3892d1adR70
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]