tustvold opened a new issue, #4617: URL: https://github.com/apache/arrow-datafusion/issues/4617
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

Broadly speaking:

* `SessionContext` / `SessionState` - state used to plan a query
* `ExecutionProps` - state used to lower a logical expression to a physical expression
* `TaskContext` - state used to execute a query

We then have the following:

* `RuntimeEnv` - "global" configuration available at plan and query time
* `SessionConfig` - session configuration available at plan and query time

Of these, `RuntimeEnv`, `SessionState` and `SessionConfig` are interior mutable; that is, they can be modified without a mutable reference. The result is that queries can and do modify the session and runtime configuration during execution. This is important to support things like `CREATE TABLE`, `SET`, etc.

This is fine; however, the use of shared mutable state means that modifications will also impact in-flight queries. This feels at best surprising, and there is a fairly high probability of there being consistency bugs already resulting from this.

**Describe the solution you'd like**

I would ideally like to use Rust's borrow checker to handle this for us, as this would not only eliminate a non-trivial amount of locking complexity from the DataFusion codebase, but would also more clearly communicate what state can be altered when.

This would require separating DDL from DML, with the former requiring mutable access to the `SessionContext`. I'm inclined to think this is fine for a couple of reasons:

* Some of the methods on `SessionContext` still take `&mut self` - #4612
* Most use-cases aren't using `SessionContext` in parallel
* Those that are using `SessionContext` in parallel will need async state management regardless

It isn't a fully formed thought, but something that came out of #4607 is the need to be able to pre-parse a SQL statement. Perhaps we could provide some sort of `SqlStatement` wrapper containing a parsed `SQL` statement.
This would facilitate delegation of specific handling of mutating queries to the downstream system, which is far better placed to determine the desired semantics.

**Describe alternatives you've considered**

**Additional context**

#4517 #3887 #4349 track improvements to DataFusion's configuration

#3777 tracks async catalog support, which introduces another dimension to the out-of-band state modification
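
To make the proposal more concrete, here is a rough sketch of what borrow-checker-enforced state separation combined with a pre-parsed statement wrapper might look like. Everything in it is hypothetical and is not DataFusion's API: `Session`, `Plan`, `plan`, `apply`, `config_value` and the variants of `SqlStatement` are illustrative names, and the string-based `parse` is a toy stand-in for a real SQL parser. It assumes that state-changing statements such as `SET` are the ones routed through `&mut self`.

```rust
use std::collections::HashMap;

/// Hypothetical pre-parsed statement. A real version would wrap the
/// parser's AST instead of raw strings.
#[derive(Debug, Clone, PartialEq)]
pub enum SqlStatement {
    /// Read-only query: only needs shared access to the session.
    Query(String),
    /// State-changing statement of the form `SET key = value`.
    Set { key: String, value: String },
}

impl SqlStatement {
    /// Toy classifier standing in for a real SQL parser.
    pub fn parse(sql: &str) -> SqlStatement {
        let trimmed = sql.trim().trim_end_matches(';');
        if trimmed.to_ascii_uppercase().starts_with("SET ") {
            // "SET " is four ASCII bytes, so slicing at 4 is safe here.
            if let Some((key, value)) = trimmed[4..].split_once('=') {
                return SqlStatement::Set {
                    key: key.trim().to_string(),
                    value: value.trim().to_string(),
                };
            }
        }
        SqlStatement::Query(trimmed.to_string())
    }

    /// Lets the downstream system decide how to route the statement.
    pub fn is_mutating(&self) -> bool {
        matches!(self, SqlStatement::Set { .. })
    }
}

/// Hypothetical session without interior mutability: no locks, no `Arc`.
#[derive(Debug, Default)]
pub struct Session {
    pub config: HashMap<String, String>,
}

/// Hypothetical plan that owns a snapshot of the configuration it was
/// planned with, so later session changes cannot affect it.
#[derive(Debug)]
pub struct Plan {
    pub sql: String,
    pub config: HashMap<String, String>,
}

impl Session {
    /// Planning takes `&self`: it can read session state but not change it.
    pub fn plan(&self, stmt: &SqlStatement) -> Result<Plan, String> {
        match stmt {
            SqlStatement::Query(sql) => Ok(Plan {
                sql: sql.clone(),
                config: self.config.clone(),
            }),
            other => Err(format!("{other:?} mutates state; use `apply`")),
        }
    }

    /// State changes take `&mut self`, so the compiler rejects them while
    /// any shared borrow of the session is still alive.
    pub fn apply(&mut self, stmt: SqlStatement) -> Result<(), String> {
        match stmt {
            SqlStatement::Set { key, value } => {
                self.config.insert(key, value);
                Ok(())
            }
            SqlStatement::Query(_) => Err("queries do not mutate state; use `plan`".into()),
        }
    }

    /// Read a configuration value, if set.
    pub fn config_value(&self, key: &str) -> Option<&str> {
        self.config.get(key).map(String::as_str)
    }
}
```

A caller would parse first, check `is_mutating()`, and then call either `plan` (shared access) or `apply` (exclusive access). In this sketch a `Plan` produced before a later `SET` keeps the configuration it was planned with, which is the isolation from in-flight queries that the issue is after. Users who do share a session across tasks would wrap it in their own lock, which matches the point that they need their own state management regardless.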
