tustvold opened a new issue, #4617:
URL: https://github.com/apache/arrow-datafusion/issues/4617

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   Broadly speaking:
   
   * `SessionContext` / `SessionState` - state used to plan a query
   * `ExecutionProps` - state used to lower a logical expression to a physical 
expression
   * `TaskContext` - state used to execute a query
   
   We then have the following:
   
   * `RuntimeEnv` - "global" configuration available at plan and query time
   * `SessionConfig` - session configuration available at plan and query time
   
   Of these, `RuntimeEnv`, `SessionState` and `SessionConfig` are interior
mutable; that is, they can be modified without a mutable reference.
   
   The result is that queries can and do modify the session and runtime
configuration during execution. This is important to support things like
`CREATE TABLE`, `SET`, etc. This is fine; however, the use of shared mutable
state means that modifications will also impact in-flight queries. This is
surprising at best, and there is a fairly high probability of there being
consistency bugs already resulting from this.
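   To illustrate the hazard, here is a minimal, self-contained sketch using hypothetical `SharedSession` and `Config` types (stand-ins, not DataFusion's actual `SessionState` or `SessionConfig`). Because the configuration sits behind an `Arc<RwLock<_>>`, a `SET`-like call needs only `&self`, so another statement can change the settings between two stages of a query that is already running. The interleaving is simulated on a single thread so the result is deterministic.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for session configuration (not DataFusion's `SessionConfig`).
#[derive(Clone, Debug)]
struct Config {
    batch_size: usize,
}

/// Shared, interior-mutable handle: anyone holding it can modify the config via `&self`.
#[derive(Clone)]
struct SharedSession {
    config: Arc<RwLock<Config>>,
}

impl SharedSession {
    fn new(batch_size: usize) -> Self {
        Self {
            config: Arc::new(RwLock::new(Config { batch_size })),
        }
    }

    /// Models a `SET` statement; note that it only needs `&self`.
    fn set_batch_size(&self, batch_size: usize) {
        self.config.write().unwrap().batch_size = batch_size;
    }

    fn batch_size(&self) -> usize {
        self.config.read().unwrap().batch_size
    }
}

/// Models an in-flight query that re-reads the config between two stages and
/// returns the batch size observed by each stage.
fn run_query_with_concurrent_set(session: &SharedSession) -> (usize, usize) {
    let stage_one = session.batch_size();
    // Another statement on the same session modifies the config mid-query.
    session.set_batch_size(1024);
    let stage_two = session.batch_size();
    (stage_one, stage_two)
}
```

   With a session created via `SharedSession::new(8192)`, `run_query_with_concurrent_set` returns `(8192, 1024)`: the two stages of a single query observed different configurations.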
   
   **Describe the solution you'd like**
   
   I would ideally like to use Rust's borrow checker to handle this for us, as 
this would not only eliminate a non-trivial amount of locking complexity from 
the DataFusion codebase, but would also more clearly communicate what state can 
be altered when.
   
   This would require separating DDL from DML, with the former requiring
mutable access to the `SessionContext`. I'm inclined to think this is fine for
a couple of reasons:
   
   * Some of the methods on `SessionContext` still take `&mut self` - #4612
   * Most use-cases aren't using `SessionContext` in parallel
   * Those that are using `SessionContext` in parallel will need async state 
management regardless
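   One possible shape for the borrow-checker approach, sketched under the assumption that session state can be treated as an immutable snapshot: the hypothetical `Context` below requires `&mut self` for DDL and `SET`, while a planned `Query` owns an `Arc` clone of the state it was planned against. `Arc::make_mut` provides copy-on-write, so a mutation made after planning produces a new state instead of altering the in-flight query's view. None of these names are DataFusion APIs.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical session state snapshot (not DataFusion's `SessionState`).
#[derive(Clone, Debug)]
struct State {
    tables: HashMap<String, Vec<i64>>,
    batch_size: usize,
}

/// Hypothetical context holding the current state; never mutated through `&self`.
struct Context {
    state: Arc<State>,
}

/// A planned query owns the snapshot of the state it was planned against.
struct Query {
    state: Arc<State>,
    table: String,
}

impl Context {
    fn new(batch_size: usize) -> Self {
        Self {
            state: Arc::new(State { tables: HashMap::new(), batch_size }),
        }
    }

    /// DDL (e.g. `CREATE TABLE`) requires `&mut self`, so the compiler rules out
    /// mutation through shared references.
    fn create_table(&mut self, name: &str, rows: Vec<i64>) {
        // Copy-on-write: clones the state only if a query still holds the old snapshot.
        Arc::make_mut(&mut self.state)
            .tables
            .insert(name.to_string(), rows);
    }

    /// `SET` likewise requires `&mut self`.
    fn set_batch_size(&mut self, batch_size: usize) {
        Arc::make_mut(&mut self.state).batch_size = batch_size;
    }

    /// Planning only needs `&self`; the query captures a cheap `Arc` clone.
    fn plan(&self, table: &str) -> Query {
        Query {
            state: Arc::clone(&self.state),
            table: table.to_string(),
        }
    }
}

impl Query {
    /// Execution sees exactly the state that existed at planning time.
    /// Returns (row count, batch size), or `None` if the table is unknown.
    fn execute(&self) -> Option<(usize, usize)> {
        let rows = self.state.tables.get(&self.table)?;
        Some((rows.len(), self.state.batch_size))
    }
}
```

   For example, after `create_table("t", vec![1, 2, 3])`, a query planned on `t` keeps returning `Some((3, 8192))` even after `set_batch_size(1024)` and a replacement `create_table("t", vec![4])`, while a freshly planned query returns `Some((1, 1024))`. If `Query` instead held a `&Context`, the compiler would reject those `&mut self` calls for as long as the query was alive.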
   
   It isn't a fully formed thought, but something that came out of #4607 is the
need to be able to pre-parse a SQL statement. Perhaps we could provide some
sort of `SqlStatement` wrapper containing a parsed SQL statement. This would
make it easier to delegate the handling of mutating queries to the
downstream system, which is far better placed to determine the desired
semantics.
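   A rough sketch of what such a `SqlStatement` wrapper might look like, purely as an assumption: the enum, `parse`, and `handle` below are hypothetical, and the keyword-based classifier is a toy stand-in for a real SQL parser. The point is that the caller inspects the pre-parsed statement and chooses the semantics for mutating statements itself, here by rejecting them unless `allow_ddl` is set.

```rust
/// Hypothetical pre-parsed statement wrapper. A real version would wrap a parsed
/// AST rather than classifying by leading keyword.
#[derive(Debug, PartialEq)]
enum SqlStatement {
    /// Read-only statement that can be planned with shared access.
    Query(String),
    /// Statement that mutates session state (e.g. `CREATE TABLE`, `SET`).
    Mutating(String),
}

impl SqlStatement {
    /// Toy classifier that looks only at the first keyword, for illustration.
    fn parse(sql: &str) -> SqlStatement {
        let first = sql
            .split_whitespace()
            .next()
            .unwrap_or("")
            .to_ascii_uppercase();
        match first.as_str() {
            "CREATE" | "DROP" | "SET" => SqlStatement::Mutating(sql.to_string()),
            _ => SqlStatement::Query(sql.to_string()),
        }
    }

    fn is_mutating(&self) -> bool {
        matches!(self, SqlStatement::Mutating(_))
    }
}

/// The downstream system decides the semantics, e.g. rejecting mutating
/// statements on a read-only endpoint.
fn handle(stmt: &SqlStatement, allow_ddl: bool) -> Result<&str, String> {
    match stmt {
        SqlStatement::Query(sql) => Ok(sql.as_str()),
        SqlStatement::Mutating(sql) if allow_ddl => Ok(sql.as_str()),
        SqlStatement::Mutating(sql) => Err(format!("mutating statement rejected: {sql}")),
    }
}
```

   For instance, `handle(&SqlStatement::parse("SET x = 1"), false)` returns an error, whereas a `SELECT` passes through unchanged.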
   
   **Describe alternatives you've considered**
   
   **Additional context**
   
   #4517, #3887 and #4349 track improvements to DataFusion's configuration.
   
   #3777 tracks async catalog support, which introduces another dimension to
out-of-band state modification.

