alamb opened a new issue, #20495: URL: https://github.com/apache/datafusion/issues/20495
### Is your feature request related to a problem or challenge? I happened to run some profiling tests during the sql parser upgrade - https://github.com/apache/datafusion/pull/20177#discussion_r2840964897 My profiles show that almost 13% of the overall sqllogictest runtime is consumed by cloning/dropping SessionState. Basically I ran ```shell INCLUDE_SQLITE=true cargo test --profile release-nonlto --test sqllogictests ``` And then ran instruments to capture some traces. Here is some obvious outliers: <img width="1871" height="1158" alt="Image" src="https://github.com/user-attachments/assets/9026ebcc-73fa-430e-a8a0-ef9153d94f8b" /> <img width="1863" height="1075" alt="Image" src="https://github.com/user-attachments/assets/0c531552-f823-4547-8fd5-e3c3019902cc" /> Note that most of these queries are quite small (so the actual work to run the query is quite small) However, the fact that so much time is spent copying the SessionState is embarassing ### Describe the solution you'd like I would like to improve performance by not copying the session state as much What is going on is that the SessionContext has a mutable SessionState here: https://github.com/apache/datafusion/blob/b6f7521752aec92e4a5b014be016fbe185b5bbc1/datafusion/core/src/execution/context/mod.rs#L297-L296 Once a query is planned, the DataFrame gets its own copy of the state so it is no longer affected by changes to the SessionState. Specifically, here: https://github.com/apache/datafusion/blob/b6f7521752aec92e4a5b014be016fbe185b5bbc1/datafusion/core/src/execution/context/mod.rs#L776-L775 The call to `state()` results in a copy: https://github.com/apache/datafusion/blob/b6f7521752aec92e4a5b014be016fbe185b5bbc1/datafusion/core/src/execution/context/mod.rs#L1860-L1859 ```rust pub fn state(&self) -> SessionState { let mut state = self.state.read().clone(); state.mark_start_execution(); state } ``` ### Describe alternatives you've considered One idea would be to move from a `Arc<RwLock<SessionState>>` to just an `Arc<SessionState>` and use `Arc::make_mut` when it needed to be modified So instead of ```rust #[derive(Clone)] pub struct SessionContext { /// UUID for the session session_id: String, /// Session start time session_start_time: DateTime<Utc>, /// Shared session state for the session state: Arc<RwLock<SessionState>>, } ``` Something like ```rust So instead of ```rust #[derive(Clone)] pub struct SessionContext { /// UUID for the session session_id: String, /// Session start time session_start_time: DateTime<Utc>, /// Shared session state for the session state: Arc<SessionState>, // <---- Note this is now just Arc } ``` And then instead of `self.state.write()` to modify the state, code could use `Arc::make_mut(&mut self.state)` https://github.com/apache/datafusion/blob/b6f7521752aec92e4a5b014be016fbe185b5bbc1/datafusion/core/src/execution/context/mod.rs#L1108-L1109 And the idea is that then this Arc<SessionState> could be copied around and shared unless it was actually modified ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
