alamb opened a new issue, #20495:
URL: https://github.com/apache/datafusion/issues/20495

   ### Is your feature request related to a problem or challenge?
   
   I happened to run some profiling tests during the sql parser upgrade
   - https://github.com/apache/datafusion/pull/20177#discussion_r2840964897
   
   My profiles show that almost 13% of the overall sqllogictest runtime is 
consumed by cloning/dropping SessionState.
   
   Basically I ran
   ```shell
   INCLUDE_SQLITE=true cargo test --profile release-nonlto --test sqllogictests
   ```
   
   And then ran instruments to capture some traces. Here is some obvious 
outliers:
   
   <img width="1871" height="1158" alt="Image" 
src="https://github.com/user-attachments/assets/9026ebcc-73fa-430e-a8a0-ef9153d94f8b";
 />
   <img width="1863" height="1075" alt="Image" 
src="https://github.com/user-attachments/assets/0c531552-f823-4547-8fd5-e3c3019902cc";
 />
   
   Note that most of these queries are quite small (so the actual work to run 
the query is quite small)
   
   However, the fact that so much time is spent copying the SessionState is 
embarassing 
   
   ### Describe the solution you'd like
   
   
   I would like to improve performance by not copying the session state as much
   
   What is going on is that the SessionContext has a mutable SessionState here:
   
   
https://github.com/apache/datafusion/blob/b6f7521752aec92e4a5b014be016fbe185b5bbc1/datafusion/core/src/execution/context/mod.rs#L297-L296
   
   Once a query is planned, the DataFrame gets its own copy of the state so it 
is no longer affected by changes to the SessionState.
   
   Specifically, here:
   
https://github.com/apache/datafusion/blob/b6f7521752aec92e4a5b014be016fbe185b5bbc1/datafusion/core/src/execution/context/mod.rs#L776-L775
   
   The call to `state()` results in a copy: 
   
https://github.com/apache/datafusion/blob/b6f7521752aec92e4a5b014be016fbe185b5bbc1/datafusion/core/src/execution/context/mod.rs#L1860-L1859
   ```rust
       pub fn state(&self) -> SessionState {
           let mut state = self.state.read().clone();
           state.mark_start_execution();
           state
       }
   ```
   
   
   
   ### Describe alternatives you've considered
   
   One idea would be to move from a `Arc<RwLock<SessionState>>` to just an 
`Arc<SessionState>` and use `Arc::make_mut` when it needed to be modified
   
   So instead of 
   ```rust
   #[derive(Clone)]
   pub struct SessionContext {
       /// UUID for the session
       session_id: String,
       /// Session start time
       session_start_time: DateTime<Utc>,
       /// Shared session state for the session
       state: Arc<RwLock<SessionState>>,
   }
   ```
   
   Something like
   ```rust
   So instead of 
   ```rust
   #[derive(Clone)]
   pub struct SessionContext {
       /// UUID for the session
       session_id: String,
       /// Session start time
       session_start_time: DateTime<Utc>,
       /// Shared session state for the session
       state: Arc<SessionState>,  // <---- Note this is now just Arc
   }
   ```
   
   And then instead of `self.state.write()` to modify the state, code could use 
`Arc::make_mut(&mut self.state)`
   
   
https://github.com/apache/datafusion/blob/b6f7521752aec92e4a5b014be016fbe185b5bbc1/datafusion/core/src/execution/context/mod.rs#L1108-L1109
   
   And the idea is that then this Arc<SessionState> could be copied around and 
shared unless it was actually modified
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to