andygrove opened a new pull request #7969:
URL: https://github.com/apache/arrow/pull/7969


   This PR modifies the DataFusion Partition trait, changing this method ...
   
   ```rust
   fn execute(&self) -> Result<Arc<Mutex<dyn RecordBatchReader + Send + Sync>>>;
   ```
   
   to
   
   ```rust
   fn execute(&self) -> Result<Arc<dyn RecordBatchReader + Send + Sync>>;
   ```
   
   This is a cleaner API in my opinion and removes the overhead of a mutex lock 
per batch per operator, which is often redundant since many operators do not 
contain mutable state.
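
   To illustrate the idea, here is a minimal sketch that does not use the real 
Arrow types. `Batch`, `BatchReader`, `VecSource` and `AddOne` are hypothetical 
stand-ins invented for this example. The point is that with a `&self` method, 
only an operator that actually has mutable state needs a lock, and it can keep 
that lock private.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for an Arrow record batch; not the real type.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch(pub Vec<i32>);

// Simplified stand-in for `RecordBatchReader`. The method takes `&self`,
// so a reader can be shared as `Arc<dyn BatchReader + Send + Sync>`
// without an outer `Mutex`.
pub trait BatchReader {
    fn next_batch(&self) -> Option<Batch>;
}

// A source has a cursor, so it guards just that state with its own lock.
pub struct VecSource {
    batches: Mutex<std::vec::IntoIter<Batch>>,
}

impl VecSource {
    pub fn new(batches: Vec<Batch>) -> Self {
        Self { batches: Mutex::new(batches.into_iter()) }
    }
}

impl BatchReader for VecSource {
    fn next_batch(&self) -> Option<Batch> {
        // The only lock taken in this pipeline.
        self.batches.lock().unwrap().next()
    }
}

// A stateless operator: it holds no mutable state and takes no lock.
pub struct AddOne {
    input: Arc<dyn BatchReader + Send + Sync>,
}

impl AddOne {
    pub fn new(input: Arc<dyn BatchReader + Send + Sync>) -> Self {
        Self { input }
    }
}

impl BatchReader for AddOne {
    fn next_batch(&self) -> Option<Batch> {
        self.input
            .next_batch()
            .map(|Batch(values)| Batch(values.into_iter().map(|x| x + 1).collect()))
    }
}
```

   With the old signature, every call through `AddOne` would also have had to 
lock an outer mutex even though the operator itself never mutates anything.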
   
   This also affects the core arrow and parquet crates, for example by 
removing the `&mut self` requirement from `ArrayReader` in favor of using 
`Rc<RefCell<_>>` with the reader.
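
   As a rough sketch of that pattern (not the actual parquet code), the 
following uses hypothetical names `ValueReader`, `Cursor` and `CountingReader`. 
The trait method takes `&self`, and the state that changes between calls sits 
behind `Rc<RefCell<_>>`.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical, simplified stand-in for a reader trait such as `ArrayReader`.
// Note the `&self` receiver instead of `&mut self`.
pub trait ValueReader {
    fn next_values(&self, n: usize) -> Vec<i32>;
}

// The mutable part of the reader: the position of the next value.
pub struct Cursor {
    pub next: i32,
}

// The reader shares its cursor through `Rc<RefCell<_>>`, so it can
// advance the cursor from a `&self` method.
pub struct CountingReader {
    cursor: Rc<RefCell<Cursor>>,
}

impl CountingReader {
    pub fn new(cursor: Rc<RefCell<Cursor>>) -> Self {
        Self { cursor }
    }
}

impl ValueReader for CountingReader {
    fn next_values(&self, n: usize) -> Vec<i32> {
        // Borrow rules are checked at runtime; this panics if the
        // cursor is already borrowed elsewhere.
        let mut cursor = self.cursor.borrow_mut();
        let start = cursor.next;
        cursor.next += n as i32;
        (start..cursor.next).collect()
    }
}
```

   One trade-off to keep in mind: `Rc<RefCell<_>>` is neither `Send` nor `Sync`, 
so a type built this way is confined to a single thread.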
   
   I ran the TPC-H query 1 benchmark (scale factor 100) before and after these 
changes and saw no noticeable difference in performance (20.6 seconds).
   
   I think these changes are also a good step towards being able to adopt 
async.


