andygrove opened a new pull request #7969: URL: https://github.com/apache/arrow/pull/7969
This PR modifies the DataFusion `Partition` trait, changing this method:

```rust
fn execute(&self) -> Result<Arc<Mutex<dyn RecordBatchReader + Send + Sync>>>;
```

to:

```rust
fn execute(&self) -> Result<Arc<dyn RecordBatchReader + Send + Sync>>;
```

This is a cleaner API in my opinion and removes the overhead of a mutex lock per batch per operator, which is often redundant since many operators do not contain mutable state.

This also affects the core arrow and parquet crates, removing the `&mut self` requirement from `ArrayReader`, for example, in favor of using `Rc<RefCell<_>>` with the reader.

I ran the TPC-H query 1 benchmark (scale factor 100) before and after these changes and saw no noticeable difference in performance (20.6 seconds).

I think these changes are also a good step towards being able to adopt async.
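To illustrate the difference between the two signatures, here is a minimal sketch of both shapes. It is not the DataFusion or Arrow code: `MutReader`, `SharedReader`, `CountingMut`, `CountingShared`, `PassThrough`, and the `Batch` alias are hypothetical stand-ins, and the stateful reader uses an `AtomicUsize` for interior mutability (rather than the `Rc<RefCell<_>>` mentioned above) so that it can satisfy `Send + Sync` in this example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a record batch: just a row count.
type Batch = usize;

/// Old shape (assumed): `&mut self`, so a shared reader must sit behind a Mutex.
trait MutReader {
    fn next_batch(&mut self) -> Option<Batch>;
}

/// New shape (assumed): `&self`, so the reader can be shared as a plain Arc.
trait SharedReader {
    fn next_batch(&self) -> Option<Batch>;
}

/// Stateful reader for the old shape: mutates a plain field.
struct CountingMut {
    remaining: usize,
}

impl MutReader for CountingMut {
    fn next_batch(&mut self) -> Option<Batch> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        Some(1024)
    }
}

/// Stateful reader for the new shape: the state uses interior mutability.
struct CountingShared {
    remaining: AtomicUsize,
}

impl SharedReader for CountingShared {
    fn next_batch(&self) -> Option<Batch> {
        // Decrement only while the counter is above zero.
        self.remaining
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |n| n.checked_sub(1))
            .ok()
            .map(|_| 1024)
    }
}

/// Operator without mutable state: under the new shape it needs no lock at all.
struct PassThrough {
    input: Arc<dyn SharedReader + Send + Sync>,
}

impl SharedReader for PassThrough {
    fn next_batch(&self) -> Option<Batch> {
        self.input.next_batch()
    }
}

/// Old shape: one lock acquisition per batch.
fn total_rows_locked(reader: Arc<Mutex<dyn MutReader + Send + Sync>>) -> usize {
    let mut total = 0;
    loop {
        // The guard is a temporary, so the lock is released after this statement.
        let batch = reader.lock().unwrap().next_batch();
        match batch {
            Some(rows) => total += rows,
            None => break,
        }
    }
    total
}

/// New shape: no lock in the consumer loop.
fn total_rows_shared(reader: Arc<dyn SharedReader + Send + Sync>) -> usize {
    let mut total = 0;
    while let Some(rows) = reader.next_batch() {
        total += rows;
    }
    total
}
```

In this sketch the cost of synchronization moves into the readers that actually hold state, while an operator such as `PassThrough` pays nothing. Whether that trade-off holds for a real operator depends on what state it carries.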
