tustvold opened a new issue, #2199: URL: https://github.com/apache/arrow-datafusion/issues/2199
A proposal for reformulating the parallelism story within DataFusion to use a [morsel-driven](https://db.in.tum.de/~leis/papers/morsels.pdf) approach based on [rayon](https://docs.rs/rayon/latest/rayon/). More details, background, and discussion can be found in the proposal document [here](https://docs.google.com/document/d/1txX60thXn1tQO1ENNT8rwfU3cXLofa7ZccnvP4jD6AA), please feel free to comment there. The keys highlights are: * Decouples parallelism from the partitioning expressed in the physical plan, allowing for: * Better handling of imbalanced partitions * Adaptive parallelism based on compute availability at execution time * Parallelism within a partition, such as decoding parquet columns in parallel, [parallel sort](https://docs.rs/rayon/latest/rayon/slice/trait.ParallelSliceMut.html), etc... * The first step to reducing the complexity associated with the current futures-based concurrency model * Improvements to thread-locality, observability and performance -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
