tustvold opened a new issue, #2199:
URL: https://github.com/apache/arrow-datafusion/issues/2199

   A proposal for reformulating the parallelism story within DataFusion to use 
a [morsel-driven](https://db.in.tum.de/~leis/papers/morsels.pdf) approach based 
on [rayon](https://docs.rs/rayon/latest/rayon/). More details, background, and 
discussion can be found in the proposal document 
[here](https://docs.google.com/document/d/1txX60thXn1tQO1ENNT8rwfU3cXLofa7ZccnvP4jD6AA),
 please feel free to comment there.
   
   The keys highlights are:
   
   * Decouples parallelism from the partitioning expressed in the physical 
plan, allowing for:
     * Better handling of imbalanced partitions
     * Adaptive parallelism based on compute availability at execution time
     * Parallelism within a partition, such as decoding parquet columns in 
parallel, [parallel 
sort](https://docs.rs/rayon/latest/rayon/slice/trait.ParallelSliceMut.html), 
etc...
   * The first step to reducing the complexity associated with the current 
futures-based concurrency model
   * Improvements to thread-locality, observability and performance
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to