andygrove opened a new issue #63:
URL: https://github.com/apache/arrow-datafusion/issues/63


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   The main issue limiting scalability in Ballista today is how joins are implemented: they are hash joins in which every partition of the probe side loads the entire left (build) side into memory.
   
   
   **Describe the solution you'd like**
   
   To make this scalable, we need to hash-partition the left and right inputs on the join keys so that corresponding left and right partitions can be joined in parallel.
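
As an illustration of the idea, here is a minimal single-process sketch in plain Rust using only the standard library. It is not DataFusion's or Ballista's API: the function names (`partition_of`, `hash_partition`, `hash_join`, `partitioned_join`) are hypothetical, rows are modeled as `(key, payload)` tuples instead of Arrow record batches, and threads stand in for distributed executors. Both sides are hashed on the join key with the same partition count, so rows with equal keys always land at the same partition index. Each pair of partitions can then be joined independently, and only one left partition has to be held in memory per task.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::thread;

type Row = (i64, String); // hypothetical row: (join key, payload)

/// Map a join key to a partition index (hypothetical helper).
fn partition_of(key: &i64, num_partitions: usize) -> usize {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    (h.finish() % num_partitions as u64) as usize
}

/// Split one input into `num_partitions` buckets by the hash of its join key.
fn hash_partition(rows: Vec<Row>, num_partitions: usize) -> Vec<Vec<Row>> {
    let mut parts: Vec<Vec<Row>> = vec![Vec::new(); num_partitions];
    for row in rows {
        let p = partition_of(&row.0, num_partitions);
        parts[p].push(row);
    }
    parts
}

/// In-memory hash join of ONE left partition with ONE right partition:
/// build a hash table from the left rows, then probe it with the right rows.
fn hash_join(left: Vec<Row>, right: Vec<Row>) -> Vec<(i64, String, String)> {
    let mut build: HashMap<i64, Vec<String>> = HashMap::new();
    for (k, v) in left {
        build.entry(k).or_default().push(v);
    }
    let mut out = Vec::new();
    for (k, rv) in right {
        if let Some(lvs) = build.get(&k) {
            for lv in lvs {
                out.push((k, lv.clone(), rv.clone()));
            }
        }
    }
    out
}

/// Partitioned hash join: partition both sides the same way, then join
/// each (left_i, right_i) pair on its own thread and concatenate the results.
fn partitioned_join(left: Vec<Row>, right: Vec<Row>, n: usize) -> Vec<(i64, String, String)> {
    let lparts = hash_partition(left, n);
    let rparts = hash_partition(right, n);
    let handles: Vec<_> = lparts
        .into_iter()
        .zip(rparts)
        .map(|(l, r)| thread::spawn(move || hash_join(l, r)))
        .collect();
    handles
        .into_iter()
        .flat_map(|h| h.join().expect("join task panicked"))
        .collect()
}

fn main() {
    let left = vec![(1, "a".to_string()), (2, "b".to_string()), (3, "c".to_string())];
    let right = vec![
        (2, "x".to_string()),
        (3, "y".to_string()),
        (3, "z".to_string()),
        (4, "w".to_string()),
    ];
    let mut result = partitioned_join(left, right, 4);
    result.sort(); // output order depends on partition layout, so sort for display
    println!("{:?}", result); // [(2, "b", "x"), (3, "c", "y"), (3, "c", "z")]
}
```

The key design point is that both inputs must use the same hash function and the same partition count. Otherwise, matching keys could end up at different partition indices and be silently dropped from the result.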
   
   Work to implement this is already underway in DataFusion, and we can leverage it.
   
   **Describe alternatives you've considered**
   None
   
   **Additional context**
   None
   

