[GitHub] [arrow-datafusion] Dandandan commented on issue #523: Number of output record batches for small datasets is large

GitBox Mon, 07 Jun 2021 23:21:40 -0700


Dandandan commented on issue #523:
URL: 
https://github.com/apache/arrow-datafusion/issues/523#issuecomment-856483448



   > I agree that Python just should merge them on the tests. I was a bit 
surprised that even in such a low number of entries we are splitting them: 
seems odd to me.
   
   There could be heuristics / optimizations that skip partitioning for small 
datasets (when the size is known upfront). For example, a hash join can benefit 
when the left side is very small compared to the right side: hash partitioning 
the right side could then be slower than building the left side in a single 
thread / worker.
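
   A rough sketch of what such a heuristic could look like, assuming row-count estimates are available as `Option<usize>` (`None` when unknown). Everything here (`JoinStrategy`, `should_repartition`, `choose_join_strategy`, and the threshold parameters) is a hypothetical name for illustration, not an existing DataFusion API:

```rust
/// Hypothetical join strategies; illustrative names, not DataFusion's API.
#[derive(Debug, PartialEq)]
enum JoinStrategy {
    /// Build the hash table from the whole left side in one thread / worker
    /// and probe it with the right side, without hash partitioning.
    SingleBuild,
    /// Hash-partition both sides on the join key and join per partition.
    Partitioned,
}

/// Decide whether an input is worth repartitioning at all.
/// Assumption: when the row count is unknown we keep the current behaviour
/// and repartition.
fn should_repartition(rows: Option<usize>, min_rows: usize) -> bool {
    match rows {
        Some(n) => n >= min_rows,
        None => true,
    }
}

/// Pick a join strategy from optional row-count estimates.
/// `max_build_rows` caps how large the left (build) side may be, and
/// `min_ratio` is how many times larger the right side must be; both are
/// made-up tuning knobs for this sketch.
fn choose_join_strategy(
    left_rows: Option<usize>,
    right_rows: Option<usize>,
    max_build_rows: usize,
    min_ratio: usize,
) -> JoinStrategy {
    match (left_rows, right_rows) {
        (Some(l), Some(r))
            if l <= max_build_rows && l.saturating_mul(min_ratio) <= r =>
        {
            JoinStrategy::SingleBuild
        }
        // Sizes unknown, or the left side is not small enough: partition.
        _ => JoinStrategy::Partitioned,
    }
}
```

   With thresholds like these, a 100-row left side joined against a 1,000,000-row right side would pick `SingleBuild`, while unknown sizes fall back to `Partitioned`. Sensible threshold values would have to come from benchmarks.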


