gabotechs commented on PR #19761:
URL: https://github.com/apache/datafusion/pull/19761#issuecomment-3883127052

   > In my opinion, it would be better to have a more focused solution for this 
problem (right deep join)
   
   This is not exactly the problem this PR tries to solve. Deep nested joins 
just aggravate it, but the problem is still there even in a single hash join.
   
   It boils down to how streams work in Rust. Unlike other languages, streams 
in Rust do nothing unless polled. This means that not polling the probe side 
early is a missed opportunity to make progress while the build side is being 
built. In real-world scenarios, where queries might not be CPU-bound, this 
gets worse.
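
To make the point above concrete, here is a minimal sketch using only the standard library. It is an analogy, not the code in this PR: a lazy `Iterator` stands in for an async stream (both do nothing until pulled), a thread plus a bounded `sync_channel` stands in for the buffering idea, and `probe_source`, `build_side`, and the two `probe_progress_*` functions are hypothetical names made up for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::sync_channel;
use std::sync::Arc;
use std::thread;

/// Hypothetical probe-side source: a lazy iterator that counts how many
/// batches it has actually produced so far.
fn probe_source(
    produced: Arc<AtomicUsize>,
    batches: usize,
) -> impl Iterator<Item = usize> + Send {
    (0..batches).map(move |i| {
        produced.fetch_add(1, Ordering::SeqCst);
        i
    })
}

/// Hypothetical stand-in for building the hash table on the build side.
fn build_side() -> Vec<usize> {
    (0..1000).collect()
}

/// Lazy case: nobody pulls the probe side while the build side runs.
/// Returns (probe batches produced when the build finished, batches drained afterwards).
pub fn probe_progress_lazy(batches: usize) -> (usize, usize) {
    let produced = Arc::new(AtomicUsize::new(0));
    let probe = probe_source(Arc::clone(&produced), batches);
    let _table = build_side();
    // The iterator has not been pulled yet, so no probe batch exists.
    let progress = produced.load(Ordering::SeqCst);
    let drained = probe.count();
    (progress, drained)
}

/// Buffered case: a producer thread pulls the probe side into a bounded
/// buffer while the build side runs. Assumes `capacity > 0`.
/// Returns (probe batches produced when the build finished, all batches in order).
pub fn probe_progress_buffered(batches: usize, capacity: usize) -> (usize, Vec<usize>) {
    let produced = Arc::new(AtomicUsize::new(0));
    let probe = probe_source(Arc::clone(&produced), batches);
    let (tx, rx) = sync_channel::<usize>(capacity);
    let producer = thread::spawn(move || {
        for batch in probe {
            // `send` blocks once the buffer is full, which bounds memory use.
            if tx.send(batch).is_err() {
                break;
            }
        }
    });
    let _table = build_side();
    // Timing dependent: anywhere between 0 and `capacity + 1` batches.
    let progress = produced.load(Ordering::SeqCst);
    // Draining ends once the producer finishes and drops the sender.
    let received: Vec<usize> = rx.iter().collect();
    producer.join().expect("producer thread panicked");
    (progress, received)
}
```

Calling `probe_progress_lazy(8)` always reports zero probe batches produced by the time the build finishes, while `probe_progress_buffered(8, 4)` may report up to five (four buffered plus one waiting in `send`), depending on timing. The bounded channel is the design choice that keeps the eager side from running arbitrarily far ahead.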
   
   > would a better solution be to make the parallelism more explicit?
   
   This PR attempts to provide that explicitness with a new plan node 
(BufferExec) that models and measures data buffering explicitly.
   
   > Some way to explicitly parallelize just the lookup table sides (left sides)
   
   This could show some speedups, although there are still opportunities to 
make progress on the probe side while the build sides are being built. Note 
that speedups would also be visible with a single hash join, as we can 
build both sides in parallel.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

