Re: [I] Potentially improve join performance by implementing a version of the take kernel that accepts an iterator of indices [datafusion]

via GitHub Tue, 03 Dec 2024 08:37:55 -0800


comphead commented on issue #13620:
URL: https://github.com/apache/datafusion/issues/13620#issuecomment-2515052298


   
   > Optionally some extra logic per join type is applied (updating visited
   > rows).
   >
   > I think there are a couple of things that might be optimized in nested
   > loop join:
   >
   > * The values taken from the left side for columns in the filter might be
   >   reused between iterations (as they might be the same every time).
   > * Ideally, we don't create the indices, take them, create a filter and
   >   apply the filters, but create the output indices or even the output
   >   directly. This would entail writing a specific implementation which,
   >   instead of calling the different kernels, combines the individual
   >   operations, just like @Rachelint did for hash aggregates. Possibly take
   >   with an iterator could help here, as it would avoid implementing this
   >   bit over and over for each "fused" implementation. I think we need to
   >   see an example of this first.
   
   That makes sense. Alternatively, there is a good set of NLJ optimizations in
   this video: https://www.youtube.com/watch?v=RcEW0P8iVTc
   
   I went quickly through our NLJ implementation, and it looks like a
   page (batch) based NLJ. Some of the optimizations from the video could
   potentially be applied to it.
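
To make the quoted "take with an iterator" idea concrete, here is a minimal sketch in plain Rust. It uses slices and `Vec` rather than Arrow arrays, and the names `take_iter`, `matching_pairs` and `demo` are hypothetical; none of them are existing arrow-rs or DataFusion APIs. The join predicate (`left < right`) is likewise only an assumption for illustration.

```rust
/// Hypothetical "take" that consumes an iterator of indices instead of a
/// materialized index array. Panics if an index is out of bounds.
fn take_iter<T: Copy>(values: &[T], indices: impl IntoIterator<Item = usize>) -> Vec<T> {
    indices.into_iter().map(|i| values[i]).collect()
}

/// Lazily yields (left_idx, right_idx) for every pair that passes the
/// assumed join filter `left < right`. No index buffer is allocated.
fn matching_pairs<'a>(
    left: &'a [i64],
    right: &'a [i64],
) -> impl Iterator<Item = (usize, usize)> + 'a {
    left.iter().enumerate().flat_map(move |(i, &l)| {
        right
            .iter()
            .enumerate()
            .filter(move |&(_, &r)| l < r)
            .map(move |(j, _)| (i, j))
    })
}

/// Builds both output columns directly from the lazy index stream.
/// Note: this simple version evaluates the predicate once per output column;
/// a real fused kernel would more likely fill all columns in a single pass.
fn demo() -> (Vec<i64>, Vec<i64>) {
    let left = [10i64, 20, 30];
    let right = [15i64, 25];
    let left_out = take_iter(&left, matching_pairs(&left, &right).map(|(i, _)| i));
    let right_out = take_iter(&right, matching_pairs(&left, &right).map(|(_, j)| j));
    (left_out, right_out)
}
```

The point of the sketch is only that the filter and the gather are composed without an intermediate index array; whether this helps in practice would need the kind of benchmarked example the quoted comment asks for.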
   


