comphead commented on issue #13620: URL: https://github.com/apache/datafusion/issues/13620#issuecomment-2515052298
> Optionally some extra logic per join type is applied (updating visited rows).
>
> I think there are a couple of things that might be optimized in nested loop join:
>
> * taking values from the left side for columns in the filter might be reused between iterations (as this might be the same every time)
> * ideally: we don't create the indices, take them, create a filter and apply the filters, but create the output indices or even the output directly. This would entail writing a specific implementation which, instead of calling the different kernels, combines the individual operations, just like @Rachelint did for hash aggregates. Possibly take with iterator could help here, as it would avoid implementing this bit over and over for each "fused" implementation. I think we need to see an example of this first.

That makes sense. Alternatively, there is a good set of NLJ optimizations in this video: https://www.youtube.com/watch?v=RcEW0P8iVTc

I went quickly through our NLJ implementation and it looks like a page (batch) based NLJ. There are some optimizations from the video that could potentially be applied to it.
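To make the "fused" idea from the quoted comment concrete, here is a minimal sketch in plain Rust. It is not DataFusion's implementation and uses no Arrow kernels: the function name `fused_nlj`, the `i64` column type, and the closure-based predicate are all assumptions made for illustration. It evaluates the join filter while scanning the two batches and emits the output index pairs directly, updating the visited flags for left rows in the same pass instead of building indices, taking them, and filtering afterwards.

```rust
/// Hypothetical fused nested loop join over two in-memory columns.
/// (Illustrative only: not a DataFusion or Arrow API.)
///
/// Returns the matching (left_idx, right_idx) pairs and, for each left row,
/// whether it matched at least once (the "visited rows" bookkeeping that
/// left/full/semi/anti joins need).
fn fused_nlj<F>(left: &[i64], right: &[i64], pred: F) -> (Vec<(usize, usize)>, Vec<bool>)
where
    F: Fn(i64, i64) -> bool,
{
    let mut pairs = Vec::new();
    let mut visited = vec![false; left.len()];

    // Outer loop over the probe (right) batch, inner loop over the
    // buffered (left) batch. The left values are read in place on every
    // iteration, so nothing is re-materialized per right row.
    for (r_idx, &r) in right.iter().enumerate() {
        for (l_idx, &l) in left.iter().enumerate() {
            // The filter is applied here, so no intermediate index arrays
            // or boolean masks are built before the output is produced.
            if pred(l, r) {
                pairs.push((l_idx, r_idx));
                visited[l_idx] = true;
            }
        }
    }

    (pairs, visited)
}
```

For example, `fused_nlj(&[1, 5, 10], &[4, 6], |l, r| l < r)` yields the pairs `[(0, 0), (0, 1), (1, 1)]` and the visited flags `[true, true, false]`. A real implementation would work on Arrow arrays and arbitrary filter expressions, which is where the question of specialized versus generic kernels raised in the comment comes in.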
