Dandandan commented on issue #13620: URL: https://github.com/apache/datafusion/issues/13620#issuecomment-2514220770
AFAIK the algorithm of NLJ is as follows:

1. Generating indices in a certain pattern
2. Taking left / right side values based on that pattern for the columns in the filter
3. Creating the filter based on the expression
4. Filtering the indices based on the expression
5. Taking the values of the output columns based on the filtered indices

Optionally, some extra logic per join type is applied (updating visited rows).

I think there are a couple of things that might be optimized in nested loop join:

* The values taken from the left side for columns in the filter might be reused between iterations (as they might be the same every time)
* Ideally, we don't create the indices, take them, create a filter and apply the filter, but create the output indices or even the output directly. This would entail writing a specific implementation which, instead of calling the different kernels, combines the individual operations, just like @Rachelint did for hash aggregates.

Possibly a take with an iterator could help here, as it would avoid implementing this bit over and over for each "fused" implementation. I think we need to see an example of this first.
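To make the difference between the two approaches concrete, here is a minimal sketch in plain Rust. It is not DataFusion's implementation: plain `Vec<i64>` columns stand in for Arrow arrays, the function names `nlj_kernel_style` and `nlj_fused` are hypothetical, and only the inner-join index computation is shown (no per-join-type handling of visited rows). The first function follows steps 1 to 4 with one intermediate buffer per step; the second produces the output indices directly and reads each left value once per outer row.

```rust
/// Kernel-style path (illustrative only): every step materializes a buffer.
fn nlj_kernel_style(
    left: &[i64],
    right: &[i64],
    pred: impl Fn(i64, i64) -> bool,
) -> (Vec<usize>, Vec<usize>) {
    // 1. Generate indices in a fixed pattern (the full cross product).
    let n = left.len() * right.len();
    let mut l_idx = Vec::with_capacity(n);
    let mut r_idx = Vec::with_capacity(n);
    for l in 0..left.len() {
        for r in 0..right.len() {
            l_idx.push(l);
            r_idx.push(r);
        }
    }

    // 2. "Take" the filter columns from both sides using those indices.
    let l_vals: Vec<i64> = l_idx.iter().map(|&i| left[i]).collect();
    let r_vals: Vec<i64> = r_idx.iter().map(|&i| right[i]).collect();

    // 3. Evaluate the filter expression into a boolean mask.
    let mask: Vec<bool> = l_vals
        .iter()
        .zip(r_vals.iter())
        .map(|(&l, &r)| pred(l, r))
        .collect();

    // 4. Filter the indices with the mask.
    let mut l_out = Vec::new();
    let mut r_out = Vec::new();
    for i in 0..mask.len() {
        if mask[i] {
            l_out.push(l_idx[i]);
            r_out.push(r_idx[i]);
        }
    }
    (l_out, r_out)
}

/// Fused path (illustrative only): no index, value, or mask buffers;
/// the output indices are produced directly.
fn nlj_fused(
    left: &[i64],
    right: &[i64],
    pred: impl Fn(i64, i64) -> bool,
) -> (Vec<usize>, Vec<usize>) {
    let mut l_out = Vec::new();
    let mut r_out = Vec::new();
    for (l, &lv) in left.iter().enumerate() {
        // The left value is read once here and reused for the whole inner loop.
        for (r, &rv) in right.iter().enumerate() {
            if pred(lv, rv) {
                l_out.push(l);
                r_out.push(r);
            }
        }
    }
    (l_out, r_out)
}
```

Both functions return the same `(left_indices, right_indices)` pair, so step 5 (taking the output columns) is unchanged. With real Arrow arrays the fused loop would need to be specialized per data type and filter expression, which is the part a shared take-with-iterator helper might avoid repeating.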
