2010YOUY01 commented on PR #16996: URL: https://github.com/apache/datafusion/pull/16996#issuecomment-3173870087
> > I really appreciate your detailed review. I have addressed them in [4c111c3](https://github.com/apache/datafusion/pull/16996/commits/4c111c38e6b05553d78bf9c46346f4a47cb7e710) > > BTW, I take code understandability very seriously, and I’m happy to make any small changes that improve it. I encourage others to do similar reviews -- point out anything that doesn’t make sense and to share even small nitpicks. > > > Thanks @2010YOUY01 I'll continue later, I'm out of my mental capacity :) Appreciate for the comments it was much easier to navigate. > > > Good sign we got fuzz testing working with the new implementation. For the post filtering it would be probably possible to split filter stage evalulation for certain types of join. Inner, outer joins filter can be evaluated early whereas SEMI, ANTI on late stage like now. > > > > > > I don't get this changing filter evaluation time idea, could you elaborate? > > As far as I understood the filter evaluation happens in post phase, after tuples joined by key and this makes a lot of sense especially for SEMI, ANTI joins where you have to track filter eval results for the same key across all batches. However for Inner join and outer joins it maybe beneficial to evaluate record batches early and avoid join calculation and save some ticks > > For example > > ``` > > with t as (select 1 a, 2 b), t1 as (select 2 a, 3 b) select * from t left join t1 on (t.a < t1.b and t1.b = 4); > +---+---+------+------+ > | a | b | a | b | > +---+---+------+------+ > | 1 | 2 | NULL | NULL | > +---+---+------+------+ > ``` > > So the right record batch you can evaluate filter `t1.b = 4` and figure out that indices in the batch doesn't fit to join condition even without having actual join I see, but this step should be handled below the join operator by filter pushdown: See the `FilterExec` in the left input in the below example ``` > explain SELECT * FROM range(10000) AS t1 JOIN range(10000) AS t2 ON ((t1.value + t2.value) % 1000 = 0) AND (t1.value > 5000); +---------------+------------------------------------------------------------+ | plan_type | plan | +---------------+------------------------------------------------------------+ | physical_plan | ┌───────────────────────────┐ | | | │ NestedLoopJoinExec ├──────────────┐ | | | └─────────────┬─────────────┘ │ | | | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ | | | │ CoalescePartitionsExec ││ RepartitionExec │ | | | │ ││ -------------------- │ | | | │ ││ partition_count(in->out): │ | | | │ ││ 1 -> 14 │ | | | │ ││ │ | | | │ ││ partitioning_scheme: │ | | | │ ││ RoundRobinBatch(14) │ | | | └─────────────┬─────────────┘└─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ | | | │ CoalesceBatchesExec ││ LazyMemoryExec │ | | | │ -------------------- ││ -------------------- │ | | | │ target_batch_size: ││ batch_generators: │ | | | │ 8192 ││ range: start=0, end=10000,│ | | | │ ││ batch_size=8192 │ | | | └─────────────┬─────────────┘└───────────────────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ FilterExec │ | | | │ -------------------- │ | | | │ predicate: │ | | | │ value > 5000 │ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ RepartitionExec │ | | | │ -------------------- │ | | | │ partition_count(in->out): │ | | | │ 1 -> 14 │ | | | │ │ | | | │ partitioning_scheme: │ | | | │ RoundRobinBatch(14) │ | | | └─────────────┬─────────────┘ | | | ┌─────────────┴─────────────┐ | | | │ LazyMemoryExec │ | | | │ -------------------- │ | | | │ batch_generators: │ | | | │ range: start=0, end=10000,│ | | | │ batch_size=8192 │ | | | └───────────────────────────┘ | | | | +---------------+------------------------------------------------------------+ ``` Perhaps it's better to rename the variable name from `filter` to `join_predicate` to make it more clear. I'm not sure if such filter pushdowns would be incorrect for SEMI/ANTI joins, I'll double check that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org