Dandandan commented on code in PR #12369:
URL: https://github.com/apache/datafusion/pull/12369#discussion_r1750347562
##########
datafusion/sqllogictest/test_files/group_by.slt:
##########
@@ -2868,18 +2879,24 @@ logical_plan
 04)------Projection: s.zip_code, s.country, s.sn, s.ts, s.currency, e.sn, e.amount
 05)--------Inner Join: s.currency = e.currency Filter: s.ts >= e.ts
 06)----------SubqueryAlias: s
-07)------------TableScan: sales_global projection=[zip_code, country, sn, ts, currency]
-08)----------SubqueryAlias: e
-09)------------TableScan: sales_global projection=[sn, ts, currency, amount]
+07)------------Filter: sales_global.currency IS NOT NULL

Review Comment:
   > But in order to skip hashing nulls, the input array would have to be "filtered" (aka copy the matching rows)

   Correct, but you also save some copying in `RepartitionExec` and in the build-side concatenation, as well as the copying / checking of key columns on the probe side. If there aren't any nulls (even if the column is nullable), no copying happens.

   Even with CSV / MemTable, in many cases the null filter can be combined with existing filter expressions, so no extra copying happens (less copying, in fact, as fewer rows need to be copied).
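
To make the copying argument concrete, here is a minimal toy sketch in plain Rust. It is not DataFusion's implementation and uses no Arrow types: `Row`, `filter_two_passes`, `filter_fused`, `drop_null_keys`, and `sample_rows` are all hypothetical names invented for illustration, and "copies" simply counts cloned rows. It models two of the points above: a null check fused into an existing predicate copies fewer rows than running it as a separate pass, and an input without null keys can be passed through without copying at all.

```rust
use std::borrow::Cow;

/// One row of a toy table; `currency` plays the role of the nullable join key.
/// (Hypothetical type, not a DataFusion or Arrow structure.)
#[derive(Clone, Debug, PartialEq)]
struct Row {
    currency: Option<&'static str>,
    amount: i64,
}

/// Two separate passes: apply the existing predicate first, then drop null keys.
/// Returns the surviving rows and the total number of rows copied.
fn filter_two_passes(rows: &[Row], min_amount: i64) -> (Vec<Row>, usize) {
    let passed: Vec<Row> = rows
        .iter()
        .filter(|r| r.amount >= min_amount)
        .cloned()
        .collect();
    let non_null: Vec<Row> = passed
        .iter()
        .filter(|r| r.currency.is_some())
        .cloned()
        .collect();
    let copies = passed.len() + non_null.len();
    (non_null, copies)
}

/// One fused pass: the null check is combined with the existing predicate,
/// so each surviving row is copied exactly once.
fn filter_fused(rows: &[Row], min_amount: i64) -> (Vec<Row>, usize) {
    let kept: Vec<Row> = rows
        .iter()
        .filter(|r| r.amount >= min_amount && r.currency.is_some())
        .cloned()
        .collect();
    let copies = kept.len();
    (kept, copies)
}

/// If no key is null, hand back the input as-is (no copy);
/// otherwise copy only the rows with a non-null key.
fn drop_null_keys(rows: &[Row]) -> Cow<'_, [Row]> {
    if rows.iter().all(|r| r.currency.is_some()) {
        Cow::Borrowed(rows)
    } else {
        Cow::Owned(
            rows.iter()
                .filter(|r| r.currency.is_some())
                .cloned()
                .collect(),
        )
    }
}

/// Small made-up input: five rows, two of them with a null key.
fn sample_rows() -> Vec<Row> {
    vec![
        Row { currency: Some("USD"), amount: 50 },
        Row { currency: None, amount: 80 },
        Row { currency: Some("EUR"), amount: 5 },
        Row { currency: Some("EUR"), amount: 70 },
        Row { currency: None, amount: 1 },
    ]
}
```

With `sample_rows()` and `min_amount = 10`, both filter variants return the same two rows, but the fused one copies fewer rows along the way. The rows removed here are also rows that later stages (repartitioning, build-side concatenation, probe-side key checks) would never have to touch.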