ngsg commented on PR #6041: URL: https://github.com/apache/hive/pull/6041#issuecomment-3222432999
@zabetak, thank you for the comment. I agree that we need to check whether this is an actual improvement. I revisited the experiment results and logs, but I couldn't pinpoint the core reason why the SemiJoin branches were not beneficial. Possible causes include the logic itself not being effective, stats misestimation leading to incorrect SJ benefit/cost computation, or something else I may have overlooked.

Looking closely at the code, the logic and the idea described in HIVE-20775 make sense to me, and the changes in `dynamic_semijoin_reduction_multicol.q.out` [1] seem to demonstrate a case where this logic is useful (though that occurs when `hive.tez.dynamic.semijoin.reduction.multicolumn` is disabled, which is not the general case).

I'll add the concern about actual benefit to my personal TODO list and will share the results if I find further evidence for a follow-up ticket.

[1] https://github.com/apache/hive/pull/6041/files#diff-187118dfcb8b70a73f5682a2011de5c5cf4d3457789cc44b6d7216c188da5b97L332-R362