ngsg commented on PR #6041:
URL: https://github.com/apache/hive/pull/6041#issuecomment-3222432999

   @zabetak, thank you for the comment. I also agree that we need to check 
whether this is an actual improvement or not. I revisited the experiment results 
and their logs, but I couldn't pinpoint the core reason why the SemiJoin branches 
were not beneficial. Possible causes include the logic itself not being 
effective, stats misestimation leading to incorrect SJ benefit/cost 
computation, or something else I may have overlooked.
   
   Looking closely at the code, the logic and the idea described in HIVE-20775 
make sense to me, and the changes in 
`dynamic_semijoin_reduction_multicol.q.out` [1] seem to demonstrate a case 
where this logic is useful (though that occurs when 
`hive.tez.dynamic.semijoin.reduction.multicolumn` is disabled, which is not the 
general case).
   
   I'll add the concern about the actual benefit to my personal TODO list and will 
share results if I find further evidence that warrants a follow-up ticket.
   
   [1] 
https://github.com/apache/hive/pull/6041/files#diff-187118dfcb8b70a73f5682a2011de5c5cf4d3457789cc44b6d7216c188da5b97L332-R362


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


