----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/63427/#review189657 -----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java Line 1417 (original), 1417 (patched) <https://reviews.apache.org/r/63427/#comment266862> Trying to think about how to think about this setting if we're going to use this for tuning. I think a better way of being able to think about this setting is, what kind of selectivity we want from the semijoin reduction before we decide it is worth keeping. For me this setting might be a bit more intuitive (basically float value between 0-1) - for example setting config to 0.5, compared to what you have now, where I think you would set it to 2.0. ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java Lines 1419 (patched) <https://reviews.apache.org/r/63427/#comment266860> Some concerns about long-to-float conversion going on here .. nDVs is cast to float then multiplied to 1.0, and nDVsOfTS is also converted to float during this comparison. This could affect the comparisoin results. Maybe cast nDVsFactored to long when doing the comparision to be safe? ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java Lines 1421 (patched) <https://reviews.apache.org/r/63427/#comment266863> If you are logging here, mention that setShouldRemove is being set. Would be more useful for someone looking at the logs. - Jason Dere On Oct. 30, 2017, 7:09 p.m., Deepak Jaiswal wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/63427/ > ----------------------------------------------------------- > > (Updated Oct. 30, 2017, 7:09 p.m.) > > > Review request for hive, Ashutosh Chauhan and Jason Dere. > > > Repository: hive-git > > > Description > ------- > > Dynamic Semijoin Reduction : markSemiJoinForDPP marks unwanted semijoin > branches > > In method markSemiJoinForDPP (HIVE-17399), the nDVs comparison should not > have equality as there is a chance that the values are same on both sides and > the branch is still marked as good when it shouldn't be. > Add a configurable factor to see how useful this is if nDVs on smaller side > are only slightly less than that on TS side. > > > Diffs > ----- > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6631a6e45d > ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java da30c3b642 > ql/src/test/queries/clientpositive/dynamic_semijoin_reduction.q 6cc0a7f7a9 > ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out > 1a1a4d9b2d > > > Diff: https://reviews.apache.org/r/63427/diff/1/ > > > Testing > ------- > > > Thanks, > > Deepak Jaiswal > >