-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64688/
-----------------------------------------------------------
(Updated Feb. 10, 2018, 12:05 a.m.)


Review request for hive, Ashutosh Chauhan and Jason Dere.


Changes
-------

Fixed the following issues:
- Handled the case when the small table has more buckets than the big table by taking the mod of the obtained bucket id.
- Handled the fallback case for the old logic when the big table has more buckets than the smaller table(s).
- Updated auto_sortmerge_join_16. This test would fail with SMB by default due to missing buckets, but now gives correct results.
- Reverted all the updated tests which originally tested small tables with more buckets.


Repository: hive-git


Description
-------

Bucket based Join : Handle buckets with no splits.

The current logic in CustomPartitionVertex assumes that there is a split for each bucket, whereas in Tez we can have no splits for empty buckets.
Also falls back to reduce-side join if the small table has more buckets than the big table.

Disallow loading files into bucketed tables if the file name format is not like 000000_0, 000001_0_copy_1, etc.


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CustomPartitionVertex.java 26afe90faa
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CustomVertexConfiguration.java ef5e7edcd6
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java 9885038588
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java dc698c8de8
  ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 54f5bab6de
  ql/src/test/queries/clientpositive/auto_sortmerge_join_16.q 8216b538c2
  ql/src/test/results/clientpositive/llap/auto_sortmerge_join_16.q.out 91408df129
  ql/src/test/results/clientpositive/spark/auto_sortmerge_join_16.q.out_spark 91408df129


Diff: https://reviews.apache.org/r/64688/diff/3/

Changes: https://reviews.apache.org/r/64688/diff/2-3/


Testing
-------


Thanks,

Deepak Jaiswal
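

Illustration
-------

For readers of this request, the bucket handling described above can be sketched roughly as follows. This is not the code in the patch. The class BucketSketch, its method names, and the file name regex are hypothetical, and taking the mod against the big table's bucket count is an assumption drawn from the change notes. The sketch shows three things: rejecting file names that do not look like 000000_0 or 000001_0_copy_1, folding a larger bucket id into the big table's range, and keeping an empty entry for buckets that have no files (and hence no splits).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BucketSketch {
    // Assumed pattern for names such as 000000_0 or 000001_0_copy_1.
    private static final Pattern BUCKET_FILE =
        Pattern.compile("^(\\d+)_(\\d+)(_copy_\\d+)?$");

    /** Returns the bucket id encoded in the file name, or -1 if the name does not match. */
    public static int bucketIdFromFileName(String name) {
        Matcher m = BUCKET_FILE.matcher(name);
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Folds a bucket id into the big table's bucket range (assumed: mod by its bucket count). */
    public static int mapToBigTableBucket(int bucketId, int numBigBuckets) {
        return bucketId % numBigBuckets;
    }

    /** Groups file names by target bucket; buckets with no files keep an empty list. */
    public static List<List<String>> groupByBucket(List<String> fileNames, int numBigBuckets) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numBigBuckets; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String name : fileNames) {
            int id = bucketIdFromFileName(name);
            if (id < 0) {
                // Mirrors the idea of disallowing badly named files in bucketed tables.
                throw new IllegalArgumentException("Unexpected bucket file name: " + name);
            }
            buckets.get(mapToBigTableBucket(id, numBigBuckets)).add(name);
        }
        return buckets;
    }
}
```

With four big-table buckets, a file named 000005_0 would land in bucket 1 under this assumption, and any bucket that receives no file is still present as an empty list rather than missing.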