-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28500/
-----------------------------------------------------------

Review request for hive, Chao Sun, Suhas Satish, and Xuefu Zhang.


Bugs: HIVE-8943
    https://issues.apache.org/jira/browse/HIVE-8943


Repository: hive-git


Description
-------

By default, SparkMapJoinOptimizer combines nested mapjoins into one work, because the RS (ReduceSink) for the big table is removed. We therefore need to enhance the mapjoin check to calculate whether all the MapJoins in that work (Spark stage) will fit into memory together; otherwise they might overwhelm the memory of that particular Spark executor.


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java 819eef1
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java 0c339a5
  ql/src/test/queries/clientpositive/auto_join_stats.q PRE-CREATION
  ql/src/test/queries/clientpositive/auto_join_stats2.q PRE-CREATION
  ql/src/test/results/clientpositive/auto_join_stats.q.out PRE-CREATION
  ql/src/test/results/clientpositive/auto_join_stats2.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/auto_join_stats.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/auto_join_stats2.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/28500/diff/


Testing
-------

Added two unit tests:
1. auto_join_stats, which sets a memory limit and checks that the algorithm does not put more than one mapjoin in one BaseWork.
2. auto_join_stats2, which runs the same query without a memory limit and checks that the algorithm puts all mapjoins in one BaseWork, because it can.


Thanks,

Szehon Ho
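
P.S. For readers who have not opened the diff, the idea in the description can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the patch: the class MapJoinStagePlanner, its methods, and the representation of each mapjoin as a single small-table size in bytes are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT the code in SparkMapJoinOptimizer.
class MapJoinStagePlanner {

    // True if a mapjoin needing candidateSize bytes for its small tables can be added
    // to a work that already holds sizeInWork bytes, without exceeding memoryLimit.
    // Written as a subtraction so that non-negative inputs cannot overflow.
    static boolean fitsInWork(long sizeInWork, long candidateSize, long memoryLimit) {
        return candidateSize <= memoryLimit - sizeInWork;
    }

    // Greedily packs consecutive nested mapjoins into works (Spark stages).
    // A new work is started whenever the next mapjoin would push the running
    // total over the limit. A mapjoin that alone exceeds the limit still gets
    // its own work here; whether it should be a mapjoin at all is outside
    // the scope of this sketch.
    static List<List<Long>> groupIntoWorks(List<Long> smallTableSizes, long memoryLimit) {
        List<List<Long>> works = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long sizeInWork = 0L;
        for (long size : smallTableSizes) {
            if (!current.isEmpty() && !fitsInWork(sizeInWork, size, memoryLimit)) {
                works.add(current);
                current = new ArrayList<>();
                sizeInWork = 0L;
            }
            current.add(size);
            sizeInWork += size;
        }
        if (!current.isEmpty()) {
            works.add(current);
        }
        return works;
    }
}
```

With a tight limit, two mapjoins that each fit on their own but not together end up in separate works, which is the situation auto_join_stats is meant to cover. With an effectively unlimited budget they share one work, as in auto_join_stats2.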