-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28500/
-----------------------------------------------------------

Review request for hive, Chao Sun, Suhas Satish, and Xuefu Zhang.


Bugs: HIVE-8943
    https://issues.apache.org/jira/browse/HIVE-8943


Repository: hive-git


Description
-------

By default, SparkMapJoinOptimizer combines nested mapjoins into a single work, 
because the RS (ReduceSink) for the big table is removed. The mapjoin check 
therefore needs to be enhanced to calculate whether all the MapJoins in that 
work (Spark stage) will fit into memory together; otherwise they might 
overwhelm the memory of that particular Spark executor.
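
For illustration only, the idea behind the check can be sketched roughly as 
below. This is not the code in the patch: the class name, method names, and the 
plain byte threshold are all hypothetical, and the small-table sizes are just 
numbers standing in for whatever statistics the optimizer has. A join shares 
the current work only while the cumulative small-table size stays under the 
limit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from the Hive patch.
class MapJoinMemorySketch {

    /** True if the small tables of every mapjoin in one work fit under the limit together. */
    static boolean fitsInMemory(List<Long> smallTableSizes, long maxMemoryBytes) {
        long total = 0;
        for (long size : smallTableSizes) {
            total += size;
            if (total > maxMemoryBytes) {
                return false;
            }
        }
        return true;
    }

    /**
     * Greedily assigns nested joins to works. A join shares the current work only if
     * the cumulative small-table size stays within the limit; otherwise it starts a
     * new work. Simplification: a join whose own small table exceeds the limit still
     * gets a work of its own here, whereas a real optimizer would presumably not
     * convert it to a mapjoin at all.
     */
    static List<List<Long>> groupIntoWorks(List<Long> smallTableSizes, long maxMemoryBytes) {
        List<List<Long>> works = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long used = 0;
        for (long size : smallTableSizes) {
            if (!current.isEmpty() && used + size > maxMemoryBytes) {
                works.add(current);
                current = new ArrayList<>();
                used = 0;
            }
            current.add(size);
            used += size;
        }
        if (!current.isEmpty()) {
            works.add(current);
        }
        return works;
    }

    public static void main(String[] args) {
        List<Long> sizes = Arrays.asList(400L, 500L, 300L);
        // Tight limit: no two small tables fit together, so each join gets its own work.
        System.out.println(groupIntoWorks(sizes, 600L));
        // Effectively no limit: all joins share one work.
        System.out.println(groupIntoWorks(sizes, Long.MAX_VALUE));
    }
}
```

The two calls in main mirror the two situations of interest: a limit tight 
enough to keep each mapjoin in its own work, and no effective limit, where all 
mapjoins are combined.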


Diffs
-----

  
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java 819eef1 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java 0c339a5 
  ql/src/test/queries/clientpositive/auto_join_stats.q PRE-CREATION 
  ql/src/test/queries/clientpositive/auto_join_stats2.q PRE-CREATION 
  ql/src/test/results/clientpositive/auto_join_stats.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/auto_join_stats2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/auto_join_stats.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/auto_join_stats2.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/28500/diff/


Testing
-------

Added two unit tests:

1.  auto_join_stats, which sets a memory limit and checks that the algorithm 
does not put more than one mapjoin in one BaseWork.
2.  auto_join_stats2, which runs the same query without a memory limit and 
checks that the algorithm puts all mapjoins in one BaseWork, because they fit.


Thanks,

Szehon Ho
