Hello, In HIVE-21189 [1], the default value for hive.merge.nway.joins is set to false. There is no record of why it was set to false, and I would like to understand the background for the decision. Specifically I wonder if the following situation is relevant to the decision.
Example) MapJoinOp_1 joins: table G, table A, table B, table C MapJoinOp_2 joins: table G, table A, table B , table D Here, table G is a big table to be read via shuffling. MayJoinOp_1 needs table C, while MapJoinOp_2 needs table D. SharedWorkOptimizer assigns the same cache key to MapJoinOp_1 and MapJoinOp_2 (because of table G and table A), so that both operators can share in-memory tables. Assume that MapJoinOp_1 is executed first and fills the cache first. Then, MapJoinOp_2 does not load the cache which is already filled. As a result, it ends up with something like NullPointerException. After setting hive.merge.nway.joins to true, I encountered a problem (which is not easy to reproduce), and I wonder if the above scenario is feasible in the current implementation. Many thanks, --- Sungwoo [1] https://issues.apache.org/jira/browse/HIVE-21189