-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28889/
-----------------------------------------------------------

(Updated Dec. 11, 2014, 10:36 p.m.)


Review request for hive, Szehon Ho and Xuefu Zhang.


Changes
-------

Updated the patch by adding a SparkMapJoinProcessor, which overrides some of 
the functionality in MapJoinProcessor.

It's a shame that we couldn't override convertMapJoin, so 
SparkMapJoinProcessor#generateMapJoinOperator ends up being a duplicate of 
MapJoinProcessor#generateMapJoinOperator.


Bugs: HIVE-8911
    https://issues.apache.org/jira/browse/HIVE-8911


Repository: hive-git


Description
-------

Basically, the idea is to reuse as much code from MR as possible.

The issue is that, in MR's MapJoinProcessor, after the join op is converted to 
a mapjoin op, all of its parent ReduceSinkOperators are removed. However, for 
our Spark branch, we need to preserve them, because they serve as boundaries 
between BaseWorks, and SparkReduceSinkMapJoinProc triggers on them.

Initially I tried to move this part of the logic to SparkMapJoinOptimizer, 
which runs at a later stage. Although this works, I'm worried it may have too 
much effect on the smb join w/ hint, because we would then have to move that 
part of the logic to SparkMapJoinOptimizer too. In general, I want to minimize 
the effect on the existing code path.

This patch makes changes to MapJoinProcessor. I created a separate method, 
convertMapJoinForSpark, which doesn't remove the ReduceSinkOperators for small 
tables. The transform method then decides which method to call based on the 
execution engine.
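
A rough sketch of the idea, for reviewers who want the gist without opening the 
diff. This is a toy model, not the patch itself: the Op class, the string-based 
engine check, parentKinds and sampleJoin are hypothetical stand-ins, and the 
sketch assumes that only the big-table branch loses its ReduceSink in the 
Spark case.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: Op and the methods below are hypothetical stand-ins,
// not Hive's real operator classes or the code in this patch.
final class MapJoinConversionSketch {

    /** Minimal operator node: a kind ("TS", "RS", "JOIN", "MAPJOIN") plus parents. */
    static final class Op {
        final String kind;
        final List<Op> parents = new ArrayList<>();

        Op(String kind, Op... parents) {
            this.kind = kind;
            for (Op p : parents) {
                this.parents.add(p);
            }
        }
    }

    /** Returns the operator above a single-parent ReduceSink, else the operator itself. */
    static Op skipReduceSink(Op op) {
        boolean isSingleParentRs = op.kind.equals("RS") && op.parents.size() == 1;
        return isSingleParentRs ? op.parents.get(0) : op;
    }

    /** MR-style conversion: every parent ReduceSink is removed. */
    static Op convertMapJoin(Op join) {
        Op mapJoin = new Op("MAPJOIN");
        for (Op parent : join.parents) {
            mapJoin.parents.add(skipReduceSink(parent));
        }
        return mapJoin;
    }

    /** Spark-style conversion: ReduceSinks on small-table branches are kept. */
    static Op convertMapJoinForSpark(Op join, int bigTablePos) {
        Op mapJoin = new Op("MAPJOIN");
        for (int pos = 0; pos < join.parents.size(); pos++) {
            Op parent = join.parents.get(pos);
            // Assumption: only the big-table branch loses its ReduceSink.
            mapJoin.parents.add(pos == bigTablePos ? skipReduceSink(parent) : parent);
        }
        return mapJoin;
    }

    /** Picks the conversion based on the execution engine name. */
    static Op transform(Op join, int bigTablePos, String engine) {
        return "spark".equals(engine)
            ? convertMapJoinForSpark(join, bigTablePos)
            : convertMapJoin(join);
    }

    /** Lists the kinds of an operator's direct parents, in order. */
    static List<String> parentKinds(Op op) {
        List<String> kinds = new ArrayList<>();
        for (Op p : op.parents) {
            kinds.add(p.kind);
        }
        return kinds;
    }

    /** Builds TS -> RS -> JOIN <- RS <- TS, a two-way join before conversion. */
    static Op sampleJoin() {
        return new Op("JOIN", new Op("RS", new Op("TS")), new Op("RS", new Op("TS")));
    }

    public static void main(String[] args) {
        System.out.println("mr:    " + parentKinds(transform(sampleJoin(), 0, "mr")));
        System.out.println("spark: " + parentKinds(transform(sampleJoin(), 0, "spark")));
    }
}
```

With the big table at position 0, the MR path leaves the mapjoin with parents 
[TS, TS], while the Spark path leaves [TS, RS], so the small-table RS is still 
there to mark the boundary between works.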

I also had to disable several tests related to smb join w/ hints. They can be 
re-enabled once HIVE-8640 is resolved.


Diffs (updated)
-----

  data/conf/spark/hive-site.xml 44eac86 
  itests/src/test/resources/testconfiguration.properties 2348e06 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 773c827 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java a8a3d86 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkMapJoinProcessor.java PRE-CREATION 
  ql/src/test/results/clientpositive/spark/bucket_map_join_1.q.out f24ae73 
  ql/src/test/results/clientpositive/spark/bucket_map_join_2.q.out 33e9e8b 
  ql/src/test/results/clientpositive/spark/bucketmapjoin1.q.out aaa0151 
  ql/src/test/results/clientpositive/spark/bucketmapjoin10.q.out 9954b77 
  ql/src/test/results/clientpositive/spark/bucketmapjoin11.q.out ad8f0a5 
  ql/src/test/results/clientpositive/spark/bucketmapjoin12.q.out aa3e2b6 
  ql/src/test/results/clientpositive/spark/bucketmapjoin13.q.out 44233f6 
  ql/src/test/results/clientpositive/spark/bucketmapjoin2.q.out c4702ef 
  ql/src/test/results/clientpositive/spark/bucketmapjoin3.q.out 7c31e05 
  ql/src/test/results/clientpositive/spark/bucketmapjoin4.q.out a8e892e 
  ql/src/test/results/clientpositive/spark/bucketmapjoin5.q.out 041ba12 
  ql/src/test/results/clientpositive/spark/bucketmapjoin7.q.out 54c4be3 
  ql/src/test/results/clientpositive/spark/bucketmapjoin8.q.out da9fe1c 
  ql/src/test/results/clientpositive/spark/bucketmapjoin9.q.out 5a5e3f6 
  ql/src/test/results/clientpositive/spark/bucketmapjoin_negative.q.out 5ac3f4c 
  ql/src/test/results/clientpositive/spark/bucketmapjoin_negative2.q.out e4ff965 
  ql/src/test/results/clientpositive/spark/bucketmapjoin_negative3.q.out fce5566 
  ql/src/test/results/clientpositive/spark/join25.q.out 284c97d 
  ql/src/test/results/clientpositive/spark/join26.q.out e271184 
  ql/src/test/results/clientpositive/spark/join27.q.out d31f29e 
  ql/src/test/results/clientpositive/spark/join30.q.out 7fbbcfa 
  ql/src/test/results/clientpositive/spark/join36.q.out f1317ea 
  ql/src/test/results/clientpositive/spark/join37.q.out 448e983 
  ql/src/test/results/clientpositive/spark/join38.q.out 735d7ea 
  ql/src/test/results/clientpositive/spark/join39.q.out 0734d4b 
  ql/src/test/results/clientpositive/spark/join40.q.out 60ef13d 
  ql/src/test/results/clientpositive/spark/join_map_ppr.q.out 59fdb99 
  ql/src/test/results/clientpositive/spark/mapjoin1.q.out 80e38b9 
  ql/src/test/results/clientpositive/spark/mapjoin_distinct.q.out dc7241c 
  ql/src/test/results/clientpositive/spark/mapjoin_filter_on_outerjoin.q.out 3b80437 
  ql/src/test/results/clientpositive/spark/mapjoin_test_outer.q.out fdf8f24 
  ql/src/test/results/clientpositive/spark/semijoin.q.out 2b8e04b 
  ql/src/test/results/clientpositive/spark/skewjoin.q.out 56b78be 

Diff: https://reviews.apache.org/r/28889/diff/


Testing
-------

bucket_map_join_1.q
bucket_map_join_2.q
bucketmapjoin1.q
bucketmapjoin10.q
bucketmapjoin11.q
bucketmapjoin12.q
bucketmapjoin13.q
bucketmapjoin2.q
bucketmapjoin3.q
bucketmapjoin4.q
bucketmapjoin5.q
bucketmapjoin7.q
bucketmapjoin8.q
bucketmapjoin9.q
bucketmapjoin_negative.q
bucketmapjoin_negative2.q
column_access_stats.q
join25.q
join26.q
join27.q
join30.q
join36.q
join37.q
join38.q
join39.q
join40.q
join_empty.q
join_filters_overlap.q
join_map_ppr.q
mapjoin1.q
mapjoin_distinct.q
mapjoin_filter_on_outerjoin.q
mapjoin_hook.q
mapjoin_test_outer.q
semijoin.q
skewjoin.q
table_access_keys_stats.q


Thanks,

Chao Sun
