-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27265/#review58769
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/lib/TypeRule.java
<https://reviews.apache.org/r/27265/#comment99922>

    Is the cost set to 1 so that it is lower than the cost of the other
RuleRegExp-based rules, and hence this rule gets triggered first? (The lower
the cost, the better the rule match.)
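To make the cost question concrete, here is a minimal, self-contained sketch of lowest-cost rule dispatch. It assumes, as the comment suggests, that the dispatcher picks the matching rule with the lowest non-negative cost, and that a regex rule's cost is the length of the matched operator-name string. All classes and methods below (Op, SmbJoinOp, TypeRule, RegExpRule, dispatch) are hypothetical stand-ins, not Hive's actual API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, self-contained model of cost-based rule dispatch; these are NOT Hive's classes.
public class RuleDispatchSketch {

  /** Stand-in for an operator; SMB and map-join operators deliberately share the name "MAPJOIN". */
  public static class Op {
    final String name;
    public Op(String name) { this.name = name; }
  }
  public static final class TableScanOp extends Op { public TableScanOp() { super("TS"); } }
  public static final class MapJoinOp extends Op { public MapJoinOp() { super("MAPJOIN"); } }
  public static final class SmbJoinOp extends Op { public SmbJoinOp() { super("MAPJOIN"); } }

  /** A rule returns a non-negative cost when it matches the walker's stack, or -1 otherwise. */
  public interface Rule {
    int cost(Deque<Op> stack);
    String name();
  }

  /** Matches on the runtime class of the operator on top of the stack, with a fixed cost of 1. */
  public static final class TypeRule implements Rule {
    private final Class<?> nodeClass;
    public TypeRule(Class<?> nodeClass) { this.nodeClass = nodeClass; }
    @Override public int cost(Deque<Op> stack) {
      Op top = stack.peek();
      return (top != null && nodeClass.isInstance(top)) ? 1 : -1;
    }
    @Override public String name() { return "TypeRule(" + nodeClass.getSimpleName() + ")"; }
  }

  /** Matches a regex against "NAME%" suffixes of the stack; cost is the matched string's length. */
  public static final class RegExpRule implements Rule {
    private final Pattern pattern;
    public RegExpRule(String regex) { this.pattern = Pattern.compile(regex); }
    @Override public int cost(Deque<Op> stack) {
      String suffix = "";
      for (Op op : stack) { // Deque iteration runs from the top (most recently pushed) down
        suffix = op.name + "%" + suffix;
        if (pattern.matcher(suffix).matches()) {
          return suffix.length();
        }
      }
      return -1;
    }
    @Override public String name() { return "RegExpRule(" + pattern.pattern() + ")"; }
  }

  /** Picks the matching rule with the lowest cost; on a tie, this sketch keeps the earlier rule. */
  public static Rule dispatch(List<Rule> rules, Deque<Op> stack) {
    Rule best = null;
    int minCost = Integer.MAX_VALUE;
    for (Rule r : rules) {
      int c = r.cost(stack);
      if (c >= 0 && c < minCost) {
        minCost = c;
        best = r;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    List<Rule> rules = List.of(new RegExpRule("MAPJOIN%"), new TypeRule(SmbJoinOp.class));

    Deque<Op> smbStack = new ArrayDeque<>();
    smbStack.push(new TableScanOp());
    smbStack.push(new SmbJoinOp());
    System.out.println(dispatch(rules, smbStack).name()); // TypeRule(SmbJoinOp): cost 1 beats 8

    Deque<Op> mapJoinStack = new ArrayDeque<>();
    mapJoinStack.push(new TableScanOp());
    mapJoinStack.push(new MapJoinOp());
    System.out.println(dispatch(rules, mapJoinStack).name()); // RegExpRule(MAPJOIN%)
  }
}
```

Under these assumptions, a type rule with a fixed cost of 1 wins over any regex rule whose matched string is longer than one character, while a plain map-join (same "MAPJOIN" name, different class) still falls through to the regex rule.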



ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSortMergeJoinFactory.java
<https://reviews.apache.org/r/27265/#comment99924>

    Can you change this to SMB Join to avoid confusion?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSortMergeJoinFactory.java
<https://reviews.apache.org/r/27265/#comment99925>

    Again, to avoid confusion, can you please rename the method to
initSMBJoinPlan?



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java
<https://reviews.apache.org/r/27265/#comment99926>

    So when the deferSetup boolean flag is true, to what point does
createMapWork get deferred?


- Suhas Satish


On Oct. 28, 2014, 2:20 a.m., Szehon Ho wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27265/
> -----------------------------------------------------------
> 
> (Updated Oct. 28, 2014, 2:20 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This change re-uses the SMBJoinOperator for Spark.  Background: the logical 
> layer already converts joins to SMB joins.  This change just introduces a 
> class called "SparkSortMergeJoinFactory" on the Spark compile path, which 
> attaches the data structures (like local work and bucket info) to the MapWork 
> for the SMBJoinOperator to consume.  It is largely based on the MapReduce 
> class "MapJoinFactory".
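For readers less familiar with the operator, the core of a sort-merge-bucket join can be sketched as a merge over one pair of corresponding buckets, assuming both buckets are already sorted on the join key. This is a hypothetical illustration of the general technique (the Row class and mergeJoin method are invented for it), not the SMBJoinOperator code itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of merging one pair of corresponding buckets; not Hive's SMBJoinOperator.
public class BucketMergeJoinSketch {

  /** A row: join key plus a payload value. */
  public static final class Row {
    final int key;
    final String value;
    public Row(int key, String value) { this.key = key; this.value = value; }
  }

  /** Inner-joins two buckets that are both sorted ascending by key; emits "leftValue|rightValue". */
  public static List<String> mergeJoin(List<Row> left, List<Row> right) {
    List<String> out = new ArrayList<>();
    int i = 0;
    int j = 0;
    while (i < left.size() && j < right.size()) {
      int lk = left.get(i).key;
      int rk = right.get(j).key;
      if (lk < rk) {
        i++;                                   // left key too small: advance left side
      } else if (lk > rk) {
        j++;                                   // right key too small: advance right side
      } else {
        // Find the run of equal keys on each side, then emit their cross product.
        int iEnd = i;
        while (iEnd < left.size() && left.get(iEnd).key == lk) iEnd++;
        int jEnd = j;
        while (jEnd < right.size() && right.get(jEnd).key == rk) jEnd++;
        for (int a = i; a < iEnd; a++) {
          for (int b = j; b < jEnd; b++) {
            out.add(left.get(a).value + "|" + right.get(b).value);
          }
        }
        i = iEnd;
        j = jEnd;
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Row> bucket0OfA = List.of(new Row(1, "a1"), new Row(3, "a3"), new Row(3, "a3b"));
    List<Row> bucket0OfB = List.of(new Row(3, "b3"), new Row(5, "b5"));
    System.out.println(mergeJoin(bucket0OfA, bucket0OfB)); // [a3|b3, a3b|b3]
  }
}
```

Because each side is read once in key order, no hash table of either side is needed; this only works if bucket N of one table holds exactly the keys that bucket N of the other table could hold.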
> 
> However, in the Spark path, it is activated only for SMB joins and not for 
> map-joins, as we have another strategy for map-joins.  That is why there is 
> a new optimizer rule called "TypeRule": it makes this processor run only on 
> SMBJoinOperators (SMBJoinOperators share the same name as MapJoinOperators, 
> which the logical optimizers dealing with hints rely on).
> 
> One major assumption behind the whole SMB concept is that both tables have 
> corresponding buckets.  While testing with large numbers of buckets (e.g. 
> auto_sortmerge_join_16), I found that "insert" into a bucketed table wasn't 
> putting the same keys into corresponding buckets.  I activated MR-style 
> shuffle (hash shuffle instead of total-order shuffle), and that seemed to 
> solve the issue.
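The symptom described above can be illustrated with a small hypothetical sketch. It assumes that a total-order shuffle assigns keys by key ranges derived from each table's own data (as Spark's RangePartitioner does for sortByKey), while the MR-style shuffle assigns them by (hash & Integer.MAX_VALUE) % numBuckets. Integer.hashCode stands in for Hive's real bucketing hash, and the class and methods are invented for illustration.

```java
import java.util.Arrays;

// Hypothetical comparison of hash vs. range bucket assignment; not Hive's or Spark's actual code.
public class BucketAssignmentSketch {

  /** Hash bucketing: the bucket depends only on the key and the bucket count, not on the table's data. */
  public static int hashBucket(int key, int numBuckets) {
    return (Integer.hashCode(key) & Integer.MAX_VALUE) % numBuckets;
  }

  /**
   * Range bucketing: split points are taken from this table's own sorted keys, so the
   * same key can land in different bucket numbers in two tables with different data.
   * Assumes tableKeys is non-empty.
   */
  public static int rangeBucket(int key, int[] tableKeys, int numBuckets) {
    int[] sorted = tableKeys.clone();
    Arrays.sort(sorted);
    for (int b = 0; b < numBuckets - 1; b++) {
      // Exclusive upper bound of bucket b, taken at an evenly spaced rank in the sorted keys.
      int splitPoint = sorted[(b + 1) * sorted.length / numBuckets];
      if (key < splitPoint) {
        return b;
      }
    }
    return numBuckets - 1;
  }

  public static void main(String[] args) {
    int[] tableA = {1, 2, 3, 4, 5, 6, 7, 8};
    int[] tableB = {5, 6, 7, 8, 9, 10, 11, 12};
    int key = 6;
    int buckets = 4;
    System.out.println("hash bucket (either table): " + hashBucket(key, buckets)); // 2
    System.out.println("range bucket A/B: " + rangeBucket(key, tableA, buckets)
        + "/" + rangeBucket(key, tableB, buckets)); // 2/0
  }
}
```

With these inputs, key 6 gets hash bucket 2 in both tables, but range bucket 2 in table A and 0 in table B, so a bucket-by-bucket merge would pair bucket 2 of A with a bucket of B that never contains key 6.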
> 
> 
> Diffs
> -----
> 
>   itests/src/test/resources/testconfiguration.properties 00c9f4d 
>   ql/src/java/org/apache/hadoop/hive/ql/lib/TypeRule.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ae1d1ab 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSortMergeJoinFactory.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java ed88c60
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 8e28887
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 4f5feca 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 1c663c4
>   ql/src/test/results/clientpositive/spark/auto_join0.q.out 76ff63d 
>   ql/src/test/results/clientpositive/spark/auto_join10.q.out 05a5912 
>   ql/src/test/results/clientpositive/spark/auto_join11.q.out 998c28b 
>   ql/src/test/results/clientpositive/spark/auto_join12.q.out d2b7993 
>   ql/src/test/results/clientpositive/spark/auto_join13.q.out 78aa01e 
>   ql/src/test/results/clientpositive/spark/auto_join15.q.out 5916070 
>   ql/src/test/results/clientpositive/spark/auto_join16.q.out 0b6807d 
>   ql/src/test/results/clientpositive/spark/auto_join18.q.out 6083b38 
>   ql/src/test/results/clientpositive/spark/auto_join18_multi_distinct.q.out 01c8f0a
>   ql/src/test/results/clientpositive/spark/auto_join20.q.out a8f2b9a 
>   ql/src/test/results/clientpositive/spark/auto_join21.q.out f9ac35d 
>   ql/src/test/results/clientpositive/spark/auto_join22.q.out 516322c 
>   ql/src/test/results/clientpositive/spark/auto_join23.q.out ce5a670 
>   ql/src/test/results/clientpositive/spark/auto_join24.q.out 15b8888 
>   ql/src/test/results/clientpositive/spark/auto_join27.q.out 67f5739 
>   ql/src/test/results/clientpositive/spark/auto_join28.q.out b979661 
>   ql/src/test/results/clientpositive/spark/auto_join29.q.out 0951b8d 
>   ql/src/test/results/clientpositive/spark/auto_join30.q.out 98b3974 
>   ql/src/test/results/clientpositive/spark/auto_join31.q.out df502c8 
>   ql/src/test/results/clientpositive/spark/auto_join32.q.out 8d83188 
>   ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out e64d4fb 
>   ql/src/test/results/clientpositive/spark/auto_sortmerge_join_1.q.out 9158d65
>   ql/src/test/results/clientpositive/spark/auto_sortmerge_join_10.q.out f608cc5
>   ql/src/test/results/clientpositive/spark/auto_sortmerge_join_11.q.out 3c26363
>   ql/src/test/results/clientpositive/spark/auto_sortmerge_join_12.q.out 65e496f
>   ql/src/test/results/clientpositive/spark/auto_sortmerge_join_13.q.out a5a281b
>   ql/src/test/results/clientpositive/spark/auto_sortmerge_join_14.q.out 2fc3bb6
>   ql/src/test/results/clientpositive/spark/auto_sortmerge_join_15.q.out 74cbd7c
>   ql/src/test/results/clientpositive/spark/auto_sortmerge_join_16.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/spark/auto_sortmerge_join_2.q.out d1bb7a0
>   ql/src/test/results/clientpositive/spark/auto_sortmerge_join_3.q.out d57a1d7
>   ql/src/test/results/clientpositive/spark/auto_sortmerge_join_4.q.out 8244c50
>   ql/src/test/results/clientpositive/spark/auto_sortmerge_join_5.q.out 2ab1bca
>   ql/src/test/results/clientpositive/spark/auto_sortmerge_join_6.q.out bc4a163
>   ql/src/test/results/clientpositive/spark/auto_sortmerge_join_7.q.out 16ef3ae
>   ql/src/test/results/clientpositive/spark/auto_sortmerge_join_8.q.out 9fd3e5a
>   ql/src/test/results/clientpositive/spark/auto_sortmerge_join_9.q.out a7f994f
>   ql/src/test/results/clientpositive/spark/bucket2.q.out b1b2997 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 019c11a 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 2cbab11 
>   ql/src/test/results/clientpositive/spark/bucket_map_join_1.q.out 4ec619e 
>   ql/src/test/results/clientpositive/spark/bucket_map_join_2.q.out 1c288c2 
>   ql/src/test/results/clientpositive/spark/bucketmapjoin10.q.out 8be3edd 
>   ql/src/test/results/clientpositive/spark/bucketmapjoin11.q.out 9e45843 
>   ql/src/test/results/clientpositive/spark/bucketmapjoin12.q.out 0c1ac4b 
>   ql/src/test/results/clientpositive/spark/bucketmapjoin13.q.out dc1b8cf 
>   ql/src/test/results/clientpositive/spark/bucketmapjoin8.q.out 6d72fdf 
>   ql/src/test/results/clientpositive/spark/bucketmapjoin9.q.out d80bdcf 
>   ql/src/test/results/clientpositive/spark/count.q.out c527c1d 
>   ql/src/test/results/clientpositive/spark/ctas.q.out 0ded266 
>   ql/src/test/results/clientpositive/spark/disable_merge_for_bucketing.q.out 590b265
>   ql/src/test/results/clientpositive/spark/escape_clusterby1.q.out 52bdf6a 
>   ql/src/test/results/clientpositive/spark/escape_distributeby1.q.out 736db5e 
>   ql/src/test/results/clientpositive/spark/escape_orderby1.q.out 6e1c0cf 
>   ql/src/test/results/clientpositive/spark/escape_sortby1.q.out 58b663c 
>   ql/src/test/results/clientpositive/spark/groupby1.q.out 847f45c 
>   ql/src/test/results/clientpositive/spark/groupby10.q.out 2095843 
>   ql/src/test/results/clientpositive/spark/groupby11.q.out 70db5a5 
>   ql/src/test/results/clientpositive/spark/groupby2.q.out 86e2f2a 
>   ql/src/test/results/clientpositive/spark/groupby3.q.out 13a5fab 
>   ql/src/test/results/clientpositive/spark/groupby3_map.q.out dac2824 
>   ql/src/test/results/clientpositive/spark/groupby3_map_multi_distinct.q.out d2c054a
>   ql/src/test/results/clientpositive/spark/groupby3_map_skew.q.out ec6439a 
>   ql/src/test/results/clientpositive/spark/groupby3_noskew.q.out 0c9a7e1 
>   ql/src/test/results/clientpositive/spark/groupby3_noskew_multi_distinct.q.out 42fbb8c
>   ql/src/test/results/clientpositive/spark/groupby4.q.out 318c5a3 
>   ql/src/test/results/clientpositive/spark/groupby7_map.q.out 22a05b5 
>   ql/src/test/results/clientpositive/spark/groupby7_map_multi_single_reducer.q.out bc453c6
>   ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 2a07f2a 
>   ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 00a0707 
>   ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out 36640ef
>   ql/src/test/results/clientpositive/spark/groupby8.q.out d8295ce 
>   ql/src/test/results/clientpositive/spark/groupby8_map.q.out b9aa597 
>   ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out b9aa597 
>   ql/src/test/results/clientpositive/spark/groupby8_noskew.q.out b9aa597 
>   ql/src/test/results/clientpositive/spark/groupby9.q.out bec2346 
>   ql/src/test/results/clientpositive/spark/groupby_complex_types.q.out 16fadea
>   ql/src/test/results/clientpositive/spark/groupby_complex_types_multi_single_reducer.q.out 7470843
>   ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 169c4ac 
>   ql/src/test/results/clientpositive/spark/groupby_multi_insert_common_distinct.q.out d3457da
>   ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 3abd0e3
>   ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer2.q.out 7f74c62
>   ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer3.q.out c4b7419
>   ql/src/test/results/clientpositive/spark/groupby_position.q.out 9e58189 
>   ql/src/test/results/clientpositive/spark/groupby_ppr.q.out 860aa58 
>   ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 0aeff6b 
>   ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 61dd2be 
>   ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 99da734
>   ql/src/test/results/clientpositive/spark/having.q.out 5e9f20d 
>   ql/src/test/results/clientpositive/spark/input14.q.out e7d4db6 
>   ql/src/test/results/clientpositive/spark/input17.q.out 0882a29 
>   ql/src/test/results/clientpositive/spark/input18.q.out 802fb0a 
>   ql/src/test/results/clientpositive/spark/input1_limit.q.out 33ecd07 
>   ql/src/test/results/clientpositive/spark/insert_into1.q.out e9be658 
>   ql/src/test/results/clientpositive/spark/insert_into2.q.out 5c8e9c7 
>   ql/src/test/results/clientpositive/spark/insert_into3.q.out 6c0111d 
>   ql/src/test/results/clientpositive/spark/join0.q.out 55b725e 
>   ql/src/test/results/clientpositive/spark/join15.q.out 1651db1 
>   ql/src/test/results/clientpositive/spark/join18.q.out 7b64fb6 
>   ql/src/test/results/clientpositive/spark/join18_multi_distinct.q.out 57c4516
>   ql/src/test/results/clientpositive/spark/join20.q.out f06ffac 
>   ql/src/test/results/clientpositive/spark/join21.q.out e81ec5a 
>   ql/src/test/results/clientpositive/spark/join23.q.out 3982ea7 
>   ql/src/test/results/clientpositive/spark/join29.q.out d5383d5 
>   ql/src/test/results/clientpositive/spark/join30.q.out 5c16622 
>   ql/src/test/results/clientpositive/spark/join31.q.out 9193df9 
>   ql/src/test/results/clientpositive/spark/join35.q.out 1750aec 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_17.q.out 482268c 
> 
> Diff: https://reviews.apache.org/r/27265/diff/
> 
> 
> Testing
> -------
> 
> Ran existing auto_sortmerge_* tests.
> 
> 
> Thanks,
> 
> Szehon Ho
> 
>
