[ https://issues.apache.org/jira/browse/HIVE-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xuefu Zhang resolved HIVE-9697.
-------------------------------
Resolution: Won't Fix
This should be just a doc "fix", as discussed above.
> Hive on Spark is not as aggressive as MR on map join [Spark Branch]
> -------------------------------------------------------------------
>
> Key: HIVE-9697
> URL: https://issues.apache.org/jira/browse/HIVE-9697
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Xin Hao
> Labels: TODOC-SPARK
>
> While running some BigBench cases, we found that with the same small-table
> size threshold, the Map Join operator is not generated in the stage plans for
> Hive on Spark, but it is generated for Hive on MR.
> For example, when we run BigBench Q25, the meta info of one input ORC table
> is as below:
> totalSize=1748955 (about 1.7M)
> rawDataSize=123050375 (about 120M)
> If we use the following parameter settings,
> set hive.auto.convert.join=true;
> set hive.mapjoin.smalltable.filesize=25000000;
> set hive.auto.convert.join.noconditionaltask=true;
> set hive.auto.convert.join.noconditionaltask.size=100000000; (100M)
> Map Join is enabled for Hive on MR, but not for Hive on Spark.
> We found that Hive on MR compares the HDFS file size of the table
> (ContentSummary.getLength(), which should approximate 'totalSize') with the
> 100M threshold, and that size is below the threshold. Hive on Spark instead
> compares 'rawDataSize' with the 100M threshold, and that value is above it.
> That is why Map Join is not enabled for Hive on Spark in this case, and as a
> result Hive on Spark performs much worse than Hive on MR here.
> When we set hive.auto.convert.join.noconditionaltask.size=150000000; (150M),
> Map Join is enabled for Hive on Spark as well, and Hive on Spark then
> performs similarly to Hive on MR.
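
The comparison described in the report can be reproduced with the numbers it quotes. The following is only a rough sketch in Python, not Hive's planner code: the helper names `map_join_enabled` and `decisions` are hypothetical, and the strict less-than comparison is an assumption (the quoted sizes are far from either threshold, so the choice does not change the outcome).

```python
# Illustrative sketch only; this is not Hive's actual join-conversion logic.
# The two sizes are the table statistics quoted in the report.
TOTAL_SIZE = 1_748_955        # totalSize: on-disk ORC size in bytes
RAW_DATA_SIZE = 123_050_375   # rawDataSize: uncompressed data size in bytes


def map_join_enabled(size_estimate: int, threshold: int) -> bool:
    """Hypothetical helper: pick a map join when the estimate is below the threshold.

    Assumption: a strict less-than comparison.
    """
    return size_estimate < threshold


def decisions(threshold: int) -> dict:
    """Apply the same threshold to the size each engine is reported to use."""
    return {
        "mr": map_join_enabled(TOTAL_SIZE, threshold),       # MR: file size (~totalSize)
        "spark": map_join_enabled(RAW_DATA_SIZE, threshold), # Spark: rawDataSize
    }


if __name__ == "__main__":
    # 100M is the original setting; 150M is the workaround from the report.
    for threshold in (100_000_000, 150_000_000):
        print(threshold, decisions(threshold))
```

With the 100M threshold only the MR-style check passes; with 150M both pass, which matches the behavior described in the report.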