[jira] [Commented] (HIVE-14919) Improve the performance of Hive on Spark 2.0.0

liyunzhang_intel (JIRA) Mon, 13 Mar 2017 20:33:00 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923492#comment-15923492
 ]


liyunzhang_intel commented on HIVE-14919:
-----------------------------------------

[~lirui]: {quote}
One thing I noted is the Xms flag was removed from the executor's options via 
SPARK-12384. We may want to set it the same as Xmx to achieve better 
performance.
{quote}

I don't quite understand this point, because Spark currently does not allow 
-Xmx to be used to specify the max heap size; only 
{{spark.executor.memory}} can be used for that. See 
org.apache.spark.SparkConf#validateSettings:
{code}
    // Validate spark.executor.extraJavaOptions
    getOption(executorOptsKey).foreach { javaOpts =>
      if (javaOpts.contains("-Dspark")) {
        val msg = s"$executorOptsKey is not allowed to set Spark options (was '$javaOpts'). " +
          "Set them directly on a SparkConf or in a properties file when using ./bin/spark-submit."
        throw new Exception(msg)
      }
      if (javaOpts.contains("-Xmx")) {
        val msg = s"$executorOptsKey is not allowed to specify max heap memory settings " +
          s"(was '$javaOpts'). Use spark.executor.memory instead."
        throw new Exception(msg)
      }
    }
{code}
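
To make the distinction between the two flags concrete, here is a minimal 
standalone sketch that mirrors the two checks quoted above without depending 
on Spark. The names {{ExecutorOptsCheck}} and {{validate}} are hypothetical 
and are not part of Spark; the sketch returns an Either instead of throwing, 
and the messages are shortened. It only illustrates that the quoted check 
looks for "-Dspark" and "-Xmx", so an option string containing only -Xms is 
not rejected by this particular validation.

```scala
// Standalone sketch, NOT Spark code: ExecutorOptsCheck and validate are
// hypothetical names used only to mirror the checks quoted above.
object ExecutorOptsCheck {
  val executorOptsKey = "spark.executor.extraJavaOptions"

  /** Left(message) if the quoted checks would reject the options, Right(opts) otherwise. */
  def validate(javaOpts: String): Either[String, String] =
    if (javaOpts.contains("-Dspark")) {
      Left(s"$executorOptsKey is not allowed to set Spark options (was '$javaOpts').")
    } else if (javaOpts.contains("-Xmx")) {
      Left(s"$executorOptsKey is not allowed to specify max heap memory settings " +
        s"(was '$javaOpts'). Use spark.executor.memory instead.")
    } else {
      // Anything else, including -Xms, passes these two checks.
      Right(javaOpts)
    }
}
```

Under these assumptions, {{ExecutorOptsCheck.validate("-Xms4g")}} yields a 
Right, while {{ExecutorOptsCheck.validate("-Xmx4g")}} yields a Left carrying 
the "Use spark.executor.memory instead" message.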

> Improve the performance of Hive on Spark 2.0.0
> ----------------------------------------------
>
>                 Key: HIVE-14919
>                 URL: https://issues.apache.org/jira/browse/HIVE-14919
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ferdinand Xu
>            Assignee: Ferdinand Xu
>
> In HIVE-14029, we updated the Spark dependency to 2.0.0. We used Intel 
> BigBench[1] to benchmark Spark 2.0 against Spark 1.6 on a 1 TB data set. 
> We see performance improvements of about 5.4% in general and 45% in the 
> best case. However, some queries don't show significant performance 
> improvements. This JIRA is the umbrella ticket addressing those performance 
> issues.
> [1] https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
