[ https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050888#comment-15050888 ]

Nemon Lou commented on HIVE-12616:
----------------------------------

[~xuefuz], thanks for the review.
It is not surprising that you have doubts about the "spark.master" setting in
HiveConf.
I owe you an explanation of the issue described here.

In short, "spark.master" is set on HiveConf during the creation of
HiveSparkClient.
Snippet of HiveSparkClientFactory#initiateSparkConf:
{code}
    // Copy "spark.master" from SparkConf into HiveConf when it is not set yet.
    String sparkMaster = hiveConf.get("spark.master");
    if (sparkMaster == null) {
      sparkMaster = sparkConf.get("spark.master");
      hiveConf.set("spark.master", sparkMaster);
    }
{code}
The creation of HiveSparkClient happens only once, because the client is
reused (known as SparkSession).
However, this HiveConf is operation level rather than session level (because
queries run asynchronously).
So only the first operation's JobConf carries "spark.master".
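To make the lifecycle concrete, here is a toy sketch of the problem. Plain Maps stand in for HiveConf/JobConf, and all class and method names here are hypothetical, not the actual Hive code: the conf mutation runs only on the first client creation, so a second operation's fresh conf never receives "spark.master".

{code}
import java.util.HashMap;
import java.util.Map;

public class ReuseDemo {
  static Map<String, String> sparkConf = new HashMap<>();
  static boolean clientCreated = false;

  // Mirrors the initiateSparkConf behavior: runs only when the client is
  // first created, mutating whichever operation-level conf it was handed.
  static void createClientIfNeeded(Map<String, String> hiveConf) {
    if (!clientCreated) {
      hiveConf.put("spark.master", sparkConf.get("spark.master"));
      clientCreated = true;  // client (SparkSession) is reused afterwards
    }
  }

  public static void main(String[] args) {
    sparkConf.put("spark.master", "yarn-cluster");

    Map<String, String> op1Conf = new HashMap<>();  // first operation's conf
    createClientIfNeeded(op1Conf);                  // gets "spark.master"

    Map<String, String> op2Conf = new HashMap<>();  // second operation's conf
    createClientIfNeeded(op2Conf);                  // no-op: client reused

    System.out.println(op1Conf.containsKey("spark.master"));
    System.out.println(op2Conf.containsKey("spark.master"));
  }
}
{code}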


Now I have two choices:
1. Set "spark.master" at the session level during HiveSparkClient creation.
2. Set "spark.master" for each operation when it is not already set, but using
the SparkConf from RemoteHiveSparkClient instead of hiveConf. (The SparkConf
in RemoteHiveSparkClient already sets "spark.master" explicitly.)
Which one do you prefer?
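A minimal sketch of option 2, again with plain Maps standing in for the conf objects and a hypothetical helper name, not the actual patch:

{code}
import java.util.HashMap;
import java.util.Map;

public class SparkMasterPropagation {
  // Hypothetical helper: before submitting each operation, ensure its conf
  // carries "spark.master", falling back to the value that was set
  // explicitly on the SparkConf held by RemoteHiveSparkClient.
  static void ensureSparkMaster(Map<String, String> operationConf,
                                Map<String, String> sparkConf) {
    if (operationConf.get("spark.master") == null) {
      operationConf.put("spark.master", sparkConf.get("spark.master"));
    }
  }

  public static void main(String[] args) {
    Map<String, String> sparkConf = new HashMap<>();
    sparkConf.put("spark.master", "yarn-cluster");

    Map<String, String> opConf = new HashMap<>();  // fresh per-operation conf
    ensureSparkMaster(opConf, sparkConf);
    System.out.println(opConf.get("spark.master"));
  }
}
{code}

Since this runs per operation, every JobConf would carry "spark.master" regardless of whether the session's client was created before or after the operation started.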

Adding a test case for this issue seems difficult (yarn-cluster mode, multiple
operations in one session). Could you provide some guidance?
Thanks.


> NullPointerException when spark session is reused to run a mapjoin
> ------------------------------------------------------------------
>
>                 Key: HIVE-12616
>                 URL: https://issues.apache.org/jira/browse/HIVE-12616
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 1.3.0
>            Reporter: Nemon Lou
>            Assignee: Nemon Lou
>         Attachments: HIVE-12616.patch
>
>
> The way to reproduce:
> {noformat}
> set hive.execution.engine=spark;
> create table if not exists test(id int);
> create table if not exists test1(id int);
> insert into test values(1);
> insert into test1 values(1);
> select max(a.id) from test a ,test1 b
> where a.id = b.id;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
