[ https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050888#comment-15050888 ]
Nemon Lou commented on HIVE-12616:
----------------------------------
[~xuefuz], thanks for the review.
It is not surprising that you have doubts about the "spark.master" setting in
HiveConf.
I owe you an explanation of the issue described here.
In short, "spark.master" is set in the HiveConf during the creation of
HiveSparkClient.
Snippet from HiveSparkClientFactory#initiateSparkConf:
{code}
String sparkMaster = hiveConf.get("spark.master");
if (sparkMaster == null) {
  sparkMaster = sparkConf.get("spark.master");
  hiveConf.set("spark.master", sparkMaster);
}
{code}
HiveSparkClient is created only once, because it is reused within the session
(through SparkSession).
However, the HiveConf passed in at that point is operation-level rather than
session-level (because queries run asynchronously).
As a result, only the first operation's JobConf contains "spark.master".
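To make the sequence concrete, here is a minimal, dependency-free Java model of it. This is a sketch under assumptions, not Hive code: plain Maps stand in for HiveConf and SparkConf, each operation is assumed to work on its own copy of the session conf, and the class and method names (SparkMasterPropagationDemo, runOperation, simulate) as well as the "yarn-cluster" value are hypothetical.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the behavior described above; not Hive APIs.
// Plain Maps stand in for HiveConf (per operation) and SparkConf (per client).
public class SparkMasterPropagationDemo {

    // Stand-in for the client's SparkConf, which always carries spark.master.
    // "yarn-cluster" is an assumed value for illustration.
    static final Map<String, String> SPARK_CONF = new HashMap<>();
    static {
        SPARK_CONF.put("spark.master", "yarn-cluster");
    }

    // Stand-in for the reused client: created at most once per session.
    static Object sessionClient = null;

    // Same logic as the initiateSparkConf snippet, applied to whichever
    // operation-level conf happens to trigger client creation.
    static void initiateSparkConf(Map<String, String> hiveConf) {
        String sparkMaster = hiveConf.get("spark.master");
        if (sparkMaster == null) {
            sparkMaster = SPARK_CONF.get("spark.master");
            hiveConf.put("spark.master", sparkMaster);
        }
    }

    // Assumption: each (asynchronous) operation gets its own copy of the session conf.
    static Map<String, String> runOperation(Map<String, String> sessionConf) {
        Map<String, String> operationConf = new HashMap<>(sessionConf);
        if (sessionClient == null) {
            // Only the first operation creates the client, so only it is patched.
            initiateSparkConf(operationConf);
            sessionClient = new Object();
        }
        return operationConf;
    }

    // Runs several operations in one session and records each one's spark.master.
    public static List<String> simulate(int operations) {
        sessionClient = null;
        Map<String, String> sessionConf = new HashMap<>();
        List<String> masters = new ArrayList<>();
        for (int i = 0; i < operations; i++) {
            masters.add(runOperation(sessionConf).get("spark.master"));
        }
        return masters;
    }

    public static void main(String[] args) {
        System.out.println(simulate(3)); // prints [yarn-cluster, null, null]
    }
}
{code}

In this model the session conf itself is never touched, so every operation after the first starts from a copy that lacks "spark.master".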
I see two options:
1. Set "spark.master" at session level during HiveSparkClient creation.
2. Set "spark.master" for each operation when it has not been set before,
taking the value from the SparkConf in RemoteHiveSparkClient instead of from
the HiveConf. (The SparkConf in RemoteHiveSparkClient already sets
"spark.master" explicitly.)
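The difference between the two options can be sketched with a small, dependency-free Java model. It is a hypothetical illustration under stated assumptions, not a patch: Maps stand in for HiveConf and SparkConf, each operation is assumed to copy the session conf, and SparkMasterFixSketch, option1, option2, operationsMissingMaster and the "yarn-cluster" value are all made-up names for this example.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the two proposed fixes; none of these names are Hive APIs.
public class SparkMasterFixSketch {

    public enum Fix { NONE, OPTION_1, OPTION_2 }

    // Option 1: record spark.master in the session-level conf when the client is
    // created, so every operation-level copy made afterwards inherits it.
    static void option1(Map<String, String> sessionConf, Map<String, String> sparkConf) {
        sessionConf.putIfAbsent("spark.master", sparkConf.get("spark.master"));
    }

    // Option 2: for every operation, fill in spark.master only if it is missing,
    // taking the value from the client's SparkConf rather than from a HiveConf.
    static void option2(Map<String, String> operationConf, Map<String, String> sparkConf) {
        operationConf.putIfAbsent("spark.master", sparkConf.get("spark.master"));
    }

    // Counts how many of `operations` operations in one session lack spark.master.
    public static int operationsMissingMaster(int operations, Fix fix) {
        Map<String, String> sparkConf = new HashMap<>();
        sparkConf.put("spark.master", "yarn-cluster"); // assumed value
        Map<String, String> sessionConf = new HashMap<>();
        boolean clientCreated = false;
        int missing = 0;
        for (int i = 0; i < operations; i++) {
            // Assumption: each asynchronous operation works on its own copy.
            Map<String, String> operationConf = new HashMap<>(sessionConf);
            if (!clientCreated) {
                // Current behavior: only the operation creating the client is patched.
                operationConf.putIfAbsent("spark.master", sparkConf.get("spark.master"));
                if (fix == Fix.OPTION_1) {
                    option1(sessionConf, sparkConf);
                }
                clientCreated = true;
            }
            if (fix == Fix.OPTION_2) {
                option2(operationConf, sparkConf);
            }
            if (operationConf.get("spark.master") == null) {
                missing++;
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        for (Fix fix : Fix.values()) {
            System.out.println(fix + ": " + operationsMissingMaster(3, fix)
                    + " of 3 operations lack spark.master");
        }
    }
}
{code}

Under these assumptions both options leave no operation without "spark.master"; option 1 relies on later operations copying the session conf, while option 2 does not depend on how the operation conf was derived.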
Which one do you prefer?
Adding a test case for this issue seems difficult (it needs yarn-cluster mode
and multiple operations in one session). Could you provide some guidance?
Thanks.
> NullPointerException when spark session is reused to run a mapjoin
> ------------------------------------------------------------------
>
> Key: HIVE-12616
> URL: https://issues.apache.org/jira/browse/HIVE-12616
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Affects Versions: 1.3.0
> Reporter: Nemon Lou
> Assignee: Nemon Lou
> Attachments: HIVE-12616.patch
>
>
> The way to reproduce:
> {noformat}
> set hive.execution.engine=spark;
> create table if not exists test(id int);
> create table if not exists test1(id int);
> insert into test values(1);
> insert into test1 values(1);
> select max(a.id) from test a, test1 b
> where a.id = b.id;
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)