Peng Cheng created SPARK-38009:
----------------------------------

             Summary: In start-thriftserver.sh arguments, "--hiveconf xxx" 
should have higher precedence over "--conf spark.hadoop.xxx", or any other 
hadoop configurations
                 Key: SPARK-38009
                 URL: https://issues.apache.org/jira/browse/SPARK-38009
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.2.0, 2.4.8
         Environment: The experiment below was conducted on Apache Spark 2.4.7 
and 3.2.0.

OS: Ubuntu 20.04

Java: OpenJDK 1.8.0

            Reporter: Peng Cheng


By convention, an Apache Hive server reads configuration options from several 
sources with different precedence. Options passed on the command line with 
"--hiveconf" should be overridden only by values set with the SET command (see 
[https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration] 
for details). They should take precedence over Hadoop configuration and over 
any configuration file on the server (including, but not limited to, 
hive-site.xml and core-site.xml).

The Apache Spark Thrift server does not follow this convention. For example, 
if I start the server with conflicting values for "hive.server2.thrift.port":

```
./sbin/start-thriftserver.sh \
--conf spark.hadoop.hive.server2.thrift.port=10001 \
--hiveconf hive.server2.thrift.port=10002
```

 

the "--conf" value (port 10001) is preferred over the "--hiveconf" value (port 
10002):

```
Spark Command: /usr/lib/jvm/java-8-openjdk-amd64/bin/java -cp /home/xxx/spark-2.4.7-bin-hadoop2.7-scala2.12/conf/:/home/xxx/spark-2.4.7-bin-hadoop2.7-scala2.12/jars/* -Xmx1g org.apache.spark.deploy.SparkSubmit --conf spark.hadoop.hive.server2.thrift.port=10001 --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --name Thrift JDBC/ODBC Server spark-internal --hiveconf hive.server2.thrift.port=10002
========================================
...
22/01/24 17:32:18 INFO ThriftCLIService: Starting ThriftBinaryCLIService on port 10001 with 5...500 worker threads
```

 

Replacing the "--conf" line with an equivalent entry in core-site.xml makes no 
difference.

I doubt that this divergence from conventional Hive server behaviour is 
deliberate. I therefore propose that the precedence of Hive configuration 
options match, as closely as possible, that of an Apache Hive server of the 
same version. To my knowledge, it should be:

SET command > --hiveconf > hive-site.xml > hive-default.xml > --conf > 
core-site.xml > core-default.xml
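
To make the requested ordering concrete, here is a minimal Python sketch of a 
lookup that walks the sources from highest to lowest precedence. It is only an 
illustration of the order listed above, not how Spark or Hive resolves 
configuration internally; the names PRECEDENCE and resolve, and the use of 
plain dictionaries for each source, are assumptions made for this example.

```python
from typing import Dict, Mapping, Optional, Tuple

# Sources from highest to lowest precedence, as proposed in this report.
# The labels are illustrative names, not identifiers from Spark or Hive.
PRECEDENCE = (
    "SET command",
    "--hiveconf",
    "hive-site.xml",
    "hive-default.xml",
    "--conf",
    "core-site.xml",
    "core-default.xml",
)


def resolve(
    key: str, sources: Mapping[str, Mapping[str, str]]
) -> Optional[Tuple[str, str]]:
    """Return (value, source) from the highest-precedence source defining key."""
    for name in PRECEDENCE:
        value = sources.get(name, {}).get(key)
        if value is not None:
            return value, name
    return None


# The two conflicting settings from the reproduction above. The "--conf" key
# is shown with the "spark.hadoop." prefix already stripped (assumption: that
# prefix is how the value ends up under the Hadoop/Hive key).
sources: Dict[str, Dict[str, str]] = {
    "--conf": {"hive.server2.thrift.port": "10001"},
    "--hiveconf": {"hive.server2.thrift.port": "10002"},
}

print(resolve("hive.server2.thrift.port", sources))
```

Under the proposed order, this lookup selects 10002 from "--hiveconf", whereas 
the log above shows the server binding to 10001.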



