Ricardo Pinto created SPARK-27866:
-------------------------------------
Summary: Cannot connect to hive metastore
Key: SPARK-27866
URL: https://issues.apache.org/jira/browse/SPARK-27866
Project: Spark
Issue Type: Bug
Components: Kubernetes
Affects Versions: 2.4.3
Environment: Spark 2.4.3
Kubernetes on EKS (Amazon)
Reporter: Ricardo Pinto
I'm running Spark on Kubernetes and I've compiled Spark with:
{code:bash}
mvn clean install -Phadoop-3.2 -Phadoop-cloud -Pkubernetes -DskipTests
{code}
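As an aside, the Spark build docs say Hive integration needs the {{-Phive}} and {{-Phive-thriftserver}} profiles, which the command above doesn't include. Here's a quick probe from PySpark to check whether the Hive classes made it into the build (a sketch using py4j's internal {{_jvm}} gateway, not a public API):
{code:python}
# Sketch: probe the driver JVM for a core Hive class; this raises a
# Py4JJavaError wrapping ClassNotFoundException if the build did not
# bundle the Hive classes.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.sparkContext._jvm.java.lang.Class.forName(
    "org.apache.hadoop.hive.conf.HiveConf")
print("Hive classes are on the classpath")
{code}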
Then I've built the Docker image with:
{code:bash}
./bin/docker-image-tool.sh -p resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile build
{code}
I've added hive-site.xml to the classpath directory /opt/spark/jars, with the following contents:
{code:xml}
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>metastore.thrift.uris</name>
    <value>thrift://hive-metastore-database.presto-staging.svc.cluster.local:9083</value>
  </property>
  <property>
    <name>metastore.task.threads.always</name>
    <value>org.apache.hadoop.hive.metastore.events.EventCleanerTask</value>
  </property>
  <property>
    <name>metastore.expression.proxy</name>
    <value>org.apache.hadoop.hive.metastore.DefaultPartitionExpressionProxy</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://hive-metastore-database.presto-staging.svc.cluster.local/metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>postgres</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>ToBeDefinedByHashiCorpVault</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hive-metastore.presto-staging.svc.cluster.local:9083</value>
    <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
  </property>
</configuration>
{code}
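To double-check whether the driver JVM can actually see that file, here's a probe (a sketch; {{_jvm}} is an internal py4j accessor, and the note about wildcard classpath entries is my assumption about why /opt/spark/jars might not work):
{code:python}
# Sketch: ask the driver's context classloader for hive-site.xml.
# Caveat (assumption): /opt/spark/jars is typically put on the classpath
# as "jars/*", and a Java wildcard classpath entry only matches .jar
# files, so a bare XML file in that directory may be invisible;
# $SPARK_HOME/conf is the usual location for hive-site.xml.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
loader = (spark.sparkContext._jvm.java.lang.Thread
          .currentThread().getContextClassLoader())
print(loader.getResource("hive-site.xml"))  # file: URL if visible, else None
{code}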
However, Spark doesn't connect to the remote Hive metastore. When I execute the following code, I get only the default database:
{code:python}
# launched from ../bin/pyspark
import pyspark

spark_session = pyspark.sql.SparkSession.builder.enableHiveSupport().getOrCreate()
sql_context = pyspark.sql.SQLContext(spark_session.sparkContext, spark_session)
sql_context.sql("show databases").show()
{code}
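To isolate whether the problem is hive-site.xml discovery at all, the same query can be run with the metastore URI passed directly through the session builder (same service address as in the hive-site.xml above; builder config entries are applied to the Hive client conf, so no file is needed):
{code:python}
# Sketch: pass the metastore URI straight to the builder instead of
# relying on hive-site.xml being found on the classpath.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("hive.metastore.uris",
                 "thrift://hive-metastore.presto-staging.svc.cluster.local:9083")
         .enableHiveSupport()
         .getOrCreate())

# Should list the remote databases if the connection works.
spark.sql("SHOW DATABASES").show()
{code}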