[ https://issues.apache.org/jira/browse/SPARK-27866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849627#comment-16849627 ]
Ricardo Pinto commented on SPARK-27866:
---------------------------------------
It's the same. Is there a way to test the Hive metastore connection directly
from Spark? I specify the metastore address, but where can I see Spark
connecting to it?
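As a first check, basic network reachability of the Thrift port can be probed without Spark at all. Below is a minimal stdlib-only sketch, meant to be run from inside the driver pod; the host and port are the ones from the hive-site.xml quoted in this issue, so adjust them if your service name differs:

```python
# Stdlib-only TCP probe for the metastore's Thrift port.
# Host/port below are taken from the hive-site.xml in this issue;
# run from inside the driver pod to rule out DNS/network problems.
import socket


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print(can_reach("hive-metastore.presto-staging.svc.cluster.local", 9083))
```

If this returns False, the problem is DNS or networking (Service name, namespace, NetworkPolicy) rather than Spark's Hive integration.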
> Cannot connect to hive metastore
> --------------------------------
>
> Key: SPARK-27866
> URL: https://issues.apache.org/jira/browse/SPARK-27866
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 2.4.3
> Environment: Spark 2.4.3
> Kubernetes on EKS (Amazon)
> Reporter: Ricardo Pinto
> Priority: Major
> Labels: bug
>
> I'm running Spark on Kubernetes and I've compiled Spark with:
> {code:java}
> mvn clean install -Phadoop-3.2 -Phadoop-cloud -Pkubernetes -DskipTests{code}
> Then I've built the docker image with:
>
> {code:java}
> ./bin/docker-image-tool.sh -p resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile build
> {code}
>
> I've added hive-site.xml to the classpath (/opt/spark/jars); its contents:
>
> {code:xml}
> <?xml version="1.0" encoding="UTF-8" standalone="no"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> <configuration>
>   <property>
>     <name>metastore.thrift.uris</name>
>     <value>thrift://hive-metastore-database.presto-staging.svc.cluster.local:9083</value>
>   </property>
>   <property>
>     <name>metastore.task.threads.always</name>
>     <value>org.apache.hadoop.hive.metastore.events.EventCleanerTask</value>
>   </property>
>   <property>
>     <name>metastore.expression.proxy</name>
>     <value>org.apache.hadoop.hive.metastore.DefaultPartitionExpressionProxy</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionURL</name>
>     <value>jdbc:postgresql://hive-metastore-database.presto-staging.svc.cluster.local/metastore</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionDriverName</name>
>     <value>org.postgresql.Driver</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionUserName</name>
>     <value>postgres</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionPassword</name>
>     <value>ToBeDefinedByHashiCorpVault</value>
>   </property>
>   <property>
>     <name>hive.metastore.uris</name>
>     <value>thrift://hive-metastore.presto-staging.svc.cluster.local:9083</value>
>     <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
>   </property>
> </configuration>
> {code}
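> One way to rule out hive-site.xml placement problems entirely is to pass the metastore settings to Spark as configuration properties instead. This is a sketch, assuming Spark's {{spark.hadoop.*}} passthrough (which forwards the suffix to the Hadoop/Hive configuration); the URI mirrors the one in the file above:
>
> {code:none}
> # spark-defaults.conf (or --conf flags on spark-submit)
> spark.sql.catalogImplementation   hive
> spark.hadoop.hive.metastore.uris  thrift://hive-metastore.presto-staging.svc.cluster.local:9083
> {code}
>
> If "show databases" then lists the remote databases, the issue is where hive-site.xml is picked up (it normally belongs in $SPARK_HOME/conf rather than the jars directory), not the metastore itself.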
>
> However, Spark doesn't connect to the remote Hive metastore; when I execute the
> following code, I get only the default database:
>
> {code:python}
> ../bin/pyspark
> import pyspark
> spark_session = pyspark.sql.SparkSession.builder.enableHiveSupport().getOrCreate()
> sql_context = pyspark.sql.SQLContext(spark_session.sparkContext, spark_session)
> sql_context.sql("show databases").show()
> {code}
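>
> To actually see the connection attempt in the driver log (the Hive metastore client logs the URI it tries at INFO level), the metastore client's logger can be turned up in Spark's log4j.properties. The logger name below is an assumption based on the Hive package layout:
>
> {code:none}
> # conf/log4j.properties
> log4j.logger.org.apache.hadoop.hive.metastore=DEBUG
> {code}
>
> If no "connecting to metastore"-style lines appear at all, the session is most likely falling back to a local Derby-backed catalog because the hive-site.xml was never read.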
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)