Ricardo Pinto created SPARK-27866:
-------------------------------------

             Summary: Cannot connect to hive metastore
                 Key: SPARK-27866
                 URL: https://issues.apache.org/jira/browse/SPARK-27866
             Project: Spark
          Issue Type: Bug
          Components: Kubernetes
    Affects Versions: 2.4.3
         Environment: Spark 2.4.3

Kubernetes on EKS (Amazon)
            Reporter: Ricardo Pinto


I'm running Spark on Kubernetes, and I compiled Spark with:
{code:bash}
mvn clean install -Phadoop-3.2 -Phadoop-cloud -Pkubernetes -DskipTests
{code}
Then I built the Docker image with:

{code:bash}
./bin/docker-image-tool.sh -p resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile build
{code}
 
I added hive-site.xml to the classpath (/opt/spark/jars); its contents are:

{code:xml}
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>metastore.thrift.uris</name>
    <value>thrift://hive-metastore-database.presto-staging.svc.cluster.local:9083</value>
  </property>
  <property>
    <name>metastore.task.threads.always</name>
    <value>org.apache.hadoop.hive.metastore.events.EventCleanerTask</value>
  </property>
  <property>
    <name>metastore.expression.proxy</name>
    <value>org.apache.hadoop.hive.metastore.DefaultPartitionExpressionProxy</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://hive-metastore-database.presto-staging.svc.cluster.local/metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>postgres</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>ToBeDefinedByHashiCorpVault</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hive-metastore.presto-staging.svc.cluster.local:9083</value>
    <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
  </property>
</configuration>
{code}
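One thing worth double-checking in the file itself: Spark's bundled Hive client reads the legacy {{hive.metastore.uris}} key (the {{metastore.*}} names are the newer Hive 3 spellings), so that key needs to parse cleanly out of the XML. A minimal stdlib sketch to confirm it does (the XML is inlined here for illustration, abbreviated to the relevant property):

{code:python}
import xml.etree.ElementTree as ET

# Abbreviated copy of the hive-site.xml above; in practice, read the real file
# from /opt/spark/jars/hive-site.xml instead.
HIVE_SITE = """<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hive-metastore.presto-staging.svc.cluster.local:9083</value>
  </property>
</configuration>"""

def parse_hive_site(xml_text):
    """Return a dict of property name -> value from a hive-site.xml document."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}

props = parse_hive_site(HIVE_SITE)
print(props.get("hive.metastore.uris"))
{code}

If the key comes back as None (or the file fails to parse), Spark silently falls back to a local Derby metastore, which would explain seeing only the default database.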
 

However, Spark doesn't connect to the remote Hive metastore: when I execute the
following code, I get only the default database:

 
{code:python}
# launched via ../bin/pyspark
import pyspark

spark_session = pyspark.sql.SparkSession.builder.enableHiveSupport().getOrCreate()
sql_context = pyspark.sql.SQLContext(spark_session.sparkContext, spark_session)
sql_context.sql("show databases").show()
{code}
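Since the session comes up but only lists the default database, one cause worth ruling out is plain network reachability from the driver pod to the metastore service. A small stdlib probe, run inside the pod (hostname and port taken from the hive-site.xml above; this is only a diagnostic sketch, not a fix):

{code:python}
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals, and timeouts
        return False

# Host and port from the hive.metastore.uris value above.
print(can_connect("hive-metastore.presto-staging.svc.cluster.local", 9083))
{code}

If this prints False, the problem is DNS or network policy between the Spark pod and the metastore service rather than Spark configuration.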
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
