[
https://issues.apache.org/jira/browse/SPARK-43585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
roland updated SPARK-43585:
---------------------------
Description:
I created a Spark Connect shell in a pod using the following yaml.
{code:yaml}
apiVersion: v1
kind: Service
metadata:
  name: spark-connect-svc
  namespace: MY_NAMESPACE
spec:
  clusterIP: None
  selector:
    app: spark-connect-pod
    podType: spark-connect-driver
---
apiVersion: v1
kind: Pod
metadata:
  name: spark-connect-pod
  namespace: realtime-streaming
  labels:
    app: spark-connect-pod
    podType: spark-connect-driver
spec:
  restartPolicy: Never
  containers:
    - command:
        - sh
        - -c
        - >-
          /opt/spark/sbin/start-connect-server.sh
          --master k8s://https://MY_API_SERVER:443
          --packages org.apache.spark:spark-connect_2.12:3.4.0
          --conf spark.kubernetes.executor.limit.cores=1.0
          --conf spark.kubernetes.executor.request.cores=1.0
          --conf spark.executor.cores=1
          --conf spark.executor.memory=6G
          --conf spark.kubernetes.container.image=MY_ECR_REPO/spark:3.4-prd
          --conf spark.kubernetes.executor.podNamePrefix=spark-connect
          --num-executors=10
          --conf spark.kubernetes.driver.pod.name=spark-connect-pod
          --conf spark.kubernetes.namespace=MY_NAMESPACE
          && tail -100f /opt/spark/logs/spark--org.apache.spark.sql.connect.service.SparkConnectServer-1-spark-connect-pod.out
      image: MY_ECR_REPO/spark-py:3.4-prd
      name: spark-connect-pod
{code}
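For reference, assuming the two manifests above are saved together in a file named spark-connect.yaml (a hypothetical filename, not from the report), they can be applied in one step:
{code:bash}
# Creates both the headless Service and the driver Pod
kubectl apply -f spark-connect.yaml -n MY_NAMESPACE
{code}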
The Spark Connect server launched successfully and I can connect to it using pyspark.
But when I add a Hive metastore config, it does not take effect: the session still only sees the default catalog.
{code:python}
>>> spark = SparkSession.builder.remote("sc://spark-connect-svc") \
...     .config("spark.hive.metastore.uris", "thrift://hive-metastore:9083") \
...     .getOrCreate()
>>> spark.sql("show databases").show()
+---------+
|namespace|
+---------+
|  default|
+---------+
{code}
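A likely explanation (an assumption, not verified against this cluster): `.config()` on a Connect client builder can only influence session-level runtime confs, while the Hive catalog selection and metastore URI are effectively fixed when the server-side SparkSession is created. A sketch of a server-side workaround would be to pass the Hive settings when starting the Connect server, e.g.:
{code:bash}
# Sketch: set Hive/metastore confs at server start instead of from the client.
# hive-metastore:9083 is the metastore service from the report above.
/opt/spark/sbin/start-connect-server.sh \
  --packages org.apache.spark:spark-connect_2.12:3.4.0 \
  --conf spark.sql.catalogImplementation=hive \
  --conf spark.hive.metastore.uris=thrift://hive-metastore:9083
{code}
With the confs applied on the server, every Connect session should then resolve databases against the external metastore rather than the built-in default catalog.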
> Spark Connect client cannot read from Hive metastore
> ----------------------------------------------------
>
> Key: SPARK-43585
> URL: https://issues.apache.org/jira/browse/SPARK-43585
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: roland
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]