[ 
https://issues.apache.org/jira/browse/SPARK-49910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ángel Álvarez Pascua updated SPARK-49910:
-----------------------------------------
    Attachment: HiveMetaStoreClient.java

> spark TLS connection (+ kerberos) to hive metastore
> ---------------------------------------------------
>
>                 Key: SPARK-49910
>                 URL: https://issues.apache.org/jira/browse/SPARK-49910
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.5.2
>         Environment: spark: 3.5.2_scala2.12
> hadoop: 3.3.6
> iceberg: 1.6.0
> hive: 4.0.0
>  
> spark and HMS java version:
>  
> openjdk version "11.0.24" 2024-07-16
> OpenJDK Runtime Environment Temurin-11.0.24+8 (build 11.0.24+8)
> OpenJDK 64-Bit Server VM Temurin-11.0.24+8 (build 11.0.24+8, mixed mode, 
> sharing)
>            Reporter: Stefano Bovina
>            Priority: Major
>         Attachments: HiveMetaStoreClient.java
>
>
> Hi,
> we are trying to set up a secure integration between Trino, Spark, and the 
> Hive Metastore (HMS).
>  
> HMS is already configured to use Kerberos and TLS, and Trino is already 
> configured to connect to it using both.
>  
> When we try to do the same for Spark (connect it to HMS using TLS and 
> Kerberos), the TLS connection fails: with Kerberos and a plain connection to 
> HMS (after reconfiguring HMS accordingly) it works, but with TLS enabled on 
> both sides Spark is not able to connect.
>  
> The error on HMS is the following: "Caused by: javax.net.ssl.SSLException: 
> Unsupported or unrecognized SSL message", and indeed the connections 
> initiated by Spark are always plain.
>  
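The "Unsupported or unrecognized SSL message" error is consistent with a TLS server receiving bytes that are not a TLS handshake. As a rough aid for reading a packet capture, the sketch below checks whether the first bytes a client sends look like a TLS ClientHello record. The function name `looks_like_tls_client_hello` is hypothetical, and the check is a simplification that only inspects the record header, not the full handshake.

```python
def looks_like_tls_client_hello(first_bytes: bytes) -> bool:
    """Return True if the bytes start like a TLS record carrying a ClientHello.

    TLS record layout: 1 byte content type, 2 bytes version, 2 bytes length,
    then the handshake message, whose first byte is the handshake type.
    """
    if len(first_bytes) < 6:
        return False
    content_type = first_bytes[0]    # 0x16 = handshake record
    major_version = first_bytes[1]   # 0x03 for SSL 3.0 and TLS 1.x
    handshake_type = first_bytes[5]  # 0x01 = ClientHello
    return (
        content_type == 0x16
        and major_version == 0x03
        and handshake_type == 0x01
    )


if __name__ == "__main__":
    # Illustrative payloads only, not taken from the capture in this issue.
    tls_start = bytes([0x16, 0x03, 0x01, 0x02, 0x00, 0x01, 0x00])
    non_tls_start = b"\x01\x00\x00\x00\x06GSSAPI"
    print(looks_like_tls_client_hello(tls_start))
    print(looks_like_tls_client_hello(non_tls_start))
```

Applied to the first client payload of a captured connection, a result of `False` would match the observation that the connection is opened in plain mode.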
> The test matrix is the following:
>  # hive (kerberos + ssl) + spark (kerberos + ssl) --> not working
>  # hive (kerberos + plain) + spark (kerberos + plain) --> works
>  # hive (ssl) + spark (ssl) --> works
>  
> While running "test 1", I also used tcpdump to check whether Spark was 
> trying to open an SSL or a plain connection to Hive. From what I can see, 
> Spark completely ignores the following parameters and keeps trying to open a 
> plain connection:
> {code:java}
> spark.hive.metastore.use.SSL true
> spark.hive.metastore.truststore.path /opt/spark/ssl/cert.jks
> spark.hive.metastore.truststore.password mypassword{code}
> With both Kerberos and SSL enabled (test 1), the Hive SSL-related settings 
> in spark-defaults appear to be ignored and Spark always tries to open a plain 
> connection. For example, if I set 
> "spark.hive.metastore.truststore.password" to "wrongpassword", the error 
> "Password verification failed" should be raised, but nothing happens.
>  
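Before concluding that the settings are ignored, it can help to rule out a typo or a missing property in the file itself. The sketch below parses spark-defaults-style text and reports which of the three properties quoted in this issue are absent. The helper names `parse_spark_defaults` and `missing_ssl_keys` are hypothetical, and the parser is a simplification that only handles whitespace or `=` as the separator. It checks the file only and says nothing about what Spark actually passes to the Hive client.

```python
import re

# The three properties quoted in the issue description.
SSL_KEYS = (
    "spark.hive.metastore.use.SSL",
    "spark.hive.metastore.truststore.path",
    "spark.hive.metastore.truststore.password",
)


def parse_spark_defaults(text: str) -> dict:
    """Parse 'key value' or 'key=value' lines, skipping blanks and # comments."""
    conf = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split once, on '=' (with optional spaces) or on a run of whitespace.
        parts = re.split(r"\s*=\s*|\s+", line, maxsplit=1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        conf[key] = value
    return conf


def missing_ssl_keys(conf: dict) -> list:
    """Return the SSL-related keys that are not present in the parsed config."""
    return [key for key in SSL_KEYS if key not in conf]


if __name__ == "__main__":
    sample = (
        "spark.hive.metastore.use.SSL true\n"
        "spark.hive.metastore.truststore.path /opt/spark/ssl/cert.jks\n"
        "spark.hive.metastore.truststore.password mypassword\n"
    )
    print(missing_ssl_keys(parse_spark_defaults(sample)))
```

An empty list here means the properties are at least present in the file, so the problem would lie further along, in how they are handed to the metastore client.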
> spark conf: [https://gist.github.com/bovy89/83cbe3b9cd7a318fa9fd35355d5801fc]
> pyspark logs: 
> [https://gist.github.com/bovy89/a06f0aa4a54f454fea9e0d6ff148cfc5#file-pyspark-log]
> pyspark debug logs: 
> [https://gist.github.com/bovy89/e6a9eeca389f05ff7bea78f807ce5714]
> hive metastore logs: 
> [https://gist.github.com/bovy89/a06f0aa4a54f454fea9e0d6ff148cfc5#file-hive-metastore-log]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

