[ 
https://issues.apache.org/jira/browse/SPARK-49910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17908771#comment-17908771
 ] 

Ángel Álvarez Pascua edited comment on SPARK-49910 at 12/30/24 5:23 AM:
------------------------------------------------------------------------

Spark 3.5.2 (as well as the latest release in that line, 3.5.4) uses the
{{HiveMetaStoreClient}} class from the dependency {{hive-metastore-2.3.9.jar}}
(released June 10, 2021). This version does not appear to support enabling both
SASL (Kerberos) and SSL at the same time. When SASL is enabled, the Hive
Metastore client always seems to create a non-SSL socket. As a result, once
Kerberos validation has completed in the Metastore, the "Unsupported or
unrecognized SSL message" error occurs, because the message is sent in plain
text without SSL encryption.
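
To make the suspected behaviour concrete, here is a toy model of the socket
choice described above. It is only a sketch of what the symptoms suggest, not
the Hive source code: the class, enum and method names are hypothetical, and
the branch order in {{describedBehavior}} is an assumption based on this
comment rather than on a reading of {{hive-metastore-2.3.9}}.

```java
// Toy model only: hypothetical names, not taken from the Hive code base.
final class MetastoreTransportSketch {

    enum Transport { PLAIN, SSL }

    // Behaviour the comment describes for the 2.3.9 client (assumption):
    // once SASL is enabled, the SSL flag is never consulted.
    static Transport describedBehavior(boolean useSasl, boolean useSsl) {
        if (useSasl) {
            return Transport.PLAIN;
        }
        return useSsl ? Transport.SSL : Transport.PLAIN;
    }

    // Behaviour a user would expect: the SSL flag alone selects the socket,
    // and SASL is layered on top of whichever socket was chosen.
    static Transport expectedBehavior(boolean useSasl, boolean useSsl) {
        return useSsl ? Transport.SSL : Transport.PLAIN;
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        for (boolean sasl : flags) {
            for (boolean ssl : flags) {
                System.out.printf("sasl=%b ssl=%b described=%s expected=%s%n",
                        sasl, ssl,
                        describedBehavior(sasl, ssl),
                        expectedBehavior(sasl, ssl));
            }
        }
    }
}
```

In this model the two functions disagree only when both flags are true, which
matches the one failing row of the reporter's test matrix (kerberos + ssl).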

 

I reproduced the same issue using the {{HiveMetaStoreClient}} from the 
{{hive-metastore-2.3.9}} library in a standalone test without using Spark. 
However, the connection worked successfully after updating the dependency to 
version {{4.0.0}}.

 

Although this issue is not directly Spark-related, I strongly suggest that
Spark update its dependencies to more recent versions to avoid compatibility
issues like this one.

 

*Note:* Another workaround for this issue, one that does not require updating
the Hive MetaStore dependency in Spark, is to override the client with a
modified version of [^HiveMetaStoreClient.java].



> spark TLS connection (+ kerberos) to hive metastore
> ---------------------------------------------------
>
>                 Key: SPARK-49910
>                 URL: https://issues.apache.org/jira/browse/SPARK-49910
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.5.2
>         Environment: spark: 3.5.2_scala2.12
> hadoop: 3.3.6
> iceberg: 1.6.0
> hive: 4.0.0
>  
> spark and HMS java version:
>  
> openjdk version "11.0.24" 2024-07-16
> OpenJDK Runtime Environment Temurin-11.0.24+8 (build 11.0.24+8)
> OpenJDK 64-Bit Server VM Temurin-11.0.24+8 (build 11.0.24+8, mixed mode, 
> sharing)
>            Reporter: Stefano Bovina
>            Priority: Major
>         Attachments: HiveMetaStoreClient.java
>
>
> Hi,
> we are trying to configure an integration between trino, spark and hive 
> metastore (HMS) in a secure way.
>  
> Hive metastore has already been configured in order to use kerberos and TLS.
> Trino has already been configured in order to connect to HMS using TLS and 
> kerberos.
>  
> Trying to do the same for spark (connect it to HMS using TLS and kerberos) we 
> faced a problem with TLS connection: if we configure spark using kerberos and 
> plain connection to HMS (reconfiguring HMS too) it works, but if we enable 
> TLS on both, spark is not able to connect.
>  
> The error on HMS is the following: "Caused by: javax.net.ssl.SSLException: 
> Unsupported or unrecognized SSL message" and indeed connections initiated by 
> spark are always plain.
>  
> The test matrix is the following:
>  # hive (kerberos + ssl) --> spark (kerberos + ssl) --> not working
>  # hive (kerberos + plain) --> spark (kerberos + plain) --> works
>  # hive (ssl) --> spark (ssl) --> works
>  
> While doing "test 1", I also used tcpdump to figure out if spark was trying 
> to start an ssl or a plain connection to hive, and from what I'm seeing spark 
> is completely ignoring the following parameters and keeps trying to open a 
> plain connection:
> {code:java}
> spark.hive.metastore.use.SSL true
> spark.hive.metastore.truststore.path /opt/spark/ssl/cert.jks
> spark.hive.metastore.truststore.password mypassword{code}
> If I enable both kerberos and ssl (test 1), it seems like those hive ssl 
> related configurations in spark-defaults are being ignored and spark always 
> tries to open a plain connection; for example, if I set 
> "spark.hive.metastore.truststore.password" to "wrongpassword", the error 
> "Password verification failed" should be raised, but nothing happens.
>  
> spark conf: [https://gist.github.com/bovy89/83cbe3b9cd7a318fa9fd35355d5801fc]
> pyspark logs: 
> [https://gist.github.com/bovy89/a06f0aa4a54f454fea9e0d6ff148cfc5#file-pyspark-log]
> pyspark debug logs: 
> [https://gist.github.com/bovy89/e6a9eeca389f05ff7bea78f807ce5714]
> hive metastore logs: 
> [https://gist.github.com/bovy89/a06f0aa4a54f454fea9e0d6ff148cfc5#file-hive-metastore-log]



