[ 
https://issues.apache.org/jira/browse/HADOOP-18159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597771#comment-17597771
 ] 

Job commented on HADOOP-18159:
------------------------------

Came here to say the same as comet. I just spent 3 days trying to fix this, but 
to no avail. My setup on an AWS notebook instance:
 * sagemaker-pyspark 1.4.5 (python package)
 * pyspark 3.3.0 (dependency of above)
 * jars: !image-2022-08-30-13-22-23-728.png|width=875,height=53!

 

Problem:
 * Upon reading a file from S3, the following error is thrown:

 

 
{code:java}
22/08/30 11:00:22 WARN FileStreamSink: Assume no metadata directory. Error while looking for metadata directory in the path: s3a://comp.data.sci.data.tst/some/folder/export_date=20220822.
org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://comp.data.sci.data.tst/some/folder/export_date=20220822: com.amazonaws.SdkClientException: Unable to execute HTTP request: Certificate for <comp.data.sci.data.tst.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]: Unable to execute HTTP request: Certificate for <comp.data.sci.data.tst.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]
        at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:208)
        at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:170)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3351)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3185)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.isDirectory(S3AFileSystem.java:4277)
        at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:54)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
        at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
        at scala.Option.getOrElse(Option.scala:189)
{code}

 

Based on the workarounds suggested in this issue and the linked tickets, I tried 3 things:
 * 1. Downgraded httpclient to version 4.5.10 → didn't work.

 * 2. Disabled SSL certificate checking in the aws-java-sdk, as described in 
[https://github.com/aws/aws-sdk-java-v2/issues/1786] → didn't work.

 * 3. Read from a bucket whose name doesn't contain dots (.) → works, but I 
don't have the freedom to change the dots to dashes.
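
One more thing I'm considering, since the failure comes from the dotted bucket name becoming a virtual-hosted hostname: S3A has a {{fs.s3a.path.style.access}} option that keeps the bucket in the URL path, so TLS only ever sees a host the wildcard certificate covers. A toy sketch of the hostname difference (the {{s3_host}} helper is my own illustration, not SDK code):
{code:python}
def s3_host(bucket: str, path_style: bool) -> str:
    """Simplified model of the host the AWS SDK contacts for a bucket."""
    # Path-style requests go to the shared endpoint with the bucket in the
    # URL path; virtual-hosted style prefixes the bucket onto the hostname.
    return "s3.amazonaws.com" if path_style else f"{bucket}.s3.amazonaws.com"

# Virtual-hosted style: the dots add extra DNS labels, so the host no
# longer matches the wildcard SAN *.s3.amazonaws.com
print(s3_host("comp.data.sci.data.tst", path_style=False))
# -> comp.data.sci.data.tst.s3.amazonaws.com

# Path style: the host is s3.amazonaws.com, which the certificate covers
print(s3_host("comp.data.sci.data.tst", path_style=True))
# -> s3.amazonaws.com
{code}
In pyspark I'd set it via {{--conf spark.hadoop.fs.s3a.path.style.access=true}}; I haven't verified this on my setup yet.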

Would you advise me to wait until the bug is fixed, or is there anything else I 
can try?
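
Meanwhile, to rule out the stale public-suffix-list.txt cause described in the issue below, I scanned my jars for bundled copies with a quick throwaway script (the demo jar name is just an example):
{code:python}
import pathlib
import tempfile
import zipfile

def jars_bundling(resource: str, jar_dir: pathlib.Path) -> list[str]:
    """Return the names of jars under jar_dir that bundle the given resource."""
    hits = []
    for jar in sorted(jar_dir.glob("*.jar")):
        with zipfile.ZipFile(jar) as zf:
            if resource in zf.namelist():
                hits.append(jar.name)
    return hits

# Demo against a throwaway jar that bundles the suffix list
with tempfile.TemporaryDirectory() as d:
    jar = pathlib.Path(d) / "cos_api-bundle-5.6.19.jar"
    with zipfile.ZipFile(jar, "w") as zf:
        zf.writestr("mozilla/public-suffix-list.txt", "// example data")
    print(jars_bundling("mozilla/public-suffix-list.txt", pathlib.Path(d)))
# -> ['cos_api-bundle-5.6.19.jar']
{code}
(In a full Hadoop install the {{hadoop org.apache.hadoop.util.FindClass locate}} command from the issue description is the proper tool; this is just for my notebook environment, which only has the jar files.)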

> Certificate doesn't match any of the subject alternative names: 
> [*.s3.amazonaws.com, s3.amazonaws.com]
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-18159
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18159
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.3.1, 3.3.2, 3.3.3
>         Environment: hadoop 3.3.1
> httpclient 4.5.13
> JDK8
>            Reporter: André F.
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> h2. If you see this error message when trying to use s3a:// or gs:// URLs, 
> look for copies of cos_api-bundle.jar on your classpath and remove them.
> Libraries which include shaded apache httpclient libraries 
> (hadoop-client-runtime.jar, aws-java-sdk-bundle.jar, 
> gcs-connector-shaded.jar, cos_api-bundle.jar) all load and use the unshaded 
> resource mozilla/public-suffix-list.txt. If an out of date version of this is 
> found on the classpath first, attempts to negotiate TLS connections may fail 
> with the error "Certificate doesn't match any of the subject alternative 
> names". 
> In a hadoop installation, you can use the findclass tool to track down where 
> the public-suffix-list.txt is coming from.
> {code}
> hadoop org.apache.hadoop.util.FindClass locate mozilla/public-suffix-list.txt
> {code}
> So far, the cos_api-bundle-5.6.19.jar appears to be the source of this 
> problem.
> ----
> h2. bug report
> Trying to run any job after bumping our Spark version (which now uses 
> Hadoop 3.3.1) led us to the following exception while reading files on s3:
> {code:java}
> org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://<bucket>/<path>.parquet: com.amazonaws.SdkClientException: Unable to execute HTTP request: Certificate for <bucket.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]: Unable to execute HTTP request: Certificate for <bucket> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]
>         at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:208)
>         at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:170)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3351)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3185)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.isDirectory(S3AFileSystem.java:4277)
>         at {code}
>  
> {code:java}
> Caused by: javax.net.ssl.SSLPeerUnverifiedException: Certificate for <bucket.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]
>               at com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory.verifyHostname(SSLConnectionSocketFactory.java:507)
>               at com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:437)
>               at com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:384)
>               at com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
>               at com.amazonaws.thirdparty.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)
>               at sun.reflect.GeneratedMethodAccessor36.invoke(Unknown Source)
>               at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>               at java.lang.reflect.Method.invoke(Method.java:498)
>               at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
>               at com.amazonaws.http.conn.$Proxy16.connect(Unknown Source)
>               at com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
>               at com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
>               at com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
>               at com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
>               at com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
>               at com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
>               at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
>               at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1333)
>               at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
>   {code}
> We found similar problems in the following tickets, but neither applied to us:
>  - https://issues.apache.org/jira/browse/HADOOP-17017 (we don't use `.` in 
> our bucket names)
>  - [https://github.com/aws/aws-sdk-java-v2/issues/1786] (we tried to override 
> it by using `httpclient:4.5.10` or `httpclient:4.5.8`, with no effect).
> We couldn't test it using the native `openssl` configuration due to our 
> setup, so we would like to stick with the java ssl implementation, if 
> possible.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
