[
https://issues.apache.org/jira/browse/HADOOP-18159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600421#comment-17600421
]
Job edited comment on HADOOP-18159 at 9/5/22 1:40 PM:
------------------------------------------------------
+3 [[email protected]] for taking the time to outline. In the end,
the following steps were needed to get PySpark 3.3.0 working on an AWS notebook
instance in a custom conda env:
- Install the full version of Spark 3.3.0 + PySpark 3.3.0
- Allow for dots in the bucket name:
  .config("spark.hadoop.fs.s3a.path.style.access", "true")
- Update to aws-java-sdk-bundle-1.12.262.jar (in the sagemaker_pyspark/jars
  folder)
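
A minimal sketch of the second step, assuming PySpark is installed and the hadoop-aws and AWS SDK bundle jars are already on the classpath. The helper names `s3a_path_style_options` and `build_session` and the app name are hypothetical; only the `spark.hadoop.fs.s3a.path.style.access` key comes from the steps above.

```python
from typing import Dict


def s3a_path_style_options() -> Dict[str, str]:
    """Spark options that switch the S3A connector to path-style requests."""
    return {"spark.hadoop.fs.s3a.path.style.access": "true"}


def build_session(app_name: str = "s3a-path-style-demo"):
    """Create a SparkSession with the options applied (requires pyspark)."""
    # Imported lazily so the option helper stays usable without pyspark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in s3a_path_style_options().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With path-style access the bucket name moves from the hostname into the request path, so a bucket name containing dots no longer has to match the `*.s3.amazonaws.com` wildcard, which only covers a single DNS label.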
> Certificate doesn't match any of the subject alternative names:
> [*.s3.amazonaws.com, s3.amazonaws.com]
> ------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-18159
> URL: https://issues.apache.org/jira/browse/HADOOP-18159
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 3.3.1, 3.3.2, 3.3.3
> Environment: hadoop 3.3.1
> httpclient 4.5.13
> JDK8
> Reporter: André F.
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> h2. If you see this error message when trying to use s3a:// or gs:// URLs,
> look for copies of cos_api-bundle.jar on your classpath and remove them.
> Libraries which include shaded apache httpclient libraries
> (hadoop-client-runtime.jar, aws-java-sdk-bundle.jar,
> gcs-connector-shaded.jar, cos_api-bundle.jar) all load and use the unshaded
> resource mozilla/public-suffix-list.txt. If an out of date version of this is
> found on the classpath first, attempts to negotiate TLS connections may fail
> with the error "Certificate doesn't match any of the subject alternative
> names".
> In a Hadoop installation, you can use the FindClass tool to track down where
> the public-suffix-list.txt is coming from.
> {code}
> hadoop org.apache.hadoop.util.FindClass locate mozilla/public-suffix-list.txt
> {code}
> So far, the cos_api-bundle-5.6.19.jar appears to be the source of this
> problem.
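
Where the `hadoop` command is not available (for example in a notebook environment), a rough alternative is to scan the jar directories directly. The sketch below is an assumption-laden stand-in, not a replacement for FindClass: the function name `jars_containing` and the command-line usage are hypothetical, and it only lists jars that bundle the resource without telling you which one the classloader picks first.

```python
import sys
import zipfile
from pathlib import Path
from typing import List

RESOURCE = "mozilla/public-suffix-list.txt"


def jars_containing(resource: str, jar_dir: Path) -> List[Path]:
    """Return the jars under jar_dir (searched recursively) that bundle resource."""
    hits = []
    for jar in sorted(jar_dir.rglob("*.jar")):
        if not jar.is_file():
            continue
        try:
            with zipfile.ZipFile(jar) as archive:
                if resource in archive.namelist():
                    hits.append(jar)
        except zipfile.BadZipFile:
            # Skip files that end in .jar but are not valid zip archives.
            continue
    return hits


if __name__ == "__main__":
    # Hypothetical usage: python find_suffix_list.py /path/to/jars
    for hit in jars_containing(RESOURCE, Path(sys.argv[1])):
        print(hit)
```

Any jar reported here other than the shaded bundles you expect is a candidate for removal from the classpath.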
> ----
> h2. bug report
> Trying to run any job after bumping our Spark version (which is now using
> Hadoop 3.3.1) led us to the following exception while reading files on S3:
> {code:java}
> org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://<bucket>/<path>.parquet: com.amazonaws.SdkClientException: Unable to execute HTTP request: Certificate for <bucket.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]: Unable to execute HTTP request: Certificate for <bucket> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]
>   at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:208)
>   at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:170)
>   at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3351)
>   at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3185)
>   at org.apache.hadoop.fs.s3a.S3AFileSystem.isDirectory(S3AFileSystem.java:4277)
>   at
> {code}
>
> {code:java}
> Caused by: javax.net.ssl.SSLPeerUnverifiedException: Certificate for <bucket.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]
>   at com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory.verifyHostname(SSLConnectionSocketFactory.java:507)
>   at com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:437)
>   at com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:384)
>   at com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
>   at com.amazonaws.thirdparty.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)
>   at sun.reflect.GeneratedMethodAccessor36.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
>   at com.amazonaws.http.conn.$Proxy16.connect(Unknown Source)
>   at com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
>   at com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
>   at com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
>   at com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
>   at com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
>   at com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
>   at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1333)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
> {code}
> We found similar problems in the following tickets, but neither matches our
> case:
> - https://issues.apache.org/jira/browse/HADOOP-17017 (we don't use `.` in
> our bucket names)
> - [https://github.com/aws/aws-sdk-java-v2/issues/1786] (we tried overriding
> the httpclient version with `httpclient:4.5.10` or `httpclient:4.5.8`, with
> no effect).
> We couldn't test it using the native `openssl` configuration due to our
> setup, so we would like to stick with the Java SSL implementation, if
> possible.
>