[ https://issues.apache.org/jira/browse/SPARK-38330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

André F. updated SPARK-38330:
-----------------------------
    Description: 
Trying to run any job after bumping our Spark version from 3.1.2 to 3.2.1 leads us to the following exception while reading files on S3:
{code:java}
org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://<bucket>/<path>.parquet: com.amazonaws.SdkClientException: Unable to execute HTTP request: Certificate for <bucket> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]: Unable to execute HTTP request: Certificate for <bucket> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:208)
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:170)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3351)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3185)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.isDirectory(S3AFileSystem.java:4277)
	at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:54)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:274)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:245)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:245)
	at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:596) {code}
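To illustrate what the message is telling us, here is a simplified sketch (ours, not the actual JDK/httpclient verifier) of single-label wildcard SAN matching. The bucket name `my-bucket` is a hypothetical placeholder. It shows that a bare bucket name can never match those SANs, which suggests the client is using `<bucket>` itself as the TLS hostname instead of `<bucket>.s3.amazonaws.com`:

```python
# Simplified sketch of the hostname check that fails; the real check is
# RFC 2818-style SAN matching inside the TLS stack, not this code.
def matches_san(hostname: str, san: str) -> bool:
    """Single-label wildcard match, as used for TLS subject alternative names."""
    if san.startswith("*."):
        suffix = san[1:]  # e.g. ".s3.amazonaws.com"
        # The wildcard covers exactly one DNS label: the part before the
        # suffix must be non-empty and contain no further dots.
        prefix = hostname[: -len(suffix)]
        return hostname.endswith(suffix) and bool(prefix) and "." not in prefix
    return hostname == san

sans = ["*.s3.amazonaws.com", "s3.amazonaws.com"]

# Virtual-hosted-style hostname: matches the wildcard SAN.
print(any(matches_san("my-bucket.s3.amazonaws.com", s) for s in sans))  # True
# Bare bucket name used as the hostname: matches nothing -> the SdkClientException.
print(any(matches_san("my-bucket", s) for s in sans))  # False
```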
We found similar problems in the following tickets, but neither applies to us:
 - https://issues.apache.org/jira/browse/HADOOP-17017 (we don't use `.` in our bucket names)
 - [https://github.com/aws/aws-sdk-java-v2/issues/1786] (we tried to override it by building Spark with `httpclient:4.5.10` or `httpclient:4.5.8`, with no effect. We also made sure we are using the same `httpclient` version in our main jar).
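For anyone triaging this, here is a configuration sketch of the S3A settings we would try first. These are assumptions on our side, not confirmed fixes; both `fs.s3a.endpoint` and `fs.s3a.path.style.access` are standard Hadoop S3A properties passed through Spark's `spark.hadoop.*` prefix:

```
# spark-defaults.conf (workaround sketch, unverified)

# Pin the S3A endpoint explicitly, so the client builds
# <bucket>.s3.amazonaws.com rather than using the bare bucket name.
spark.hadoop.fs.s3a.endpoint            s3.amazonaws.com

# Alternatively, path-style access keeps the bucket out of the TLS
# hostname entirely (https://s3.amazonaws.com/<bucket>/...).
spark.hadoop.fs.s3a.path.style.access   true
```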



> Certificate doesn't match any of the subject alternative names: 
> [*.s3.amazonaws.com, s3.amazonaws.com]
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-38330
>                 URL: https://issues.apache.org/jira/browse/SPARK-38330
>             Project: Spark
>          Issue Type: Bug
>          Components: EC2
>    Affects Versions: 3.2.1
>         Environment: Spark 3.2.1 built with `hadoop-cloud` flag.
> Direct access to s3 using default file committer.
> JDK8.
>  
>            Reporter: André F.
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
