[ https://issues.apache.org/jira/browse/HADOOP-18839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752144#comment-17752144 ]

Steve Loughran commented on HADOOP-18839:
-----------------------------------------

bq. The point of this issue is to provide better developer experience without overriding default values. Is that possible?
Yes, but that is going to need someone to write new code and tests. I am also trying to suggest short-term workarounds.
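
As a concrete illustration of the short-term workaround, the retry knobs already in the connector can be dialled down so the failure surfaces on the first attempt. A minimal sketch with illustrative values (note this also disables retries for genuinely transient errors):

{code:python}
from pyspark.sql import SparkSession

# Illustrative fail-fast settings using existing S3A options: one attempt
# at the AWS SDK layer and one at the S3A retry layer, so the SSL error
# propagates immediately instead of after minutes of retries.
spark = SparkSession.builder\
          .config("spark.hadoop.fs.s3a.attempts.maximum", "1")\
          .config("spark.hadoop.fs.s3a.retry.limit", "1")\
          .getOrCreate()
{code}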

> SSLException is raised after very long timeout
> ----------------------------------------------
>
>                 Key: HADOOP-18839
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18839
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.3.4
>            Reporter: Maxim Martynov
>            Priority: Minor
>         Attachments: host.log, ssl.log
>
>
> I've tried to connect from PySpark to Minio running in Docker.
> Installing PySpark and starting Minio:
> {code:bash}
> pip install pyspark==3.4.1
> docker run --rm -d --hostname minio --name minio \
>   -p 9000:9000 -p 9001:9001 \
>   -e MINIO_ACCESS_KEY=access \
>   -e MINIO_SECRET_KEY=Eevoh2wo0ui6ech0wu8oy3feiR3eicha \
>   -e MINIO_ROOT_USER=admin \
>   -e MINIO_ROOT_PASSWORD=iepaegaigi3ofa9TaephieSo1iecaesh \
>   bitnami/minio:latest
> docker exec minio mc mb test-bucket
> {code}
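> To confirm the endpoint speaks plain HTTP, here is a quick check (illustrative, not part of the original report; {{/minio/health/live}} is Minio's liveness endpoint):
> {code:python}
> import urllib.request
>
> # Succeeds with HTTP 200 because Minio serves plain HTTP on port 9000;
> # an https:// URL against the same port fails during the TLS handshake.
> resp = urllib.request.urlopen("http://localhost:9000/minio/health/live")
> print(resp.status)
> {code}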
> Then create a Spark session:
> {code:python}
> from pyspark.sql import SparkSession
> spark = SparkSession.builder\
>           .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.4")\
>           .config("spark.hadoop.fs.s3a.endpoint", "localhost:9000")\
>           .config("spark.hadoop.fs.s3a.access.key", "access")\
>           .config("spark.hadoop.fs.s3a.secret.key", "Eevoh2wo0ui6ech0wu8oy3feiR3eicha")\
>           .config("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider")\
>           .getOrCreate()
> spark.sparkContext.setLogLevel("debug")
> {code}
> Then try to access an object in the bucket:
> {code:python}
> import time
> begin = time.perf_counter()
> spark.read.format("csv").load("s3a://test-bucket/fake")
> end = time.perf_counter()
> py4j.protocol.Py4JJavaError: An error occurred while calling o40.load.
> : org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://test-bucket/fake: com.amazonaws.SdkClientException: Unable to execute HTTP request: Unsupported or unrecognized SSL message: Unable to execute HTTP request: Unsupported or unrecognized SSL message
> ...
> {code}
> [^ssl.log]
> {code:python}
> >>> print((end-begin)/60)
> 14.72387898775002
> {code}
> I waited almost *15 minutes* to get the exception from Spark. The reason was that I tried to connect to the endpoint with {{fs.s3a.connection.ssl.enabled=true}}, but Minio is configured to listen for plain HTTP only.
> Is there any way to raise an exception immediately if the SSL connection cannot be established?
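> For this particular mismatch, the existing escape hatch is to make the connector match Minio's plain-HTTP listener. A minimal sketch (not from the original report; {{fs.s3a.connection.ssl.enabled}} defaults to true):
> {code:python}
> from pyspark.sql import SparkSession
>
> # Sketch: disable SSL on the S3A connector so it matches the HTTP-only
> # Minio endpoint; the request then succeeds instead of retrying TLS.
> spark = SparkSession.builder\
>           .config("spark.hadoop.fs.s3a.endpoint", "localhost:9000")\
>           .config("spark.hadoop.fs.s3a.connection.ssl.enabled", "false")\
>           .getOrCreate()
> {code}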
> If I pass a wrong endpoint instead, such as {{localhos:9000}}, I get an exception like this in just 5 seconds:
> {code:java}
> : org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://test-bucket/fake: com.amazonaws.SdkClientException: Unable to execute HTTP request: test-bucket.localhos: Unable to execute HTTP request: test-bucket.localhos
> ...
> {code}
> [^host.log]
> {code:python}
> >>> print((end-begin)/60)
> 0.09500707178334172
> >>> end-begin
> 5.700424307000503
> {code}
> I know about options like {{fs.s3a.attempts.maximum}} and {{fs.s3a.retry.limit}}; setting them to 1 raises the exception almost immediately. But this does not look right: it also removes the retries that cover genuinely transient failures. (A variant scoped to one bucket is sketched below.)
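> One way to contain that side effect is to apply the fail-fast settings to the test bucket only. A sketch, assuming S3A's per-bucket configuration ({{fs.s3a.bucket.<bucket>.<option>}}) propagates these options as it does most others:
> {code:python}
> from pyspark.sql import SparkSession
>
> # Hedged sketch: per-bucket overrides keep the single-attempt behaviour
> # confined to test-bucket; other buckets keep the default retry policy.
> spark = SparkSession.builder\
>           .config("spark.hadoop.fs.s3a.bucket.test-bucket.attempts.maximum", "1")\
>           .config("spark.hadoop.fs.s3a.bucket.test-bucket.retry.limit", "1")\
>           .getOrCreate()
> {code}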


