[ 
https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257247#comment-16257247
 ] 

mohamed imran commented on SPARK-22526:
---------------------------------------

[~srowen] Yes, I agree that this is an issue with the HTTP API that Spark 
uses to read files from S3. I don't see any issues apart from the 
httpclient. But as a whole it is still an issue for the Spark framework, which 
uses this HTTP client. You need to check whether the HTTP client opens and 
closes its connections successfully when using the S3 API.

By default, Spark 2.2.0 uses httpclient-4.5.2.jar and httpcore-4.4.4.jar.

I suspect these HTTP jars have some issues when connecting to S3. Please look 
at it.
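
To confirm which httpclient and httpcore versions a given installation 
actually bundles, something like the following rough Python sketch could be 
used. It assumes the jars live in a "jars" directory under SPARK_HOME; the 
helper name bundled_http_jars is made up for illustration and is not part of 
Spark.

```python
import os
import re
from pathlib import Path

# Matches file names such as "httpclient-4.5.2.jar" or "httpcore-4.4.4.jar".
_JAR_PATTERN = re.compile(r"^(httpclient|httpcore)-(\d+(?:\.\d+)*)\.jar$")


def bundled_http_jars(jars_dir):
    """Return {artifact: version} for httpclient/httpcore jars in jars_dir.

    Hypothetical helper for illustration only; it just inspects file names.
    """
    found = {}
    for jar in sorted(Path(jars_dir).glob("http*.jar")):
        match = _JAR_PATTERN.match(jar.name)
        if match:
            found[match.group(1)] = match.group(2)
    return found


if __name__ == "__main__":
    # Assumption: SPARK_HOME points at the Spark distribution being used.
    spark_home = os.environ.get("SPARK_HOME")
    if spark_home:
        print(bundled_http_jars(Path(spark_home) / "jars"))
```

This only reports what is on disk. Jars supplied by the application or the 
cluster may still take precedence on the classpath at run time.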

> Spark hangs while reading binary files from S3
> ----------------------------------------------
>
>                 Key: SPARK-22526
>                 URL: https://issues.apache.org/jira/browse/SPARK-22526
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: mohamed imran
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Hi,
> I am using Spark 2.2.0 (the most recent version) to read binary files from 
> S3. I use sc.binaryFiles to read the files.
> It works fine for roughly the first 100 files, but after that it hangs for 
> anywhere from 5 up to 40 minutes, similar to the Avro file read issue (which 
> was fixed in a later release).
> I tried setting fs.s3a.connection.maximum to larger values, but that 
> didn't help.
> Finally, I tried enabling the Spark speculation parameter, which 
> also didn't help much. 
> One thing I observed is that the connection is not closed after 
> each binary file is read from S3.
> Example: sc.binaryFiles("s3a://test/test123.zip")
> Please look into this major issue!
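
For reference, the settings described in the report could be applied roughly 
as in the PySpark sketch below. It is not a fix for the hang; it only shows 
how fs.s3a.connection.maximum and spark.speculation would be passed to a job 
that calls binaryFiles. The helper names s3a_read_conf and read_sizes, the 
application name, and the default of 100 connections are all assumptions 
made for illustration.

```python
def s3a_read_conf(max_connections=100, speculation=False):
    """Build the Spark settings mentioned in the report as a plain dict.

    The default of 100 connections is an arbitrary illustrative value.
    """
    return {
        # Hadoop options are handed to Spark with the "spark.hadoop." prefix.
        "spark.hadoop.fs.s3a.connection.maximum": str(max_connections),
        "spark.speculation": "true" if speculation else "false",
    }


def read_sizes(path, conf):
    """Read binary files under path and return (file path, size) pairs."""
    # Imported lazily so s3a_read_conf works without PySpark installed.
    from pyspark import SparkConf, SparkContext

    spark_conf = SparkConf().setAppName("binaryfiles-s3a-sketch")
    for key, value in conf.items():
        spark_conf.set(key, value)

    sc = SparkContext(conf=spark_conf)
    try:
        # In PySpark, binaryFiles yields (path, bytes) pairs, so each file
        # is read in full before the size is computed.
        pairs = sc.binaryFiles(path)
        return pairs.map(lambda kv: (kv[0], len(kv[1]))).collect()
    finally:
        sc.stop()
```

With a working S3A setup, a call such as 
read_sizes("s3a://test/test123.zip", s3a_read_conf(speculation=True)) would 
exercise the same read path as the example in the report.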



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
