[ https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257224#comment-16257224 ]

mohamed imran edited comment on SPARK-22526 at 11/17/17 5:04 PM:
-----------------------------------------------------------------

[~srowen] I am processing the files inside a foreach loop, like this
example code:

    dataframe.collect.foreach { x =>
      val filepath = x.getAs[String]("filepath")
      val ziprdd = sc.binaryFiles(filepath) // the file name will be e.g. test.zip
      ziprdd.count
    }

I don't process Avro files. I am processing binary files, which are
zip-compressed plain CSV files, from S3.

After roughly the 100th to 150th read, Spark hangs while reading from S3.
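
For context, a minimal sketch of the same workload batched into a single
read, assuming (as in the example above) that the DataFrame has a string
column named "filepath" holding the S3 URIs; SparkContext.binaryFiles
accepts a comma-separated list of paths:

    // Sketch: pass all zip paths to one binaryFiles call instead of one
    // call per collected row, so the files are read in a single Spark job
    // rather than one job (and one fresh S3 read) per file.
    val paths = dataframe.collect.map(_.getAs[String]("filepath"))
    if (paths.nonEmpty) {
      val allZips = sc.binaryFiles(paths.mkString(","))
      allZips.count
    }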

I hope this info suffices to clarify the issue. Let me know if you need
anything else.



> Spark hangs while reading binary files from S3
> ----------------------------------------------
>
>                 Key: SPARK-22526
>                 URL: https://issues.apache.org/jira/browse/SPARK-22526
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: mohamed imran
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Hi,
> I am using Spark 2.2.0 (a recent version) to read binary files from S3. I
> use sc.binaryFiles to read the files.
> It works fine for about the first 100 file reads, but then it hangs
> indefinitely, from 5 up to 40 minutes, like the Avro file read issue
> (which was fixed in later releases).
> I tried setting fs.s3a.connection.maximum to some larger values, but that
> didn't help (see the configuration sketch below).
> Finally I ended up setting the Spark speculation parameter, which again
> didn't help much.
> One thing which I observed is that Spark does not close the connection
> after every read of a binary file from S3.
> Example: sc.binaryFiles("s3a://test/test123.zip")
> Please look into this major issue!
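
For reference, a minimal sketch of how fs.s3a.connection.maximum can be
passed to Spark, as attempted in the description above; Hadoop/S3A options
are forwarded from Spark config entries carrying the "spark.hadoop." prefix
(the value 200 here is an arbitrary illustration, not a tuned
recommendation):

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: raise the S3A connection pool limit. The "spark.hadoop."
    // prefix forwards the option into the underlying Hadoop configuration.
    val conf = new SparkConf()
      .setAppName("s3a-binary-read")
      .set("spark.hadoop.fs.s3a.connection.maximum", "200")
    val sc = new SparkContext(conf)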


