[ 
https://issues.apache.org/jira/browse/SPARK-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-4624.
------------------------------
    Resolution: Cannot Reproduce

This sounds like an S3 issue, but reopen if you can still reproduce and have 
more specific info about how to do it, and what exactly happens.

> Errors when reading/writing large object files to S3
> -----------------------------------------------------
>
>                 Key: SPARK-4624
>                 URL: https://issues.apache.org/jira/browse/SPARK-4624
>             Project: Spark
>          Issue Type: Bug
>          Components: EC2, Input/Output, Mesos
>    Affects Versions: 1.1.0
>         Environment: manually set up Mesos cluster in EC2 made of 30 
> c3.4xlarge nodes
>            Reporter: Kriton Tsintaris
>            Priority: Critical
>
> My cluster is not configured to use HDFS; instead, the local disk of each 
> node is used.
> I've got a number of huge RDD object files (each made of ~600 part files, 
> each of ~60 GB). They are updated extremely rarely.
> An example of the type of the data stored in these RDDs is the following: 
> (Long, Array[Long]). 
> When I load them into my cluster, using val page_users = 
> sc.objectFile[(Long,Array[Long])]("s3n://mybucket/path/myrdd.obj.rdd") or 
> equivalent, sometimes data is missing (as if 1 or 2 of the part files were 
> not successfully loaded).
> What is more frustrating is that I get no errors indicating that this has 
> happened! Sometimes reading from S3 times out or hits errors, but eventually 
> the automatic retries do succeed.
> Furthermore, if I attempt to write an RDD back to S3 using 
> myrdd.saveAsObjectFile("s3n://..."), the operation will again terminate 
> before it has completed, without any warning or indication of error.
> More specifically, the object file's part files will be left under a 
> _temporary folder and only a few of them will have been moved to the correct 
> path in S3. This only happens when I am writing huge object files; if my 
> object file is just a few GB, everything is fine. 
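
For context, a minimal Scala sketch of the read/write pattern the report describes, with a count-based sanity check added to surface silently missing part files. The bucket, paths, and the (Long, Array[Long]) element type come from the report; the output path, application name, and the verification step itself are assumptions for illustration, not part of the original reproduction.

    import org.apache.spark.{SparkConf, SparkContext}

    object S3ObjectFileCheck {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("S3ObjectFileCheck"))

        // Load the object file as in the report (path taken from the reporter's example).
        val pageUsers = sc.objectFile[(Long, Array[Long])]("s3n://mybucket/path/myrdd.obj.rdd")

        // Hypothetical sanity check: record and partition counts after loading, so a
        // silently dropped part file shows up as a lower-than-expected count.
        val loadedCount = pageUsers.count()
        println(s"loaded $loadedCount records from ${pageUsers.partitions.length} partitions")

        // Write the RDD back to S3 as described in the report ...
        val outPath = "s3n://mybucket/path/myrdd.copy.obj.rdd"  // hypothetical output path
        pageUsers.saveAsObjectFile(outPath)

        // ... then re-read and compare counts; a mismatch, or part files left under a
        // _temporary folder at outPath, would indicate the incomplete write described above.
        val writtenCount = sc.objectFile[(Long, Array[Long])](outPath).count()
        println(s"wrote $writtenCount records (expected $loadedCount)")

        sc.stop()
      }
    }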



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
