[
https://issues.apache.org/jira/browse/SPARK-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen resolved SPARK-4624.
------------------------------
Resolution: Cannot Reproduce
This sounds like an S3 issue, but reopen if you can still reproduce and have
more specific info about how to do it, and what exactly happens.
> Errors when reading/writing large object files to S3
> ----------------------------------------------------
>
> Key: SPARK-4624
> URL: https://issues.apache.org/jira/browse/SPARK-4624
> Project: Spark
> Issue Type: Bug
> Components: EC2, Input/Output, Mesos
> Affects Versions: 1.1.0
> Environment: manually set-up Mesos cluster in EC2 made of 30
> c3.4xlarge nodes
> Reporter: Kriton Tsintaris
> Priority: Critical
>
> My cluster is not configured to use HDFS; instead, the local disk of each
> node is used.
> I have a number of huge RDD object files (each made of ~600 part files,
> each of ~60 GB). They are updated extremely rarely.
> An example of the model of the data stored in these RDDs is the following:
> (Long, Array[Long]).
> When I load them into my cluster, using val page_users =
> sc.objectFile[(Long,Array[Long])]("s3n://mybucket/path/myrdd.obj.rdd") or
> equivalent, sometimes data is missing (as if 1 or 2 of the part files were
> not successfully loaded).
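> For illustration, a minimal sketch of the load-and-verify steps I run
> (bucket and path names are placeholders, and the expected record count is a
> hypothetical number kept from the job that originally wrote the file):
>
>     // Load the object file; the S3 path below is a placeholder.
>     val page_users =
>       sc.objectFile[(Long, Array[Long])]("s3n://mybucket/path/myrdd.obj.rdd")
>
>     // Sanity check: compare the loaded count against the count recorded
>     // when the file was written (hypothetical value). A shortfall here is
>     // the only way I notice that a part file was silently dropped.
>     val expected = 1234567890L // hypothetical
>     val loaded = page_users.count()
>     if (loaded != expected) println(s"missing records: ${expected - loaded}")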
> What is more frustrating is that I get no error indicating this has
> happened! Sometimes reading from S3 times out or hits errors, but the
> automatic retries eventually succeed.
> Furthermore, if I attempt to write an RDD back to S3 using
> myrdd.saveAsObjectFile("s3n://..."), the operation will again terminate
> before it has completed, without any warning or indication of error.
> More specifically, the part files of the object file will be left under a
> _temporary folder, and only a few of them will have been moved to the
> correct "path" in S3. This only happens when I am writing huge object
> files; if my object file is just a few GB, everything is fine.
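> To make the write failure concrete, a minimal sketch of how I save and then
> inspect what actually got committed (the output path is a placeholder; the
> listing step is just how I check for the leftover _temporary folder):
>
>     import org.apache.hadoop.fs.{FileSystem, Path}
>     import java.net.URI
>
>     val out = "s3n://mybucket/path/myrdd.out.obj.rdd" // placeholder path
>     myrdd.saveAsObjectFile(out)
>
>     // List the output directory. A healthy commit leaves only part-xxxxx
>     // files (and _SUCCESS); a leftover _temporary directory with most of
>     // the parts still inside it means the move was never finished.
>     val fs = FileSystem.get(new URI(out), sc.hadoopConfiguration)
>     fs.listStatus(new Path(out)).foreach(s => println(s.getPath.getName))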
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]