[ 
https://issues.apache.org/jira/browse/SPARK-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Shreedharan resolved SPARK-5545.
-------------------------------------
    Resolution: Duplicate

> [STREAMING] DStream#saveAs**Files can fail after app restarts
> -------------------------------------------------------------
>
>                 Key: SPARK-5545
>                 URL: https://issues.apache.org/jira/browse/SPARK-5545
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Hari Shreedharan
>            Priority: Critical
>
> After an app restarts, DStream#saveAs**Files can sometimes fail. This happens 
> if the driver dies while an RDD is being written to HDFS. At that point the 
> rdd-<timestamp> directory has already been created, but the RDD has not been 
> marked as completely processed. After the restart, the RDD is therefore 
> written again into the same directory, which can cause the underlying MR API 
> to throw an exception that looks like this:
> {code}
> 15/02/02 13:16:41 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: Output directory hdfs://wypoon-cdhx-1.ent.cloudera.com:8020/user/systest/flumetest/rdd-1422911774000 already exists)
> Exception in thread "Driver" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://wypoon-cdhx-1.ent.cloudera.com:8020/user/systest/flumetest/rdd-1422911774000 already exists
>       at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:132)
>       at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1041)
>       at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:940)
>       at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:849)
>       at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1164)
> ...
> {code}
> Thanks to [~wypoon] for finding this issue!
> I have a PR coming up for this.
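
The failure mode described in the issue can be reproduced in miniature on a local filesystem. The sketch below is only an illustration under stated assumptions: it uses no Spark or Hadoop code, `save_rdd` is a hypothetical stand-in for the "output directory must not exist" check that the stack trace attributes to `FileOutputFormat.checkOutputSpecs`, and `save_rdd_after_restart` shows one conceivable mitigation (discarding the leftover directory before rewriting). It is not claimed to be what the PR mentioned above does.

```python
import os
import shutil
import tempfile


def save_rdd(output_dir, records):
    """Hypothetical stand-in for the save path: like the check named in the
    stack trace, refuse to write into a directory that already exists."""
    if os.path.exists(output_dir):
        raise FileExistsError(f"Output directory {output_dir} already exists")
    os.makedirs(output_dir)
    with open(os.path.join(output_dir, "part-00000"), "w") as f:
        f.writelines(f"{r}\n" for r in records)


def save_rdd_after_restart(output_dir, records):
    """Assumed mitigation (illustrative only): drop a leftover directory from
    an interrupted attempt, then write the same batch again."""
    if os.path.exists(output_dir):
        shutil.rmtree(output_dir)
    save_rdd(output_dir, records)


def demo():
    base = tempfile.mkdtemp()
    out = os.path.join(base, "rdd-1422911774000")

    # Simulate the driver dying after the directory was created but before
    # the write finished and was marked as processed.
    os.makedirs(out)

    # A plain retry after the restart hits the "already exists" check.
    try:
        save_rdd(out, ["a", "b"])
        failed = False
    except FileExistsError:
        failed = True

    # Retrying with cleanup succeeds and leaves a complete output file.
    save_rdd_after_restart(out, ["a", "b"])
    with open(os.path.join(out, "part-00000")) as f:
        contents = f.read()

    shutil.rmtree(base)
    return failed, contents
```

Deleting the leftover directory is safe in this toy only because the retry rewrites the whole batch; whether that holds for a real streaming job depends on the batch being recomputed in full after recovery.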



