[ https://issues.apache.org/jira/browse/SPARK-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214320#comment-14214320 ]

Vijay commented on SPARK-4402:
------------------------------

Yes, the output path is validated in PairRDDFunctions.saveAsHadoopDataset.
Please find the exception details below.
So the output path is validated only when saveAsHadoopDataset executes,
after all the preceding statements have completed.

My question is whether this validation could be done up front, when the
program starts executing.

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/home/HadoopUser/eclipse-scala/test/output1 already exists
        at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:132)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:968)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:878)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:792)
        at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1159)
        at test.OutputTest$.main(OutputTest.scala:19)
        at test.OutputTest.main(OutputTest.scala)
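One way to get the up-front check asked about above is to test the output locations yourself at the very start of `main`, before any RDD work is done. The sketch below is a minimal, hypothetical helper (`OutputPreCheck`, `existingPaths` and `requireAbsent` are illustrative names, not part of Spark) and assumes the outputs live on the local file system, like the `file:/home/HadoopUser/...` path in the trace. It uses `java.nio.file.Files.exists` and reports every conflicting path at once.

```scala
import java.nio.file.{Files, Path}

// Hypothetical helper (not a Spark API): checks planned output directories
// on the LOCAL file system before any Spark job is started.
object OutputPreCheck {

  // Returns the planned outputs that already exist, in input order.
  def existingPaths(dirs: Seq[Path]): Seq[Path] =
    dirs.filter(dir => Files.exists(dir))

  // Fails fast with IllegalArgumentException (thrown by require) listing
  // every conflicting path, so all of them can be fixed before rerunning.
  def requireAbsent(dirs: Seq[Path]): Unit = {
    val clashes = existingPaths(dirs)
    require(clashes.isEmpty, s"output path(s) already exist: ${clashes.mkString(", ")}")
  }
}
```

For example, calling `OutputPreCheck.requireAbsent(Seq(Paths.get("/home/HadoopUser/eclipse-scala/test/output1")))` as the first statement of `main` would fail immediately instead of at `saveAsTextFile`. For HDFS or other Hadoop file systems, the analogous test would go through Hadoop's `FileSystem.exists`, e.g. on the file system returned by `new org.apache.hadoop.fs.Path(dir).getFileSystem(sc.hadoopConfiguration)`. Such a pre-check can race with other writers (the directory may appear between the check and the save), so it complements rather than replaces the check in `saveAsHadoopDataset`.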

> Output path validation of an action statement resulting in runtime exception
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-4402
>                 URL: https://issues.apache.org/jira/browse/SPARK-4402
>             Project: Spark
>          Issue Type: Wish
>            Reporter: Vijay
>            Priority: Minor
>
> Output path validation happens when the action statement executes, as part
> of lazy evaluation. If the path already exists, a runtime exception is
> thrown, and all the processing completed up to that point is lost, which
> wastes resources (processing time and CPU usage).
> If this I/O-related validation were done before the RDD action operations,
> this runtime exception could be avoided.
> I believe a similar validation feature is implemented in Hadoop as well.
> Example:
> SchemaRDD.saveAsTextFile() evaluates the path at runtime.
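To make the lazy-evaluation point concrete, here is a toy model in plain Scala. It is not Spark's RDD implementation; `LazyData` is a hypothetical class. `map` only records a function, while `count` and `saveAsTextFile` are actions that force the pipeline. As in the stack trace, where `FileOutputFormat.checkOutputSpecs` is reached from inside `saveAsTextFile`, the output check here lives inside the save action, so it cannot fire until that action is called.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Toy model of lazy evaluation (NOT Spark's RDD implementation):
// transformations only record work; actions actually run it.
final class LazyData[A](compute: () => Seq[A]) {

  // Transformation: wraps the existing computation, runs nothing yet.
  def map[B](f: A => B): LazyData[B] =
    new LazyData[B](() => compute().map(f))

  // Action: forces the whole pipeline and returns the element count.
  def count(): Int = compute().size

  // Action: validates the output directory first (a stand-in for Hadoop's
  // FileOutputFormat.checkOutputSpecs), then forces the pipeline and
  // writes all elements, one per line, to a single part file.
  def saveAsTextFile(dir: Path): Unit = {
    if (Files.exists(dir))
      throw new IllegalStateException(s"Output directory $dir already exists")
    Files.createDirectories(dir)
    val text = compute().mkString("\n")
    Files.write(dir.resolve("part-00000"), text.getBytes(StandardCharsets.UTF_8))
  }
}
```

If a program calls `count()` and then `saveAsTextFile` on a directory that already exists, the work done by `count()` has already been spent by the time the check fails, which is the kind of waste this issue describes.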



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

