[ 
https://issues.apache.org/jira/browse/SPARK-34839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18047972#comment-18047972
 ] 

zuotingbing commented on SPARK-34839:
-------------------------------------

Is there any work in progress on this issue?

A lot of people, including me, are running into this issue.

> FileNotFoundException on _temporary when multiple app write to same table
> -------------------------------------------------------------------------
>
>                 Key: SPARK-34839
>                 URL: https://issues.apache.org/jira/browse/SPARK-34839
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>         Environment: CDH 6.2.1 Hadoop 3.0.0 
>            Reporter: Bimalendu Choudhary
>            Priority: Major
>
> When multiple Spark applications write to the same Hive table (to 
> different partitions, so they should not interfere with each other), the 
> application that finishes first deletes the parent _temporary directory, 
> which the other applications are still using.
> I think the temporary directory used by FileOutputCommitter should be made 
> configurable, so that each caller can supply its own unique value as 
> required without having to worry about another application deleting it 
> unknowingly. Something like:
> {quote}
>  public static final String PENDING_DIR_NAME_DEFAULT = "_temporary";
>  public static final String PENDING_DIR_NAME =
>  "mapreduce.fileoutputcommitter.tempdir";
> {quote}
>  
> We cannot use mapreduce.fileoutputcommitter.algorithm.version = 2 because of 
> its data loss issue: https://issues.apache.org/jira/browse/MAPREDUCE-7282. 
> There is a similar Jira, https://issues.apache.org/jira/browse/SPARK-18883, 
> which was not resolved. This is a very generic case of one Spark 
> application disrupting another Spark application that works on the same 
> table, and it can be avoided by making the temporary directory unique or 
> configurable.
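
The failure mode described in the issue can be reproduced in a simplified local model. The sketch below is not Hadoop's FileOutputCommitter code and does not use Spark; it relies only on java.nio.file. The class name SharedTempDirDemo, its helper methods, the job ids, and the directory names _temporary-jobA and _temporary-jobB are all hypothetical, chosen for illustration. It assumes that a finishing job removes the whole temporary directory under the table path, as the description says, and compares a shared directory name with per-job unique names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SharedTempDirDemo {

    // Delete a directory tree, children before parents.
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    // Write one pending file under table/<tempDirName>/<jobId>/ and return its path.
    static Path writePending(Path table, String tempDirName, String jobId) throws IOException {
        Path attemptDir = table.resolve(tempDirName).resolve(jobId);
        Files.createDirectories(attemptDir);
        return Files.write(attemptDir.resolve("part-00000"),
                jobId.getBytes(StandardCharsets.UTF_8));
    }

    // Model of a job commit: move the pending file into the job's partition,
    // then delete the job's entire temporary directory (assumed behavior).
    static void commitAndCleanup(Path table, String tempDirName, Path pending, String partition)
            throws IOException {
        Path partitionDir = table.resolve(partition);
        Files.createDirectories(partitionDir);
        Files.move(pending, partitionDir.resolve(pending.getFileName()));
        deleteRecursively(table.resolve(tempDirName));
    }

    // Returns true if job B's pending output still exists after job A finishes.
    static boolean jobBSurvives(String tempDirA, String tempDirB) throws IOException {
        Path table = Files.createTempDirectory("table");
        try {
            Path pendingA = writePending(table, tempDirA, "jobA");
            Path pendingB = writePending(table, tempDirB, "jobB");
            commitAndCleanup(table, tempDirA, pendingA, "part=a");
            return Files.exists(pendingB);
        } finally {
            deleteRecursively(table);
        }
    }

    public static void main(String[] args) throws IOException {
        // Both jobs share "_temporary": job A's cleanup removes job B's pending file.
        System.out.println("shared _temporary: " + jobBSurvives("_temporary", "_temporary"));
        // Each job has its own directory: job B's pending file is untouched.
        System.out.println("unique temp dirs:  "
                + jobBSurvives("_temporary-jobA", "_temporary-jobB"));
    }
}
```

In this model the shared case reports false and the unique case reports true, which is the behavior the proposed configurable directory name is meant to achieve. A real fix would have to live in the committer itself or in how Spark stages its output; this sketch only illustrates why a per-application name avoids the collision.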


