[
https://issues.apache.org/jira/browse/SPARK-34839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18047972#comment-18047972
]
zuotingbing commented on SPARK-34839:
-------------------------------------
Is there any work going on for this issue?
A lot of people, including me, are hitting it.
> FileNotFoundException on _temporary when multiple app write to same table
> -------------------------------------------------------------------------
>
> Key: SPARK-34839
> URL: https://issues.apache.org/jira/browse/SPARK-34839
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.4.0
> Environment: CDH 6.2.1 Hadoop 3.0.0
> Reporter: Bimalendu Choudhary
> Priority: Major
>
> When multiple Spark applications write to the same Hive table (but to
> different partitions, so they do not interfere with each other in any way),
> the application that finishes first ends up deleting the parent _temporary
> directory, which is still being used by the other applications.
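> A minimal sketch of the triggering pattern (table, column, and partition
> names are illustrative, not from this report): two applications run this
> concurrently, each with a different partition value, and both stage their
> output under the table location's shared _temporary directory.
> {code:java}
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.SparkSession;
>
> public class ConcurrentPartitionWrite {
>     public static void main(String[] args) {
>         // Each application is launched with its own partition value,
>         // e.g. args[0] = "2021-03-22" for app A, "2021-03-23" for app B.
>         SparkSession spark = SparkSession.builder()
>             .appName("concurrent-partition-write-" + args[0])
>             .enableHiveSupport()
>             .getOrCreate();
>
>         Dataset<Row> df = spark.sql(
>             "SELECT * FROM db.source WHERE dt = '" + args[0] + "'");
>
>         // Both writes share <table location>/_temporary; the first
>         // application to commit deletes it, and the slower one later
>         // fails with FileNotFoundException.
>         df.write().mode("overwrite").insertInto("db.target");
>     }
> }
> {code}
>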
> I think the temporary directory used by FileOutputCommitter should be made
> configurable, so that each caller can pass its own unique value as needed,
> without having to worry about some other application deleting it
> unknowingly. Something like:
> {code:java}
> public static final String PENDING_DIR_NAME_DEFAULT = "_temporary";
> public static final String PENDING_DIR_NAME_KEY =
>     "mapreduce.fileoutputcommitter.tempdir";
> {code}
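> A minimal sketch of how FileOutputCommitter could resolve the pending
> directory from such a key (the key itself is the proposal above, not an
> existing Hadoop option):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
>
> public final class PendingDirResolver {
>     public static final String PENDING_DIR_NAME_DEFAULT = "_temporary";
>     public static final String PENDING_DIR_NAME_KEY =
>         "mapreduce.fileoutputcommitter.tempdir";
>
>     // Fall back to the classic "_temporary" name when the (proposed)
>     // key is unset, so existing jobs keep their current layout.
>     public static Path pendingJobAttemptsPath(Configuration conf, Path output) {
>         String name = conf.get(PENDING_DIR_NAME_KEY, PENDING_DIR_NAME_DEFAULT);
>         return new Path(output, name);
>     }
> }
> {code}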
>
> We cannot use mapreduce.fileoutputcommitter.algorithm.version = 2 because of
> its data-loss issue, https://issues.apache.org/jira/browse/MAPREDUCE-7282.
> There is a similar Jira, https://issues.apache.org/jira/browse/SPARK-18883,
> which was not resolved. This is a very generic case of one Spark application
> simply breaking another Spark application that is working on the same table,
> and it can be avoided by making the temporary directory unique or
> configurable.
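> If such a key existed, each application could choose a unique pending
> directory, for example derived from its application id (a sketch under that
> assumption; the key below is not an existing Spark or Hadoop option):
> {code:java}
> import org.apache.spark.sql.SparkSession;
>
> public class UniquePendingDir {
>     public static void main(String[] args) {
>         SparkSession spark = SparkSession.builder()
>             .appName("unique-pending-dir")
>             .enableHiveSupport()
>             .getOrCreate();
>
>         // One pending directory per application, so the first app to
>         // commit only ever deletes its own staging data.
>         spark.sparkContext().hadoopConfiguration().set(
>             "mapreduce.fileoutputcommitter.tempdir",
>             "_temporary_" + spark.sparkContext().applicationId());
>     }
> }
> {code}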