[
https://issues.apache.org/jira/browse/HADOOP-18842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18030210#comment-18030210
]
ASF GitHub Bot commented on HADOOP-18842:
-----------------------------------------
github-actions[bot] commented on PR #5931:
URL: https://github.com/apache/hadoop/pull/5931#issuecomment-3408713878
We're closing this stale PR because it has been open for 100 days with no
activity. This isn't a judgement on the merit of the PR in any way. It's just a
way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working
on it, please feel free to re-open it and ask for a committer to remove the
stale tag and review again.
Thanks all for your contribution.
> Support Overwrite Directory On Commit For S3A Committers
> --------------------------------------------------------
>
> Key: HADOOP-18842
> URL: https://issues.apache.org/jira/browse/HADOOP-18842
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.4.0
> Reporter: Syed Shameerur Rahman
> Assignee: Syed Shameerur Rahman
> Priority: Major
> Labels: pull-request-available
>
> The goal is to add a new kind of commit mechanism in which the destination
> directory is cleared before the files are committed.
> *Use Case*
> For dynamic partition insert overwrite queries, the destination
> directories that need to be overwritten are not known before execution,
> so clearing them in advance is a challenge.
>
> One approach is for the underlying engine or client to clear all the
> destination directories before calling the commitJob operation. The
> problem with this approach is that if committing the files then fails,
> all of the previous data may already have been deleted, which makes
> recovery difficult or time-consuming.
>
> *Solution*
> The commit operation has a mode, either *INSERT* or *OVERWRITE*. During
> the commitJob operation, the committer maps each destination directory to
> the commits that need to be added to that directory. If the mode is
> *OVERWRITE*, the committer deletes the directory recursively and then
> commits each of the files into it. In the worst case, a failure leaves as
> many destination directories deleted as there are threads (if the work is
> done in a multi-threaded way), rather than the whole of the data, as would
> happen if the deletion were done on the engine side.
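
The flow described under *Solution* can be sketched roughly as below. This is
only an illustration, not the code from PR #5931 or the S3A committer API:
every name in it (OverwriteCommitSketch, CommitMode, PendingFile, commitJob,
groupByDirectory, deleteRecursively) is hypothetical, and the local
filesystem via java.nio.file stands in for S3. It is single-threaded and
assumes every destination path has a parent directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch; none of these names come from the S3A committers. */
final class OverwriteCommitSketch {

    enum CommitMode { INSERT, OVERWRITE }

    /** One file waiting to be committed: its final destination and its data. */
    static final class PendingFile {
        final Path destination;
        final String data;

        PendingFile(Path destination, String data) {
            this.destination = destination;
            this.data = data;
        }
    }

    /** Maps each destination directory to the pending files that belong in it. */
    static Map<Path, List<PendingFile>> groupByDirectory(List<PendingFile> pending) {
        Map<Path, List<PendingFile>> byDir = new LinkedHashMap<>();
        for (PendingFile file : pending) {
            byDir.computeIfAbsent(file.destination.getParent(), d -> new ArrayList<>())
                 .add(file);
        }
        return byDir;
    }

    /** Commits directory by directory; in OVERWRITE mode each directory is cleared first. */
    static void commitJob(List<PendingFile> pending, CommitMode mode) throws IOException {
        for (Map.Entry<Path, List<PendingFile>> entry : groupByDirectory(pending).entrySet()) {
            Path dir = entry.getKey();
            if (mode == CommitMode.OVERWRITE) {
                // Only directories that actually receive new files are deleted.
                deleteRecursively(dir);
            }
            Files.createDirectories(dir);
            for (PendingFile file : entry.getValue()) {
                Files.write(file.destination, file.data.getBytes(StandardCharsets.UTF_8));
            }
        }
    }

    /** Deletes a directory tree, children before parents; a missing directory is a no-op. */
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts deeper paths ahead of the directories containing them.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Because the delete happens per directory, immediately before that directory's
files are written, a failure partway through leaves untouched every
directory the loop has not yet reached. With several worker threads, each
thread would have at most one directory in the deleted-but-not-yet-committed
state at a time.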