[
https://issues.apache.org/jira/browse/HADOOP-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715867#comment-14715867
]
Andrew Wang commented on HADOOP-12358:
--------------------------------------
bq. If the client OOM because of deleting large directory, make it OOM upon
getContentSummary can actually help avoiding an inconsistent (half completed)
deletion states.
This leads into one of my favorite topics, which is how and why HDFS APIs
differ from POSIX. POSIX gives you only unlink and rmdir, so {{rm}} has to crawl
the directory tree, doing {{O(n)}} operations. HDFS, however, implements
recursive delete as a single RPC, so the client issues one operation. This is
for performance: RPCs are expensive, and we want to avoid client-side recursion
when doing a big delete. Most deletes are also intentional. So this patch
greatly slows down the common case, even though we already have safety
mechanisms like trash and snapshots in place, and it runs counter to the intent
of the recursive delete RPC.
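To make the operation counts concrete, here is a toy Python model. It is only a
sketch: a nested dict stands in for a directory tree and a returned integer
stands in for client-to-server calls. Nothing here is real HDFS or POSIX code,
and the names {{make_tree}}, {{posix_style_delete}} and {{single_rpc_delete}}
are invented for illustration. Counting one listing call per directory is also
a modeling assumption.

```python
# Toy model only: no real filesystem or RPC layer is involved.

def make_tree(depth, fanout, files_per_dir):
    """Build a directory as {"files": int, "dirs": [subdirectories]}."""
    if depth == 0:
        return {"files": files_per_dir, "dirs": []}
    subdirs = [make_tree(depth - 1, fanout, files_per_dir) for _ in range(fanout)]
    return {"files": files_per_dir, "dirs": subdirs}


def posix_style_delete(tree):
    """Count calls for a client-side crawl, as rm must do with unlink/rmdir."""
    calls = 1                      # list this directory
    calls += tree["files"]         # one unlink per file
    for sub in tree["dirs"]:
        calls += posix_style_delete(sub)  # recurse into each subdirectory
    calls += 1                     # rmdir once the directory is empty
    return calls


def single_rpc_delete(tree):
    """Count calls when the server performs the whole recursive delete."""
    return 1                       # one request, regardless of tree size


if __name__ == "__main__":
    tree = make_tree(depth=2, fanout=3, files_per_dir=10)
    print("client-side crawl:", posix_style_delete(tree))
    print("single recursive RPC:", single_rpc_delete(tree))
```

In this model the crawl grows with the number of files and directories, while
the single-RPC count stays at one no matter how large the tree is.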
The other API difference I like is how HDFS combines readdir and stat into
listStatus, again to avoid extra RPCs.
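The same kind of toy model illustrates the listing case. Again this is a sketch
under stated assumptions, not the HDFS client: {{ToyNamespace}} and its methods
are hypothetical, each method call counts as one round trip, and any batching a
real implementation may do for very large directories is ignored.

```python
# Toy model only: an in-memory dict plays the role of the remote namespace.

class ToyNamespace:
    def __init__(self, entries):
        self.entries = dict(entries)   # name -> size
        self.round_trips = 0

    def readdir(self):
        """Return names only, as a readdir-style call would."""
        self.round_trips += 1
        return sorted(self.entries)

    def stat(self, name):
        """Return metadata for a single entry."""
        self.round_trips += 1
        return {"name": name, "size": self.entries[name]}

    def list_status(self):
        """Return names and metadata together in one call."""
        self.round_trips += 1
        return [{"name": n, "size": s} for n, s in sorted(self.entries.items())]


def list_posix_style(ns):
    # One readdir, then one stat per entry: 1 + n round trips.
    return [ns.stat(name) for name in ns.readdir()]


def list_combined(ns):
    # One combined call: 1 round trip.
    return ns.list_status()


if __name__ == "__main__":
    files = {"a": 1, "b": 2, "c": 3}
    separate, combined = ToyNamespace(files), ToyNamespace(files)
    list_posix_style(separate)
    list_combined(combined)
    print(separate.round_trips, combined.round_trips)
```

Both paths return the same listing; only the number of round trips differs.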
Finally, to tie it back to your comment: right now there is no OOM (or partial
delete), since the client just calls the single RPC and does not need to
enumerate the directory. With this patch, it would have to enumerate. This would
be a regression: a client with a small heap could no longer delete a large
directory.
> FSShell should prompt before deleting directories bigger than a configured
> size
> -------------------------------------------------------------------------------
>
> Key: HADOOP-12358
> URL: https://issues.apache.org/jira/browse/HADOOP-12358
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Reporter: Xiaoyu Yao
> Assignee: Xiaoyu Yao
> Attachments: HADOOP-12358.00.patch, HADOOP-12358.01.patch,
> HADOOP-12358.02.patch, HADOOP-12358.03.patch
>
>
> We have seen many cases with customers deleting data inadvertently with
> -skipTrash. The FSShell should prompt user if the size of the data or the
> number of files being deleted is bigger than a threshold even though
> -skipTrash is being used.