[
https://issues.apache.org/jira/browse/HDFS-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559125#comment-16559125
]
Tao Jie commented on HDFS-13769:
--------------------------------
[~csun], I agree with [~kihwal]. We cannot use this logic in the default
delete operation, since it would break the existing delete semantics. However,
we can use it in trash deletion, which has fewer side effects. Clearing a
trash checkpoint is also a typical case of deleting a large dir, since the
checkpoint dir of trash accumulates the files deleted over several hours.
[~jojochuang], agreed! {{getContentSummary}} is a recursive method and may
take several seconds if the dir is very large, but it holds the read lock in
{{FSNamesystem}} rather than the write lock. We also need a way to tell
whether a dir is large, though I think the check need not be very accurate.
If there is a better solution that I am not aware of, please let me know.
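To illustrate the size check, a minimal client-side sketch is below. The
{{isLargeDir}} helper and its entry-count threshold are hypothetical and not
part of the attached patch:
{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LargeDirCheck {
  // Hypothetical threshold for "large"; a real cutoff would need tuning.
  private static final long LARGE_DIR_ENTRY_COUNT = 100_000L;

  /**
   * Rough check of whether a directory is "large" before deleting it.
   * getContentSummary() walks the whole subtree, so it may take seconds on a
   * huge tree, but it is served under the namesystem read lock rather than
   * the write lock held by a recursive delete.
   */
  static boolean isLargeDir(FileSystem fs, Path dir) throws IOException {
    ContentSummary summary = fs.getContentSummary(dir);
    return summary.getFileCount() + summary.getDirectoryCount()
        > LARGE_DIR_ENTRY_COUNT;
  }
}
{code}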
> Namenode gets stuck when deleting large dir in trash
> ----------------------------------------------------
>
> Key: HDFS-13769
> URL: https://issues.apache.org/jira/browse/HDFS-13769
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.8.2, 3.1.0
> Reporter: Tao Jie
> Assignee: Tao Jie
> Priority: Major
> Attachments: HDFS-13769.001.patch
>
>
> Similar to the situation discussed in HDFS-13671, the Namenode gets stuck for a
> long time when deleting a trash dir with a large amount of data. We found this
> log in the namenode:
> {quote}
> 2018-06-08 20:00:59,042 INFO namenode.FSNamesystem
> (FSNamesystemLock.java:writeUnlock(252)) - FSNamesystem write lock held for
> 23018 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:254)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1567)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2820)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1047)
> {quote}
> One simple solution is to avoid deleting a large amount of data in a single
> delete RPC call. We implemented a TrashPolicy that divides the delete operation
> into several delete RPCs, so that no single deletion removes too many files.
> Any thoughts? [~linyiqun]
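For reference, a minimal sketch of the chunked-deletion idea described in the
issue above. The class and method names are illustrative only, and this is not
the attached HDFS-13769.001.patch:
{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChunkedTrashDelete {
  /**
   * Delete a trash checkpoint directory with one delete RPC per immediate
   * child instead of a single recursive delete of the root, so that no
   * single RPC holds the namesystem write lock for the whole subtree.
   * Note: an individual child subtree can still be large; a real policy
   * would recurse or batch further.
   */
  static void deleteInChunks(FileSystem fs, Path checkpointDir)
      throws IOException {
    for (FileStatus child : fs.listStatus(checkpointDir)) {
      fs.delete(child.getPath(), true);   // smaller delete RPC per child
    }
    fs.delete(checkpointDir, false);      // remove the now-empty root dir
  }
}
{code}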