[
https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394599#comment-14394599
]
Kihwal Lee commented on HDFS-7609:
----------------------------------
We have seen a related case. In a relatively small cluster, a user created a
rogue job that generated a very high transaction rate on the namenode. The ANN
was rolling the edit log on its own before reaching the regular rolling
period, and the SBN then started losing datanodes because replaying each large
edit segment took extremely long. We normally see a replay speed of about
30-80k txns/sec (still considerably slower than 0.23, or 2.x before the
introduction of RetryCache), but in this case it was down to 2k txns/sec,
causing the replay of one huge segment to take several hours.
In this case, the slowdown was because the cache was too small. Since the
cache size is 0.03% of the heap by default, the hash table (GSet) backing the
cache had long chains in each slot while replaying the edit segment.
Increasing the cache size would have helped, but since the transaction rate is
not always a function of the size of the namespace, the default cache size may
not work in many cases.
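As a rough illustration (the heap size, live-entry count, and bytes per slot
below are assumed, not measured on this cluster), the default sizing yields a
slot count that is small next to the number of entries alive during a replay
burst, so the average chain length grows accordingly:

    // Back-of-the-envelope sketch, not NameNode code. Assumes an 8 GB heap,
    // the default dfs.namenode.retrycache.heap.percent of 0.03 (i.e. 0.03%
    // of the heap), and one 8-byte reference per hash-table slot.
    public class RetryCacheSizing {
      public static void main(String[] args) {
        long heapBytes = 8L << 30;        // 8 GB NameNode heap (assumed)
        double heapPercent = 0.03;        // default: 0.03% of the heap
        long slotBytes = 8;               // one reference per GSet slot

        long slots = (long) (heapBytes * heapPercent / 100) / slotBytes;
        long liveEntries = 5_000_000L;    // entries alive mid-replay (assumed)

        System.out.printf("slots = %,d%n", slots);          // roughly 322k
        System.out.printf("avg chain length = %,d%n", liveEntries / slots);
      }
    }

With roughly 322k slots holding a few million live entries, every lookup and
insert walks a chain over a dozen entries long, which is consistent with the
order-of-magnitude slowdown described above.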
Also, if the edit rolling period is greater than the cache expiration time
(e.g. 10 min), it may make sense to purge the entire cache in a more efficient
way before replaying the next segment: we could record the time when a segment
replay finishes and check the elapsed time at the start of the next replay.
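A minimal sketch of that check (class and method names are hypothetical, not
actual NameNode code):

    // If more time than the cache expiry has elapsed since the previous
    // segment finished replaying, every cached entry is already stale, so
    // the whole cache can be dropped in one step instead of being evicted
    // entry by entry while the new segment is replayed.
    class RetryCachePurgeCheck {
      private final long expiryMillis;      // e.g. the 10 min cache expiry
      private long lastReplayDoneMillis = -1;

      RetryCachePurgeCheck(long expiryMillis) {
        this.expiryMillis = expiryMillis;
      }

      boolean shouldPurgeBeforeReplay(long nowMillis) {
        return lastReplayDoneMillis >= 0
            && nowMillis - lastReplayDoneMillis > expiryMillis;
      }

      void onSegmentReplayFinished(long nowMillis) {
        lastReplayDoneMillis = nowMillis;
      }
    }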
> startup used too much time to load edits
> ----------------------------------------
>
> Key: HDFS-7609
> URL: https://issues.apache.org/jira/browse/HDFS-7609
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 2.2.0
> Reporter: Carrey Zhan
> Attachments: HDFS-7609-CreateEditsLogWithRPCIDs.patch,
> recovery_do_not_use_retrycache.patch
>
>
> One day my namenode crashed because two journal nodes timed out at the same
> time under very high load, leaving behind about 100 million transactions in
> the edits log. (I still have no idea why they were not rolled into the
> fsimage.)
> I tried to restart the namenode, but it showed that almost 20 hours would be
> needed to finish, and it was loading fsedits most of the time. I also tried
> to restart the namenode in recovery mode, but the loading speed was no
> different.
> I looked into the stack trace and concluded that the slowdown was caused by
> the retry cache. So I set dfs.namenode.enable.retrycache to false, and the
> restart process finished in half an hour.
> I think the retry cache is useless during startup, at least during the
> recovery process.
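For reference, the workaround described in the issue corresponds to this
hdfs-site.xml property (the key name is the real one; note that disabling the
cache gives up the at-most-once guarantee it provides for retried
non-idempotent client RPCs):

    <property>
      <name>dfs.namenode.enable.retrycache</name>
      <value>false</value>
    </property>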
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)