[
https://issues.apache.org/jira/browse/HDFS-16550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17624115#comment-17624115
]
ASF GitHub Bot commented on HDFS-16550:
---------------------------------------
tomscut commented on PR #4209:
URL: https://github.com/apache/hadoop/pull/4209#issuecomment-1291347727
> @tomscut Thanks for involving me. In my case, I think this PR is
unnecessary. But we could print some warning logs to prompt the admin if the
configured cache size is too large, such as more than 90% of the heap size.
>
> But, if anyone thinks this modification is necessary, I will review it
carefully later.
Thanks @ZanderXu for the review. There are already warning logs, but they are
easy to ignore. Since the configured cache size is not tied to the heap size,
the mismatch is easy to miss when updating the configuration.
> [SBN read] Improper cache-size for journal node may cause cluster crash
> -----------------------------------------------------------------------
>
> Key: HDFS-16550
> URL: https://issues.apache.org/jira/browse/HDFS-16550
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Tao Li
> Assignee: Tao Li
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2022-04-21-09-54-29-751.png,
> image-2022-04-21-09-54-57-111.png, image-2022-04-21-12-32-56-170.png
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> When we introduced {*}SBN Read{*}, we encountered a situation while upgrading
> the JournalNodes.
> Cluster Info:
> *Active: nn0*
> *Standby: nn1*
> 1. Rolling restart of the journal nodes. {color:#ff0000}(related config:
> dfs.journalnode.edit-cache-size.bytes=1G, -Xms1G, -Xmx1G){color}
> 2. As the cluster runs, the edits cache usage keeps increasing until the
> memory is used up.
> 3. {color:#ff0000}Active namenode (nn0){color} shut down because of “{_}Timed
> out waiting 120000ms for a quorum of nodes to respond{_}”.
> 4. nn1 was transitioned to the Active state.
> 5. {color:#ff0000}The new Active namenode (nn1){color} also shut down because
> of “{_}Timed out waiting 120000ms for a quorum of nodes to respond{_}”.
> 6. {color:#ff0000}The cluster crashed{color}.
>
> Related code:
> {code:java}
> JournaledEditsCache(Configuration conf) {
>   capacity = conf.getInt(DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_KEY,
>       DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_DEFAULT);
>   if (capacity > 0.9 * Runtime.getRuntime().maxMemory()) {
>     Journal.LOG.warn(String.format("Cache capacity is set at %d bytes but " +
>         "maximum JVM memory is only %d bytes. It is recommended that you " +
>         "decrease the cache size or increase the heap size.",
>         capacity, Runtime.getRuntime().maxMemory()));
>   }
>   Journal.LOG.info("Enabling the journaled edits cache with a capacity " +
>       "of bytes: " + capacity);
>   ReadWriteLock lock = new ReentrantReadWriteLock(true);
>   readLock = new AutoCloseableLock(lock.readLock());
>   writeLock = new AutoCloseableLock(lock.writeLock());
>   initialize(INVALID_TXN_ID);
> } {code}
> Currently, *dfs.journalnode.edit-cache-size.bytes* can be set to a larger
> size than the memory requested by the process. If
> {*}dfs.journalnode.edit-cache-size.bytes > 0.9 *
> Runtime.getRuntime().maxMemory(){*}, only a warn log is printed during
> journalnode startup. This is easily overlooked by users. However, after the
> cluster has been running for some time, it is likely to cause the cluster
> to crash.
>
> NN log:
> !image-2022-04-21-09-54-57-111.png|width=1012,height=47!
> !image-2022-04-21-12-32-56-170.png|width=809,height=218!
> IMO, we should not set the {{cache size}} to a fixed byte value, but as a
> ratio of the maximum heap memory, 0.2 by default.
> This avoids an oversized cache. In addition, users who need a larger cache
> can actively increase the heap size accordingly.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]