[
https://issues.apache.org/jira/browse/HDFS-16550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640224#comment-17640224
]
ASF GitHub Bot commented on HDFS-16550:
---------------------------------------
xkrogen commented on code in PR #4209:
URL: https://github.com/apache/hadoop/pull/4209#discussion_r1033958544
##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournaledEditsCache.java:
##########
@@ -277,11 +279,10 @@ void storeEdits(byte[] inputData, long newStartTxn, long newEndTxn,
initialize(INVALID_TXN_ID);
Journal.LOG.warn(String.format("A single batch of edits was too " +
"large to fit into the cache: startTxn = %d, endTxn = %d, " +
- "input length = %d. The capacity of the cache (%s) must be " +
+ "input length = %d. The capacity of the cache must be " +
Review Comment:
Can we keep the key here in the error message, but print both? Like `(%s or
%s)` where one is `DFS_JOURNALNODE_EDIT_CACHE_SIZE_KEY` and one is
`DFS_JOURNALNODE_EDIT_CACHE_SIZE_FRACTION_KEY`
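For illustration, the suggested message could be built like this; a minimal sketch, assuming the two `DFSConfigKeys` constants resolve to the config names discussed in this PR (the class and method names here are hypothetical, not the PR's actual code):

```java
// Illustrative sketch of the reviewer's suggestion: keep both config keys
// in the warning so the operator knows which knob to turn.
public class WarnMessageSketch {
  // Assumed values of the DFSConfigKeys constants referenced in the PR.
  static final String SIZE_KEY = "dfs.journalnode.edit-cache-size.bytes";
  static final String FRACTION_KEY = "dfs.journalnode.edit-cache-size.fraction";

  static String tooLargeBatchWarning(long startTxn, long endTxn, int inputLength) {
    return String.format("A single batch of edits was too large to fit into "
        + "the cache: startTxn = %d, endTxn = %d, input length = %d. "
        + "The capacity of the cache (%s or %s) must be increased.",
        startTxn, endTxn, inputLength, SIZE_KEY, FRACTION_KEY);
  }

  public static void main(String[] args) {
    System.out.println(tooLargeBatchWarning(100, 105, 4096));
  }
}
```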
##########
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/server/TestJournaledEditsCache.java:
##########
@@ -221,6 +224,24 @@ public void testCacheMalformedInput() throws Exception {
cache.retrieveEdits(-1, 10, new ArrayList<>());
}
+ @Test
+ public void testCacheFraction() {
+ // Set dfs.journalnode.edit-cache-size.bytes.
+ Configuration config = new Configuration();
+ config.setInt(DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_KEY, 1);
+ config.setFloat(DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_FRACTION_KEY, 0.1f);
+ cache = new JournaledEditsCache(config);
+ assertEquals(1, cache.getCapacity(), 0.0);
Review Comment:
why do we do this as a floating-point comparison? can't we just do
`assertEquals(1, cache.getCapacity())`
##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournaledEditsCache.java:
##########
@@ -121,12 +121,14 @@ class JournaledEditsCache {
// ** End lock-protected fields **
JournaledEditsCache(Configuration conf) {
+ float fraction = conf.getFloat(DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_FRACTION_KEY,
+     DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_FRACTION_DEFAULT);
Review Comment:
maybe enforce a check here to guarantee that `fraction < 1.0` ? to fail-fast
in case of misconfigurations
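A fail-fast check along these lines might look like the following; a sketch only, assuming the fraction must lie strictly between 0 and 1 (the class and method names are hypothetical):

```java
// Illustrative fail-fast validation for the fraction config, per the
// review suggestion. The (0, 1) bounds are an assumption from this thread.
public class FractionCheckSketch {
  static float validateFraction(float fraction) {
    if (fraction <= 0.0f || fraction >= 1.0f) {
      throw new IllegalArgumentException(
          "dfs.journalnode.edit-cache-size.fraction must be in (0, 1), got "
              + fraction);
    }
    return fraction;
  }

  public static void main(String[] args) {
    System.out.println(validateFraction(0.5f)); // accepted
  }
}
```

Throwing at construction time surfaces the misconfiguration at JournalNode startup rather than after the heap fills up.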
##########
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/server/TestJournaledEditsCache.java:
##########
@@ -221,6 +224,24 @@ public void testCacheMalformedInput() throws Exception {
cache.retrieveEdits(-1, 10, new ArrayList<>());
}
+ @Test
+ public void testCacheFraction() {
Review Comment:
how about `testCacheSizeConfigs` ? since we are testing both of them and how
they interact
##########
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/server/TestJournaledEditsCache.java:
##########
@@ -221,6 +224,24 @@ public void testCacheMalformedInput() throws Exception {
cache.retrieveEdits(-1, 10, new ArrayList<>());
}
+ @Test
+ public void testCacheFraction() {
+ // Set dfs.journalnode.edit-cache-size.bytes.
+ Configuration config = new Configuration();
+ config.setInt(DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_KEY, 1);
+ config.setFloat(DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_FRACTION_KEY, 0.1f);
+ cache = new JournaledEditsCache(config);
+ assertEquals(1, cache.getCapacity(), 0.0);
+
+ // Don't set dfs.journalnode.edit-cache-size.bytes.
+ Configuration config1 = new Configuration();
+ config1.setFloat(DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_FRACTION_KEY, 0.1f);
+ cache = new JournaledEditsCache(config1);
+ assertEquals(
+     memory * config1.getFloat(DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_FRACTION_KEY, 0.0f),
+     cache.getCapacity(), 0.0);
Review Comment:
how about just this?
```suggestion
assertEquals(memory / 10, cache.getCapacity());
```
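For context, the fraction-based capacity asserted here boils down to roughly the following; a sketch only, with an illustrative class and method name (the real computation lives in `JournaledEditsCache`):

```java
// Rough sketch of the fraction-based capacity computed by the test above:
// the cache gets maxMemory * fraction bytes of the JVM heap.
public class CapacitySketch {
  static long capacityFromFraction(long maxMemory, float fraction) {
    return (long) (maxMemory * fraction);
  }

  public static void main(String[] args) {
    // e.g. a 1000-byte heap with fraction 0.1 yields a 100-byte cache
    System.out.println(capacityFromFraction(1000L, 0.1f));
  }
}
```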
##########
hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml:
##########
@@ -4955,6 +4955,19 @@
</description>
</property>
+<property>
+ <name>dfs.journalnode.edit-cache-size.fraction</name>
+ <value>0.5f</value>
+ <description>
+ This ratio refers to the proportion of the maximum memory of the JVM.
+ Used to calculate the size of the edits cache that is kept in the JournalNode's memory.
Review Comment:
we should explicitly mention that this is an alternative to the `.bytes`
config and either copy some of the guidance there (about when it will be
enabled, txn size, etc.) or reference it like "see the documentation for
`dfs.journalnode.edit-cache-size.bytes` to learn more"
##########
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/server/TestJournaledEditsCache.java:
##########
@@ -221,6 +224,24 @@ public void testCacheMalformedInput() throws Exception {
cache.retrieveEdits(-1, 10, new ArrayList<>());
}
+ @Test
+ public void testCacheFraction() {
+ // Set dfs.journalnode.edit-cache-size.bytes.
+ Configuration config = new Configuration();
+ config.setInt(DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_KEY, 1);
+ config.setFloat(DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_FRACTION_KEY, 0.1f);
+ cache = new JournaledEditsCache(config);
+ assertEquals(1, cache.getCapacity(), 0.0);
+
+ // Don't set dfs.journalnode.edit-cache-size.bytes.
+ Configuration config1 = new Configuration();
+ config1.setFloat(DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_FRACTION_KEY, 0.1f);
+ cache = new JournaledEditsCache(config1);
+ assertEquals(
+     memory * config1.getFloat(DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_FRACTION_KEY, 0.0f),
+     cache.getCapacity(), 0.0);
+ }
Review Comment:
can we also confirm the default behavior (neither config set) ?
##########
hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml:
##########
@@ -4955,6 +4955,19 @@
</description>
</property>
+<property>
+ <name>dfs.journalnode.edit-cache-size.fraction</name>
+ <value>0.5f</value>
+ <description>
+ This ratio refers to the proportion of the maximum memory of the JVM.
+ Used to calculate the size of the edits cache that is kept in the JournalNode's memory.
+ The recommended value is less than 0.9. We recommend using
+ dfs.journalnode.edit-cache-size.fraction instead of
+ dfs.journalnode.edit-cache-size.bytes. If we set
+ dfs.journalnode.edit-cache-size.bytes, this parameter will not take effect.
Review Comment:
I don't think we specifically recommend one or the other? They are just
different ways to configure it.
##########
hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml:
##########
@@ -4955,6 +4955,19 @@
</description>
Review Comment:
for `dfs.journalnode.edit-cache-size.bytes`, we also need to update the
documentation:
1. remove the default value, since there is none. mention that the size
defaults to the value of the `.fraction` config
2. mention that it takes precedence over the `.fraction` config, so if both
are set, then `.bytes` is used
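Taken together, the updated `.bytes` entry might read something like the following; a sketch only, the exact description wording is up to the PR author:

```xml
<property>
  <name>dfs.journalnode.edit-cache-size.bytes</name>
  <value></value>
  <description>
    The size, in bytes, of the in-memory cache of edits kept by the
    JournalNode. If unset, the cache size defaults to the value derived
    from dfs.journalnode.edit-cache-size.fraction. If both are set,
    this config takes precedence over the fraction-based one.
  </description>
</property>
```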
##########
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/server/TestJournaledEditsCache.java:
##########
@@ -56,8 +56,11 @@ public class TestJournaledEditsCache {
PathUtils.getTestDir(TestJournaledEditsCache.class, false);
private JournaledEditsCache cache;
+ private long memory;
+
Review Comment:
this field seems kind of unnecessary
> [SBN read] Improper cache-size for journal node may cause cluster crash
> -----------------------------------------------------------------------
>
> Key: HDFS-16550
> URL: https://issues.apache.org/jira/browse/HDFS-16550
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Tao Li
> Assignee: Tao Li
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2022-04-21-09-54-29-751.png,
> image-2022-04-21-09-54-57-111.png, image-2022-04-21-12-32-56-170.png
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> When we introduced *SBN Read*, we encountered a situation while upgrading
> the JournalNodes.
> Cluster Info:
> *Active: nn0*
> *Standby: nn1*
> 1. Rolling restart of the JournalNodes. (related config:
> dfs.journalnode.edit-cache-size.bytes=1G, -Xms1G, -Xmx1G)
> 2. After the cluster had run for a while, edits cache usage kept growing until
> memory was used up.
> 3. The Active NameNode (nn0) shut down because of "Timed out waiting 120000ms
> for a quorum of nodes to respond".
> 4. nn1 was transitioned to the Active state.
> 5. The new Active NameNode (nn1) also shut down because of "Timed out waiting
> 120000ms for a quorum of nodes to respond".
> 6. The cluster crashed.
>
> Related code:
> {code:java}
> JournaledEditsCache(Configuration conf) {
>   capacity = conf.getInt(DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_KEY,
>       DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_DEFAULT);
>   if (capacity > 0.9 * Runtime.getRuntime().maxMemory()) {
>     Journal.LOG.warn(String.format("Cache capacity is set at %d bytes but " +
>         "maximum JVM memory is only %d bytes. It is recommended that you " +
>         "decrease the cache size or increase the heap size.",
>         capacity, Runtime.getRuntime().maxMemory()));
>   }
>   Journal.LOG.info("Enabling the journaled edits cache with a capacity " +
>       "of bytes: " + capacity);
>   ReadWriteLock lock = new ReentrantReadWriteLock(true);
>   readLock = new AutoCloseableLock(lock.readLock());
>   writeLock = new AutoCloseableLock(lock.writeLock());
>   initialize(INVALID_TXN_ID);
> } {code}
> Currently, *dfs.journalnode.edit-cache-size.bytes* can be set larger than the
> memory available to the process. If dfs.journalnode.edit-cache-size.bytes >
> 0.9 * Runtime.getRuntime().maxMemory(), only a warn log is printed during
> JournalNode startup. This can easily be overlooked by users. However, after the
> cluster has been running for some time, it is likely to crash.
>
> NN log:
> !image-2022-04-21-09-54-57-111.png|width=1012,height=47!
> !image-2022-04-21-12-32-56-170.png|width=809,height=218!
> IMO, we should not set the {{cache size}} to a fixed value, but rather as a
> ratio of the maximum memory, 0.2 by default.
> This avoids the problem of an overly large cache size. In addition, users can
> adjust the heap size when they need to increase the cache size.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)