[
https://issues.apache.org/jira/browse/HDFS-16550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
tomscut updated HDFS-16550:
---------------------------
Description:
When we introduced {*}SBN Read{*}, we encountered a problem while upgrading the
JournalNodes.
Cluster Info:
*Active: nn0*
*Standby: nn1*
1. Rolling restarted the JournalNodes. {color:#ff0000}(related config:
dfs.journalnode.edit-cache-size.bytes=1G, -Xms1G, -Xmx1G){color}
2. The cluster ran for a while.
3. {color:#ff0000}Active namenode (nn0){color} shut down because of “{_}Timed out
waiting 120000ms for a quorum of nodes to respond{_}”.
4. Transferred nn1 to the Active state.
5. {color:#ff0000}New Active namenode (nn1){color} also shut down because of
“{_}Timed out waiting 120000ms for a quorum of nodes to respond{_}”.
6. {color:#ff0000}The cluster crashed{color}.
Related code:
{code:java}
JournaledEditsCache(Configuration conf) {
  capacity = conf.getInt(DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_KEY,
      DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_DEFAULT);
  if (capacity > 0.9 * Runtime.getRuntime().maxMemory()) {
    Journal.LOG.warn(String.format("Cache capacity is set at %d bytes but " +
        "maximum JVM memory is only %d bytes. It is recommended that you " +
        "decrease the cache size or increase the heap size.",
        capacity, Runtime.getRuntime().maxMemory()));
  }
  Journal.LOG.info("Enabling the journaled edits cache with a capacity " +
      "of bytes: " + capacity);
  ReadWriteLock lock = new ReentrantReadWriteLock(true);
  readLock = new AutoCloseableLock(lock.readLock());
  writeLock = new AutoCloseableLock(lock.writeLock());
  initialize(INVALID_TXN_ID);
} {code}
Currently, *dfs.journalnode.edit-cache-size.bytes* can be set to a larger size
than the memory available to the JournalNode process. If
{*}dfs.journalnode.edit-cache-size.bytes > 0.9 *
Runtime.getRuntime().maxMemory(){*}, only a warn log is printed during
JournalNode startup. This is easily overlooked by users. However, once the
cluster has been running for some time, it is likely to cause the cluster to
crash.
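For illustration, a minimal sketch of the arithmetic behind this configuration
(the 1 GiB values are assumptions matching the -Xmx1G / cache-size=1G setup
above; the exact maxMemory() value depends on the JVM):
{code:java}
public class EditCacheSizeIllustration {
  public static void main(String[] args) {
    // Assumed values mirroring the setup above: 1 GiB cache with a 1 GiB heap (-Xmx1g).
    long capacity = 1024L * 1024 * 1024;               // dfs.journalnode.edit-cache-size.bytes = 1G
    long maxMemory = Runtime.getRuntime().maxMemory(); // roughly 1 GiB with -Xmx1g

    // Same comparison as JournaledEditsCache: the cache alone would take more
    // than 90% of the heap, yet the JournalNode still starts and only warns.
    if (capacity > 0.9 * maxMemory) {
      System.out.printf("WARN only: cache capacity %d bytes vs. max JVM memory %d bytes%n",
          capacity, maxMemory);
    }
  }
} {code}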
NN log:
!image-2022-04-21-09-54-57-111.png|width=1012,height=47!
!image-2022-04-21-12-32-56-170.png|width=809,height=218!
IMO, when {*}dfs.journalnode.edit-cache-size.bytes > threshold *
Runtime.getRuntime().maxMemory(){*}, we should throw an exception and
{color:#ff0000}fail fast{color}, giving users a clear hint to update the
related configuration.
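As a rough sketch of the idea, the JournaledEditsCache constructor could call a
validation like the one below instead of only logging a warning (the method
name, the 0.9 threshold, and the exception type are illustrative assumptions,
not the final patch):
{code:java}
/**
 * Illustrative sketch only: fail fast when the configured cache cannot
 * reasonably fit in the JVM heap. The 0.9 threshold and the use of
 * IllegalArgumentException are assumptions, not the final design.
 */
private static void validateCacheCapacity(long capacity) {
  long maxMemory = Runtime.getRuntime().maxMemory();
  if (capacity > 0.9 * maxMemory) {
    throw new IllegalArgumentException(String.format(
        "Cache capacity is set at %d bytes but maximum JVM memory is only %d "
            + "bytes. Please decrease dfs.journalnode.edit-cache-size.bytes or "
            + "increase the JournalNode heap size.",
        capacity, maxMemory));
  }
} {code}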
> [SBN read] Improper cache-size for journal node may cause cluster crash
> -----------------------------------------------------------------------
>
> Key: HDFS-16550
> URL: https://issues.apache.org/jira/browse/HDFS-16550
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: tomscut
> Assignee: tomscut
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2022-04-21-09-54-29-751.png,
> image-2022-04-21-09-54-57-111.png, image-2022-04-21-12-32-56-170.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.20.7#820007)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]