[ https://issues.apache.org/jira/browse/HBASE-28866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ariadne updated HBASE-28866:
----------------------------
Summary: Master will fail to start due to a hard-to-diagnose
misconfiguration. (was: Master will be )
> Master will fail to start due to a hard-to-diagnose misconfiguration.
> ---------------------------------------------------------------------
>
> Key: HBASE-28866
> URL: https://issues.apache.org/jira/browse/HBASE-28866
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 2.4.2, 3.0.0-beta-1
> Reporter: Ariadne
> Priority: Critical
> Fix For: 3.0.0-beta-1
>
> Attachments: LogCleaner.patch
>
>
> ============================
> Problem
> -------------------------------------------------
> HBase Master cannot be initialized with the following setting:
> {code:xml}
> <property>
>   <name>hbase.oldwals.cleaner.thread.size</name>
>   <value>-1</value>
>   <description>Default is 2</description>
> </property>{code}
>
> After running start-hbase.sh, the Master failed to start with the following
> exception:
> {code:java}
> ERROR [master/localhost:16000:becomeActiveMaster] master.HMaster: Failed to become active master
> java.lang.IllegalArgumentException: Illegal Capacity: -1
>     at java.util.ArrayList.<init>(ArrayList.java:157)
>     at org.apache.hadoop.hbase.master.cleaner.LogCleaner.createOldWalsCleaner(LogCleaner.java:149)
>     at org.apache.hadoop.hbase.master.cleaner.LogCleaner.<init>(LogCleaner.java:80)
>     at org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:1329)
>     at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:917)
>     at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2081)
>     at org.apache.hadoop.hbase.master.HMaster.lambda$0(HMaster.java:505)
>     at java.lang.Thread.run(Thread.java:750){code}
> This error log is confusing and misleading: an 'Illegal Capacity' error from
> ArrayList looks like an internal code bug rather than a configuration problem.
>
> After reading the source code, we found that
> "hbase.oldwals.cleaner.thread.size" is parsed in the LogCleaner constructor and
> passed to the createOldWalsCleaner() function without any validation:
> {code:java}
> int size = conf.getInt(OLD_WALS_CLEANER_THREAD_SIZE, DEFAULT_OLD_WALS_CLEANER_THREAD_SIZE);
> this.oldWALsCleaner = createOldWalsCleaner(size); {code}
> The value of "hbase.oldwals.cleaner.thread.size" is used as the
> initialCapacity of an ArrayList. If the configured value is negative, the
> ArrayList constructor throws an IllegalArgumentException:
> {code:java}
> private List<Thread> createOldWalsCleaner(int size) {
> ...
> List<Thread> oldWALsCleaner = new ArrayList<>(size);
> ...
> } {code}
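The failure path above can be reproduced outside HBase with a small standalone sketch. It assumes `java.util.Properties` as a stand-in for HBase's `Configuration` (a simplification: the hypothetical `getInt` helper below only handles plain decimal values), and it mirrors the `conf.getInt(...)` then `createOldWalsCleaner(size)` sequence quoted in the report. The default of 2 is taken from the `<description>` in the configuration snippet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Standalone reproduction of the misconfiguration described in HBASE-28866.
// Properties is an assumed stand-in for HBase's Configuration so that the
// sketch has no Hadoop/HBase dependency.
class OldWalsCleanerRepro {
    static final String OLD_WALS_CLEANER_THREAD_SIZE = "hbase.oldwals.cleaner.thread.size";
    static final int DEFAULT_OLD_WALS_CLEANER_THREAD_SIZE = 2;

    // Hypothetical helper mirroring conf.getInt(key, default):
    // return the parsed value, or the default if the key is unset.
    static int getInt(Properties conf, String key, int defaultValue) {
        String raw = conf.getProperty(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    // Mirrors createOldWalsCleaner(size): the unchecked size is used
    // directly as the ArrayList's initial capacity.
    static List<Thread> createOldWalsCleaner(int size) {
        List<Thread> oldWALsCleaner = new ArrayList<>(size); // throws for size < 0
        return oldWALsCleaner;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(OLD_WALS_CLEANER_THREAD_SIZE, "-1");
        int size = getInt(conf, OLD_WALS_CLEANER_THREAD_SIZE, DEFAULT_OLD_WALS_CLEANER_THREAD_SIZE);
        try {
            createOldWalsCleaner(size);
        } catch (IllegalArgumentException e) {
            // Prints "Illegal Capacity: -1"; the configuration key is never mentioned.
            System.out.println(e.getMessage());
        }
    }
}
```

The exception message only reports the bad capacity, which is why the log gives no hint that a configuration property is to blame.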
> ============================
> Solution (the attached patch)
> -------------------------------------------------
> The attached patch adds a check for this value in the {{LogCleaner}}
> constructor. If the configured size is not positive, the patch logs a warning
> and falls back to the default value, so users can diagnose the
> misconfiguration from the log. The full patch is shown below.
> {code:java}
> @@ -78,6 +78,11 @@ public class LogCleaner extends CleanerChore<BaseLogCleanerDelegate>
>        pool, params, null);
>      this.pendingDelete = new LinkedBlockingQueue<>();
>      int size = conf.getInt(OLD_WALS_CLEANER_THREAD_SIZE, DEFAULT_OLD_WALS_CLEANER_THREAD_SIZE);
> +    if (size <= 0) {
> +      LOG.warn("The size of old WALs cleaner thread is {}, which is invalid, "
> +        + "the default value will be used.", size);
> +      size = DEFAULT_OLD_WALS_CLEANER_THREAD_SIZE;
> +    }
>      this.oldWALsCleaner = createOldWalsCleaner(size);
>      this.cleanerThreadTimeoutMsec = conf.getLong(OLD_WALS_CLEANER_THREAD_TIMEOUT_MSEC,
>        DEFAULT_OLD_WALS_CLEANER_THREAD_TIMEOUT_MSEC);{code}
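For illustration, the check in the patch can be pulled out into a standalone helper and exercised on its own. This is a minimal sketch, not the patch itself: the class and method names are hypothetical, it logs through `java.util.logging` instead of HBase's `LOG`, and the default of 2 is taken from the configuration description in the report.

```java
import java.util.logging.Logger;

// Sketch of the validation the patch adds to the LogCleaner constructor.
// OldWalsCleanerSizeCheck and resolveOldWalsCleanerThreadSize are hypothetical
// names; the real patch inlines this check and uses HBase's own logger.
class OldWalsCleanerSizeCheck {
    private static final Logger LOG = Logger.getLogger(OldWalsCleanerSizeCheck.class.getName());
    static final int DEFAULT_OLD_WALS_CLEANER_THREAD_SIZE = 2;

    // Returns the configured size if it is positive; otherwise logs a warning
    // and falls back to the default, matching the patch's "size <= 0" check.
    static int resolveOldWalsCleanerThreadSize(int configured) {
        if (configured <= 0) {
            LOG.warning("The size of old WALs cleaner thread is " + configured
                + ", which is invalid, the default value will be used.");
            return DEFAULT_OLD_WALS_CLEANER_THREAD_SIZE;
        }
        return configured;
    }

    public static void main(String[] args) {
        System.out.println(resolveOldWalsCleanerThreadSize(-1)); // 2, after a warning
        System.out.println(resolveOldWalsCleanerThreadSize(0));  // 2, after a warning
        System.out.println(resolveOldWalsCleanerThreadSize(4));  // 4
    }
}
```

The check rejects 0 as well as negative values, even though `new ArrayList<>(0)` does not throw. This is presumably because a size of 0 would start no old-WAL cleaner threads at all.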
> Thanks!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)