Ke Han created HBASE-28105:
------------------------------
Summary: NPE is thrown in QuotaCache.java when running HBase-2.4.17
Key: HBASE-28105
URL: https://issues.apache.org/jira/browse/HBASE-28105
Project: HBase
Issue Type: Bug
Components: Quotas
Affects Versions: 2.5.5, 2.4.17
Reporter: Ke Han
Attachments: 0001-avoid-NPE.patch
When running HBase 2.4.17, I hit an NPE in the RegionServer log.
h1. Reproduce
Configure an HBase cluster with 1 HMaster, 2 RegionServers, and Hadoop 2.10.2.
Execute the following commands on the HMaster node using the hbase shell:
{code:java}
create 'uuidd9efa97f93a442b686adae6d9f7bb2e9', {NAME =>
'uuid099cbece77834a83a52bb0611c3ea080', VERSIONS => 3, COMPRESSION => 'NONE',
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}, {NAME =>
'uuidbc1bea73952749329d7f025aab382c4e', VERSIONS => 2, COMPRESSION => 'GZ',
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME =>
'uuidff292310d9dc450697af2bb25d9f3e98', VERSIONS => 2, COMPRESSION => 'GZ',
BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME =>
'uuid449de028da6b4d35be0f187ebec6c3be', VERSIONS => 2, COMPRESSION => 'GZ',
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME =>
'uuidc0840c98f9d348a18f2d454c7a503b65', VERSIONS => 2, COMPRESSION => 'GZ',
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}
create_namespace 'uuidec797633f5dd4ab9b96276135aeda9e2'
create 'uuiddeb610fded9744889840ecd03dd18739', {NAME =>
'uuid30a0f625ad454605908b60c932957ff0', VERSIONS => 1, COMPRESSION => 'GZ',
BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}
incr 'uuidd9efa97f93a442b686adae6d9f7bb2e9',
'uuid46ddc3d3557e413e915e2393ae72c082',
'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 1
flush 'uuidd9efa97f93a442b686adae6d9f7bb2e9',
'uuid449de028da6b4d35be0f187ebec6c3be'
drop 'uuiddeb610fded9744889840ecd03dd18739'
put 'uuidd9efa97f93a442b686adae6d9f7bb2e9',
'uuidf4704cae4d1e4661bd7664d26eb6b31b',
'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp',
'XlPpFGvSYfcEXWXgwARytlSeiaSuHJFqpirMmLduqGnpdXLlHJWBumraXiifQSvHqNHmTcyzLQIvuQrkujPghfdtRkhOkgKEJHsAuAiMMeWZjdTHNZqhkOdJBOzsRYUXKOCNKeSxEDWgnKgsFDHMtxdnKKudBuceOgYmCrdaPXMclKkZKCIEiFDcdoAEJGKXYVfOjb'
disable 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
drop 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
create 'uuid9d05a5cb34e64910ac90675186e7d0d4', {NAME =>
'uuid1ce512a5997b4efea3bdead2e7f723c3', VERSIONS => 2, COMPRESSION => 'NONE',
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME =>
'uuid0b1baaa4275e46b2a3a1d11d6540fc30', VERSIONS => 2, COMPRESSION => 'NONE',
BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}
put 'uuid9d05a5cb34e64910ac90675186e7d0d4',
'uuid552e42ade4c14099a1d8643bea1616d4',
'uuid1ce512a5997b4efea3bdead2e7f723c3:l', 1
drop 'uuid9d05a5cb34e64910ac90675186e7d0d4'{code}
The exception is then thrown on either RS1 or RS2:
{code:java}
2023-09-19 20:29:28,268 INFO [RS_OPEN_REGION-regionserver/hregion2:16020-2] handler.AssignRegionHandler: Opened uuid9d05a5cb34e64910ac90675186e7d0d4,,1695155367072.f59a0693a9469f9e1f131bf2aac1486d.
2023-09-19 20:29:29,205 ERROR [regionserver/hregion2:16020.Chore.1] hbase.ScheduledChore: Caught error
java.lang.NullPointerException
    at org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.updateQuotaFactors(QuotaCache.java:378)
    at org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.chore(QuotaCache.java:224)
    at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:158)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:107)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750){code}
h1. Root Cause
The NPE is thrown at
{code:java}
private void updateQuotaFactors() {
  // Update machine quota factor
  ClusterMetrics clusterMetrics;
  try {
    clusterMetrics = rsServices.getConnection().getAdmin()
      .getClusterMetrics(EnumSet.of(Option.SERVERS_NAME, Option.TABLE_TO_REGIONS_COUNT));
  } catch (IOException e) {
    LOG.warn("Failed to get cluster metrics needed for updating quotas", e);
    return;
  }

  int rsSize = clusterMetrics.getServersName().size();
  if (rsSize != 0) {
    // TODO if use rs group, the cluster limit should be shared by the rs group
    machineQuotaFactor = 1.0 / rsSize;
  }

  Map<TableName, RegionStatesCount> tableRegionStatesCount =
    clusterMetrics.getTableRegionStatesCount();

  // Update table machine quota factors
  for (TableName tableName : tableQuotaCache.keySet()) {
    double factor = 1;
    try {
      long regionSize = tableRegionStatesCount.get(tableName).getOpenRegions();
      if (regionSize == 0) {
        factor = 0;
      } else {
        int localRegionSize = rsServices.getRegions(tableName).size();
        factor = 1.0 * localRegionSize / regionSize;
      }
    } catch (IOException e) {
      LOG.warn("Get table regions failed: {}", tableName, e);
    }
    tableMachineQuotaFactors.put(tableName, factor);
  }
} {code}
This function updates the machine quota factor for every table in tableQuotaCache. At line 378,
tableRegionStatesCount.get(tableName) returns null, so the chained call to getOpenRegions() throws an NPE.
The tableName that triggers the NPE is '{*}uuidd9efa97f93a442b686adae6d9f7bb2e9{*}',
which the user had already disabled and dropped.
{code:java}
long regionSize = tableRegionStatesCount.get(tableName).getOpenRegions(); {code}
The root cause is that, when updating the factors, the chore iterates over tableQuotaCache,
which may still contain a table that has been dropped and is therefore missing from the
cluster metrics. Maybe we can add a check to make sure the table still exists in the system.
This bug should also occur in *2.5.5*, since the related code in QuotaCache
is the same (but I only tested 2.4.17).
I attached a simple fix, but I am not sure whether it also works for other cases.
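For illustration only, and not necessarily what the attached patch does, the kind of check suggested above can be sketched in standalone Java. The class QuotaFactorSketch and the method computeFactors are hypothetical; plain String keys and Long/Integer counts stand in for TableName, RegionStatesCount, and rsServices.getRegions(), so this is not HBase code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical standalone sketch: String stands in for TableName,
// Long for RegionStatesCount.getOpenRegions(), Integer for the local region count.
class QuotaFactorSketch {

  /**
   * Computes a factor (local regions / open regions) for each cached table.
   * Tables with no entry in openRegionsByTable (for example, dropped tables)
   * are skipped instead of being dereferenced.
   */
  public static Map<String, Double> computeFactors(Set<String> cachedTables,
      Map<String, Long> openRegionsByTable, Map<String, Integer> localRegionsByTable) {
    Map<String, Double> factors = new HashMap<>();
    for (String table : cachedTables) {
      Long openRegions = openRegionsByTable.get(table);
      if (openRegions == null) {
        // Cached table is absent from the metrics: skip it to avoid the NPE.
        continue;
      }
      double factor;
      if (openRegions == 0) {
        factor = 0;
      } else {
        int localRegions = localRegionsByTable.getOrDefault(table, 0);
        factor = 1.0 * localRegions / openRegions;
      }
      factors.put(table, factor);
    }
    return factors;
  }

  public static void main(String[] args) {
    Set<String> cached = new HashSet<>();
    cached.add("liveTable");
    cached.add("droppedTable"); // still cached, but no longer in the metrics

    Map<String, Long> openRegions = new HashMap<>();
    openRegions.put("liveTable", 4L);

    Map<String, Integer> localRegions = new HashMap<>();
    localRegions.put("liveTable", 2);

    // Only liveTable receives a factor; droppedTable is skipped.
    System.out.println(computeFactors(cached, openRegions, localRegions));
  }
}
```

Whether a missing table should be skipped, given a factor of 0, or evicted from the cache is a separate design question that this sketch does not settle.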