[ 
https://issues.apache.org/jira/browse/HBASE-28105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault reopened HBASE-28105:
---------------------------------------

Re-opening to set fix versions

> NPE in QuotaCache if Table is dropped from cluster
> --------------------------------------------------
>
>                 Key: HBASE-28105
>                 URL: https://issues.apache.org/jira/browse/HBASE-28105
>             Project: HBase
>          Issue Type: Bug
>          Components: Quotas
>    Affects Versions: 2.4.17, 2.5.5
>            Reporter: Ke Han
>            Priority: Major
>         Attachments: 0001-avoid-NPE.patch, 
> hbase--regionserver-a0320910ca45.log
>
>
> When running HBase-2.4.17, I met a NPE in regionserver log.
> h1. Reproduce
> Config HBase cluster: 1 HMaster, 2 RS, 2.10.2 Hadoop.
> Execute the following commands in the HMaster node using hbase shell, 
> {code:java}
> create 'uuidd9efa97f93a442b686adae6d9f7bb2e9', {NAME => 
> 'uuid099cbece77834a83a52bb0611c3ea080', VERSIONS => 3, COMPRESSION => 'NONE', 
> BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}, {NAME => 
> 'uuidbc1bea73952749329d7f025aab382c4e', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
> 'uuidff292310d9dc450697af2bb25d9f3e98', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 
> 'uuid449de028da6b4d35be0f187ebec6c3be', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
> 'uuidc0840c98f9d348a18f2d454c7a503b65', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}
> create_namespace 'uuidec797633f5dd4ab9b96276135aeda9e2'
> create 'uuiddeb610fded9744889840ecd03dd18739', {NAME => 
> 'uuid30a0f625ad454605908b60c932957ff0', VERSIONS => 1, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}
> incr 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
> 'uuid46ddc3d3557e413e915e2393ae72c082', 
> 'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 1
> flush 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
> 'uuid449de028da6b4d35be0f187ebec6c3be'
> drop 'uuiddeb610fded9744889840ecd03dd18739'
> put 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
> 'uuidf4704cae4d1e4661bd7664d26eb6b31b', 
> 'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 
> 'XlPpFGvSYfcEXWXgwARytlSeiaSuHJFqpirMmLduqGnpdXLlHJWBumraXiifQSvHqNHmTcyzLQIvuQrkujPghfdtRkhOkgKEJHsAuAiMMeWZjdTHNZqhkOdJBOzsRYUXKOCNKeSxEDWgnKgsFDHMtxdnKKudBuceOgYmCrdaPXMclKkZKCIEiFDcdoAEJGKXYVfOjb'
> disable 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
> drop 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
> create 'uuid9d05a5cb34e64910ac90675186e7d0d4', {NAME => 
> 'uuid1ce512a5997b4efea3bdead2e7f723c3', VERSIONS => 2, COMPRESSION => 'NONE', 
> BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 
> 'uuid0b1baaa4275e46b2a3a1d11d6540fc30', VERSIONS => 2, COMPRESSION => 'NONE', 
> BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}
> put 'uuid9d05a5cb34e64910ac90675186e7d0d4', 
> 'uuid552e42ade4c14099a1d8643bea1616d4', 
> 'uuid1ce512a5997b4efea3bdead2e7f723c3:l', 1
> drop 'uuid9d05a5cb34e64910ac90675186e7d0d4'{code}
> The exception will be thrown in either RS1 or RS2
> {code:java}
> 2023-09-19 20:29:28,268 INFO  [RS_OPEN_REGION-regionserver/hregion2:16020-2] 
> handler.AssignRegionHandler: Opened 
> uuid9d05a5cb34e64910ac90675186e7d0d4,,1695155367072.f59a0693a9469f9e1f131bf2aac1486d.
> 2023-09-19 20:29:29,205 ERROR [regionserver/hregion2:16020.Chore.1] 
> hbase.ScheduledChore: Caught error
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.updateQuotaFactors(QuotaCache.java:378)
>         at 
> org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.chore(QuotaCache.java:224)
>         at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:158)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>         at 
> org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:107)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:750){code}
> h1. Root Cause
> The NPE is thrown at function: updateQuotaFactors()
> {code:java}
> private void updateQuotaFactors() {
>   // Update machine quota factor
>   ClusterMetrics clusterMetrics;
>   try {
>     clusterMetrics = rsServices.getConnection().getAdmin()
>       .getClusterMetrics(EnumSet.of(Option.SERVERS_NAME, 
> Option.TABLE_TO_REGIONS_COUNT));
>   } catch (IOException e) {
>     LOG.warn("Failed to get cluster metrics needed for updating quotas", e);
>     return;
>   }  int rsSize = clusterMetrics.getServersName().size();
>   if (rsSize != 0) {
>     // TODO if use rs group, the cluster limit should be shared by the rs 
> group
>     machineQuotaFactor = 1.0 / rsSize;
>   }  Map<TableName, RegionStatesCount> tableRegionStatesCount =
>     clusterMetrics.getTableRegionStatesCount();  // Update table machine 
> quota factors
>   for (TableName tableName : tableQuotaCache.keySet()) {
>     double factor = 1;
>     try {
>       // BUGGY LINE
>       long regionSize = 
> tableRegionStatesCount.get(tableName).getOpenRegions();
>       if (regionSize == 0) {
>         factor = 0;
>       } else {
>         int localRegionSize = rsServices.getRegions(tableName).size();
>         factor = 1.0 * localRegionSize / regionSize;
>       }
>     } catch (IOException e) {
>       LOG.warn("Get table regions failed: {}", tableName, e);
>     }
>     tableMachineQuotaFactors.put(tableName, factor);
>   }
> } {code}
> At line 378: the *tableRegionStatesCount.get(tableName)* return null and thus 
> it runs into NPE.
> The tableName leading to NPE is '{*}uuidd9efa97f93a442b686adae6d9f7bb2e9{*}', 
> which is disabled and dropped by the user.
> {code:java}
> long regionSize = tableRegionStatesCount.get(tableName).getOpenRegions(); 
> {code}
> The root cause is that when updating the quota factors, it iterates 
> {*}tableQuotaCache.keySet(){*}, which might contain a table that has been 
> dropped (Outdated cache). 
> h1. Fix
> Maybe we can add a check to make sure the table still exists in the system 
> (Tmp solution). I have attached the full regionserver log and a simple fix.
> This bug should also happen in *2.5.5* since the related code in QuotaCache 
> remains the same (But I only tested 2.4.17).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to