Ke Han created HBASE-28159:
------------------------------

             Summary: Unable to get table state error when table is being initialized
                 Key: HBASE-28159
                 URL: https://issues.apache.org/jira/browse/HBASE-28159
             Project: HBase
          Issue Type: Bug
          Components: master
    Affects Versions: 2.4.17
            Reporter: Ke Han
         Attachments: hbase--master-37bbb9b6f05a.log, persistent.tar.gz

When executing commands to create a table, I noticed the following ERROR in the HMaster log:
{code:java}
2023-10-17 06:41:47,118 ERROR [master/hmaster:16000.Chore.1] master.TableStateManager: Unable to get table uuidf68fb89ec7f4435597d69fb7b099d8e7 state
org.apache.hadoop.hbase.TableNotFoundException: No state found for uuidf68fb89ec7f4435597d69fb7b099d8e7
    at org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:155)
    at org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:92)
    at org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableDisabled(AssignmentManager.java:419)
    at org.apache.hadoop.hbase.master.assignment.AssignmentManager.getRegionStatesCount(AssignmentManager.java:2341)
    at org.apache.hadoop.hbase.master.HMaster.getClusterMetricsWithoutCoprocessor(HMaster.java:2616)
    at org.apache.hadoop.hbase.master.HMaster.getClusterMetricsWithoutCoprocessor(HMaster.java:2537)
    at org.apache.hadoop.hbase.master.balancer.ClusterStatusChore.chore(ClusterStatusChore.java:47)
    at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:158)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:107)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
{code}
h1. Reproduce

Because the error depends on thread interleaving, the following command sequence may need to be run multiple times to reproduce it.

Cluster setup: 1 HM, 2 RS, HDFS-2.10.2
{code:java}
create 'uuid49bb410e0a0c40ffb070d17787b4cad7', {NAME => 'uuid66e57e5195e04956a78f789b2a25ec01', VERSIONS => 1, COMPRESSION => 'GZ', BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 'uuid119181eed72a43ccb66fabe37f84d2c0', VERSIONS => 1, COMPRESSION => 'GZ', BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}, {NAME => 'uuidc2d4931eaf4c429db0e55514fb12e767', VERSIONS => 3, COMPRESSION => 'NONE', BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 'uuidc9802bbfbe434411ae68bb8388d499b6', VERSIONS => 3, COMPRESSION => 'NONE', BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 'uuidc85e117d0ca144719fc53d30b189a343', VERSIONS => 3, COMPRESSION => 'NONE', BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}
create 'uuid094dd5bf47eb47d69148b63e73ce0e7c', {NAME => 'uuid76ccbd96fbdc418b95ed9971ff423b2d', VERSIONS => 1, COMPRESSION => 'GZ', BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}, {NAME => 'uuid36835d3faff04838bd02d6226557d7c8', VERSIONS => 1, COMPRESSION => 'GZ', BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 'uuid37752598d1bb405eb39a3e17c04d7e60', VERSIONS => 1, COMPRESSION => 'NONE', BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}
create 'uuidf68fb89ec7f4435597d69fb7b099d8e7', {NAME => 'uuidb235288b1d304fe1a62adb63968d9eee', VERSIONS => 1, COMPRESSION => 'NONE', BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}, {NAME => 'uuidf348f8849e724b3fa231fc2bb459be2d', VERSIONS => 1, COMPRESSION => 'NONE', BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}, {NAME => 'uuid81341a87083e49d7a0d8aff7b1ccf16a', VERSIONS => 3, COMPRESSION => 'GZ', BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 'uuid24db0d3c67c347d3a4c18af90facec2d', VERSIONS => 1, COMPRESSION => 'NONE', BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}, {NAME => 'uuid7ecf10315f444cfd9c5698695f9054d9', VERSIONS => 1, COMPRESSION => 'NONE', BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}
enable 'uuid094dd5bf47eb47d69148b63e73ce0e7c'
create_namespace 'uuidc1066f82d7834f698d335dd04fa7ad3e'
alter 'uuid094dd5bf47eb47d69148b63e73ce0e7c', {NAME => 'enaJvIGYBk', BLOOMFILTER => 'ROWCOL', IN_MEMORY => false}
disable 'uuidf68fb89ec7f4435597d69fb7b099d8e7'
{code}
I have attached the full logs.
h1. Root Cause

The ERROR message is logged because of a thread interleaving between (1) T1: the thread creating the table and (2) T2: the Chore thread calculating TABLE_TO_REGIONS_COUNT.

Here's how it happens in detail:
# The user issues a create table request, which puts the table name into tableDescriptors.
# The Chore thread calculates TABLE_TO_REGIONS_COUNT by iterating over all tables from {*}getTableDescriptors().getAll(){*}. This includes the table that is being created, whose table state has not been created yet.
# The Chore thread tries to fetch the state of that table, fails with TableNotFoundException, and logs an ERROR.

IMO, this is a normal and correct process that shouldn't produce an ERROR-level message. It could be avoided by properly handling the thread interleaving between table updates and chore threads.

I am trying to fix it. Any help would be appreciated!
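
For illustration, the interleaving described under Root Cause can be modeled with a small, self-contained Java sketch. This is not HBase code: the class {{TableStateRaceSketch}}, its two maps, and all method names are hypothetical stand-ins for the descriptor cache and the table-state store. The interleaving is replayed sequentially so the result is deterministic. The "tolerant" pass shows one possible way to handle a table that has a descriptor but no state yet; whether skipping such tables is the right fix for HBase is an assumption, not something this issue confirms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the race; none of these names exist in HBase.
class TableStateRaceSketch {
    enum State { ENABLING, ENABLED, DISABLED }

    // Stand-ins for the descriptor cache and the table-state store.
    private final Map<String, String> descriptors = new ConcurrentHashMap<>();
    private final Map<String, State> states = new ConcurrentHashMap<>();

    // T1, first step: the descriptor becomes visible to other threads.
    void publishDescriptor(String table) { descriptors.put(table, "descriptor"); }

    // T1, later step: the table state is written.
    void publishState(String table, State state) { states.put(table, state); }

    // Strict lookup: a missing state is treated as an error.
    State getStateStrict(String table) {
        State state = states.get(table);
        if (state == null) {
            throw new IllegalStateException("No state found for " + table);
        }
        return state;
    }

    // Tolerant lookup: a missing state is an expected, transient condition.
    Optional<State> getStateIfPresent(String table) {
        return Optional.ofNullable(states.get(table));
    }

    // T2 (chore), strict variant: collects one error per table without a state.
    List<String> chorePassStrict() {
        List<String> errors = new ArrayList<>();
        for (String table : descriptors.keySet()) {
            try {
                getStateStrict(table);
            } catch (IllegalStateException e) {
                errors.add(e.getMessage());
            }
        }
        return errors;
    }

    // T2 (chore), tolerant variant: returns the tables it skipped, no errors.
    List<String> chorePassTolerant() {
        List<String> skipped = new ArrayList<>();
        for (String table : descriptors.keySet()) {
            if (!getStateIfPresent(table).isPresent()) {
                skipped.add(table);
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        TableStateRaceSketch model = new TableStateRaceSketch();
        model.publishDescriptor("t1");                  // T1: descriptor visible
        System.out.println(model.chorePassStrict());    // T2 runs inside the window
        System.out.println(model.chorePassTolerant());  // same window, no error raised
        model.publishState("t1", State.ENABLING);       // T1: state written
        System.out.println(model.chorePassStrict());    // window closed
    }
}
```

In this model the strict pass reports an error only while the table has a descriptor but no state; once the state is written, both passes come back empty.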