[ 
https://issues.apache.org/jira/browse/IMPALA-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-12788:
------------------------------------
    Description: 
This was identified by an internal S3 build that doesn't launch HBase. Some 
tests still run queries on HBase tables, e.g. 
TestDdlStatements::test_alter_set_column_stats, but they don't fail even if 
the table can't be loaded correctly. Catalogd logs show that the connection 
failure to HBase is ignored:
{noformat}
I0203 14:12:33.687620 20673 TableLoadingMgr.java:71] Loading metadata for 
table: functional_hbase.alltypes
I0203 14:12:33.687674 24282 TableLoader.java:76] Loading metadata for: 
functional_hbase.alltypes (background load)
I0203 14:12:33.687706 20673 TableLoadingMgr.java:73] Remaining items in queue: 
0. Loads in progress: 1
I0203 14:12:33.690941 26564 JniCatalog.java:257] execDdl request: DROP_DATABASE 
test_compute_stats_9c95c5d8 issued by jenkins
I0203 14:12:33.691668 24282 Table.java:218] createEventId_ for table: 
functional_hbase.alltypes set to: -1
......
W0203 14:13:06.941573  1978 ReadOnlyZKClient.java:193] 0x65bc7c50 to 
localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS, retries 
= 30, give up
W0203 14:13:06.947460 24282 ConnectionImplementation.java:641] Retrieve cluster 
id failed
Java exception follows:
java.util.concurrent.ExecutionException: 
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /hbase/hbaseid
        at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
        at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
        at 
org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:639)
        at 
org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:325)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at 
org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$0(ConnectionFactory.java:231)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
        at 
org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:325)
        at 
org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:230)
        at 
org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:130)
        at 
org.apache.impala.catalog.FeHBaseTable$Util$ConnectionHolder.getConnection(FeHBaseTable.java:722)
        at 
org.apache.impala.catalog.FeHBaseTable$Util.getHBaseTable(FeHBaseTable.java:126)
        at org.apache.impala.catalog.HBaseTable.load(HBaseTable.java:112)
        at org.apache.impala.catalog.TableLoader.load(TableLoader.java:144)
        at 
org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:245)
        at 
org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:242)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
        at 
org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:195)
        at 
org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:340)
        ... 1 more
I0203 14:13:07.058998 24282 TableLoader.java:175] Loaded metadata for: 
functional_hbase.alltypes (33371ms)
I0203 14:13:07.866829 21368 catalog-server.cc:403] A catalog update with 9 
entries is assembled. Catalog version: 6192 Last sent catalog version: 6181
I0203 14:13:07.870369 21344 catalog-server.cc:816] Collected update: 
1:TABLE:functional_hbase.alltypes, version=6193, original size=3855, compressed 
size=1471
I0203 14:13:07.872047 21344 catalog-server.cc:816] Collected update: 
1:CATALOG_SERVICE_ID, version=6193, original size=60, compressed 
size=58{noformat}
This is problematic since impalad thinks the table was loaded correctly and 
will try to load it again when applying the catalog update. That can block 
the statestore subscriber thread for a long time, which in turn blocks other 
DDL queries since they can't acquire the catalog update lock.

We've seen TestAsyncLoadData.test_async_load time out on S3 (IMPALA-11285), and 
this bug is the cause.
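The blocking described above can be sketched with a minimal, self-contained example (all names are hypothetical stand-ins, not Impala internals): a subscriber-like thread holds a shared lock while a slow, retrying load runs, so a DDL-like thread needing the same lock stalls for the whole retry window.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class LockStallSketch {
    public static void main(String[] args) throws InterruptedException {
        ReentrantLock catalogUpdateLock = new ReentrantLock();
        CountDownLatch lockHeld = new CountDownLatch(1);
        Thread subscriber = new Thread(() -> {
            catalogUpdateLock.lock();   // applying a catalog topic update
            lockHeld.countDown();
            try {
                Thread.sleep(300);      // stands in for ~33s of ZooKeeper retries
            } catch (InterruptedException ignored) {
            } finally {
                catalogUpdateLock.unlock();
            }
        });
        subscriber.start();
        lockHeld.await();               // subscriber now owns the lock
        long start = System.nanoTime();
        catalogUpdateLock.lock();       // DDL path: stalls until the slow load ends
        catalogUpdateLock.unlock();
        long waitedMs = (System.nanoTime() - start) / 1_000_000;
        subscriber.join();
        System.out.println(waitedMs >= 200); // the DDL thread waited out the retry window
    }
}
```

In the real system the 300 ms sleep corresponds to the ~33 s of ZooKeeper connection retries visible in the logs below.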

Here are logs showing impalad blocked while applying the catalog update for 
the HBase table:
{noformat}
I0203 14:13:09.359010  3636 Frontend.java:1917] 
db4f57572baab787:ebdb853600000000] Analyzing query: load data inpath 
'/test-warehouse/test_load_staging_beeswax_True'           into table 
test_async_load_898a2f19.test_load_nopart_beeswax_True db: functional
...
I0203 14:13:42.188225  4881 ClientCnxn.java:1246] Socket error occurred: 
localhost/0:0:0:0:0:0:0:1:2181: Connection refused
W0203 14:13:42.288529  4880 ReadOnlyZKClient.java:189] 0x43325be0 to 
localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS, retries 
= 29
I0203 14:13:43.288617  4881 ClientCnxn.java:1111] Opening socket connection to 
server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL 
(unknown error)
I0203 14:13:43.288892  4881 ClientCnxn.java:1246] Socket error occurred: 
localhost/127.0.0.1:2181: Connection refused
W0203 14:13:43.389173  4880 ReadOnlyZKClient.java:189] 0x43325be0 to 
localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS, retries 
= 30
I0203 14:13:44.389231  4881 ClientCnxn.java:1111] Opening socket connection to 
server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL 
(unknown error)
I0203 14:13:44.389554  4881 ClientCnxn.java:1246] Socket error occurred: 
localhost/127.0.0.1:2181: Connection refused
W0203 14:13:44.489856  4880 ReadOnlyZKClient.java:193] 0x43325be0 to 
localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS, retries 
= 30, give up
W0203 14:13:44.500921 22023 ConnectionImplementation.java:641] Retrieve cluster 
id failed
Java exception follows:
java.util.concurrent.ExecutionException: 
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /hbase/hbaseid
        at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
        at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
        at 
org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:639)
        at 
org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:325)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at 
org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$0(ConnectionFactory.java:231)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
        at 
org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:325)
        at 
org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:230)
        at 
org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:130)
        at 
org.apache.impala.catalog.FeHBaseTable$Util$ConnectionHolder.getConnection(FeHBaseTable.java:722)
        at 
org.apache.impala.catalog.FeHBaseTable$Util.getHBaseTable(FeHBaseTable.java:126)
        at 
org.apache.impala.catalog.HBaseTable.loadFromThrift(HBaseTable.java:139)
        at org.apache.impala.catalog.Table.fromThrift(Table.java:538)
        at 
org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:474)
        at 
org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:329)
        at 
org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:258)
        at 
org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114)
        at 
org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:513)
        at 
org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:185)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
        at 
org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:195)
        at 
org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:340)
        at java.lang.Thread.run(Thread.java:748)
I0203 14:13:44.585079 22023 impala-server.cc:2060] Catalog topic update applied 
with version: 6193 new min catalog object version: 2

... // After this, the table test_load_nopart_beeswax_true from the LOAD DATA 
statement can be added
I0203 14:13:44.586282  4723 ImpaladCatalog.java:228] 
db4f57572baab787:ebdb853600000000] Adding: 
TABLE:test_async_load_898a2f19.test_load_nopart_beeswax_true version: 6207 
size: 3866 {noformat}
The bug is that loading an HBase table should fail if catalogd fails to 
connect to HBase, but it doesn't.
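The intended behavior can be sketched with a minimal, self-contained example (the names are hypothetical stand-ins, not the actual Impala code): the connection warm-up failure propagates to the caller and the load is marked failed, while the finally block still records the load time, mirroring the storageMetadataLoadTime_ bookkeeping in HBaseTable.load().

```java
import java.io.IOException;

public class HBaseLoadSketch {
    static boolean timerStopped = false;

    // Hypothetical stand-in for warming up the HBase connection
    // (Util.getHBaseTable(...).close()) when HBase/ZooKeeper is unreachable.
    static void warmUpConnection() throws IOException {
        throw new IOException("ConnectionLoss for /hbase/hbaseid");
    }

    static void load() throws IOException {
        try {
            warmUpConnection();   // must fail fast if HBase is down
        } finally {
            timerStopped = true;  // timing is recorded either way
        }
    }

    public static void main(String[] args) {
        boolean failedAsExpected = false;
        try {
            load();
        } catch (IOException e) {
            failedAsExpected = true; // the failure reaches the caller,
                                     // so the table is not marked loaded
        }
        System.out.println(failedAsExpected && timerStopped);
    }
}
```

Note that try/finally on its own does not swallow the exception; for the failure to be ignored, some enclosing handler must catch it.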

  was:
This was identified by an internal S3 build that doesn't launch HBase. Some 
tests still run queries on HBase tables, e.g. 
TestDdlStatements::test_alter_set_column_stats, but they don't fail even if 
the table can't be loaded correctly. Catalogd logs show that the connection 
failure to HBase is ignored:
{noformat}
I0203 14:12:33.687620 20673 TableLoadingMgr.java:71] Loading metadata for 
table: functional_hbase.alltypes
I0203 14:12:33.687674 24282 TableLoader.java:76] Loading metadata for: 
functional_hbase.alltypes (background load)
I0203 14:12:33.687706 20673 TableLoadingMgr.java:73] Remaining items in queue: 
0. Loads in progress: 1
I0203 14:12:33.690941 26564 JniCatalog.java:257] execDdl request: DROP_DATABASE 
test_compute_stats_9c95c5d8 issued by jenkins
I0203 14:12:33.691668 24282 Table.java:218] createEventId_ for table: 
functional_hbase.alltypes set to: -1
......
W0203 14:13:06.941573  1978 ReadOnlyZKClient.java:193] 0x65bc7c50 to 
localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS, retries 
= 30, give up
W0203 14:13:06.947460 24282 ConnectionImplementation.java:641] Retrieve cluster 
id failed
Java exception follows:
java.util.concurrent.ExecutionException: 
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /hbase/hbaseid
        at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
        at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
        at 
org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:639)
        at 
org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:325)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at 
org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$0(ConnectionFactory.java:231)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
        at 
org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:325)
        at 
org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:230)
        at 
org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:130)
        at 
org.apache.impala.catalog.FeHBaseTable$Util$ConnectionHolder.getConnection(FeHBaseTable.java:722)
        at 
org.apache.impala.catalog.FeHBaseTable$Util.getHBaseTable(FeHBaseTable.java:126)
        at org.apache.impala.catalog.HBaseTable.load(HBaseTable.java:112)
        at org.apache.impala.catalog.TableLoader.load(TableLoader.java:144)
        at 
org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:245)
        at 
org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:242)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
        at 
org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:195)
        at 
org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:340)
        ... 1 more
I0203 14:13:07.058998 24282 TableLoader.java:175] Loaded metadata for: 
functional_hbase.alltypes (33371ms)
I0203 14:13:07.866829 21368 catalog-server.cc:403] A catalog update with 9 
entries is assembled. Catalog version: 6192 Last sent catalog version: 6181
I0203 14:13:07.870369 21344 catalog-server.cc:816] Collected update: 
1:TABLE:functional_hbase.alltypes, version=6193, original size=3855, compressed 
size=1471
I0203 14:13:07.872047 21344 catalog-server.cc:816] Collected update: 
1:CATALOG_SERVICE_ID, version=6193, original size=60, compressed 
size=58{noformat}
This is problematic since impalad thinks the table was loaded correctly and 
will try to load it again when applying the catalog update. That can block 
the statestore subscriber thread for a long time, which in turn blocks other 
DDL queries since they can't acquire the catalog update lock.

We've seen TestAsyncLoadData.test_async_load time out on S3 (IMPALA-11285), and 
this bug is the cause.

Here are logs showing impalad blocked while applying the catalog update for 
the HBase table:
{noformat}
I0203 14:13:09.359010  3636 Frontend.java:1917] 
db4f57572baab787:ebdb853600000000] Analyzing query: load data inpath 
'/test-warehouse/test_load_staging_beeswax_True'           into table 
test_async_load_898a2f19.test_load_nopart_beeswax_True db: functional
...
I0203 14:13:42.188225  4881 ClientCnxn.java:1246] Socket error occurred: 
localhost/0:0:0:0:0:0:0:1:2181: Connection refused
W0203 14:13:42.288529  4880 ReadOnlyZKClient.java:189] 0x43325be0 to 
localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS, retries 
= 29
I0203 14:13:43.288617  4881 ClientCnxn.java:1111] Opening socket connection to 
server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL 
(unknown error)
I0203 14:13:43.288892  4881 ClientCnxn.java:1246] Socket error occurred: 
localhost/127.0.0.1:2181: Connection refused
W0203 14:13:43.389173  4880 ReadOnlyZKClient.java:189] 0x43325be0 to 
localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS, retries 
= 30
I0203 14:13:44.389231  4881 ClientCnxn.java:1111] Opening socket connection to 
server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL 
(unknown error)
I0203 14:13:44.389554  4881 ClientCnxn.java:1246] Socket error occurred: 
localhost/127.0.0.1:2181: Connection refused
W0203 14:13:44.489856  4880 ReadOnlyZKClient.java:193] 0x43325be0 to 
localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS, retries 
= 30, give up
W0203 14:13:44.500921 22023 ConnectionImplementation.java:641] Retrieve cluster 
id failed
Java exception follows:
java.util.concurrent.ExecutionException: 
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /hbase/hbaseid
        at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
        at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
        at 
org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:639)
        at 
org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:325)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at 
org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$0(ConnectionFactory.java:231)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
        at 
org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:325)
        at 
org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:230)
        at 
org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:130)
        at 
org.apache.impala.catalog.FeHBaseTable$Util$ConnectionHolder.getConnection(FeHBaseTable.java:722)
        at 
org.apache.impala.catalog.FeHBaseTable$Util.getHBaseTable(FeHBaseTable.java:126)
        at 
org.apache.impala.catalog.HBaseTable.loadFromThrift(HBaseTable.java:139)
        at org.apache.impala.catalog.Table.fromThrift(Table.java:538)
        at 
org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:474)
        at 
org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:329)
        at 
org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:258)
        at 
org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114)
        at 
org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:513)
        at 
org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:185)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
        at 
org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:195)
        at 
org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:340)
        at java.lang.Thread.run(Thread.java:748)
I0203 14:13:44.585079 22023 impala-server.cc:2060] Catalog topic update applied 
with version: 6193 new min catalog object version: 2

... // After this the table test_load_nopart_beeswax_true from LoadData 
statement can be added
I0203 14:13:44.586282  4723 ImpaladCatalog.java:228] 
db4f57572baab787:ebdb853600000000] Adding: 
TABLE:test_async_load_898a2f19.test_load_nopart_beeswax_true version: 6207 
size: 3866 {noformat}
The bug is that loading an HBase table should fail if catalogd fails to 
connect to HBase, but it doesn't. It did fail before this change (IMPALA-7322):

[https://gerrit.cloudera.org/c/13786/8/fe/src/main/java/org/apache/impala/catalog/HBaseTable.java]

After IMPALA-7322, the exception thrown by Util.getHBaseTable(hbaseTableName_) 
is ignored:
{code:java}
      try {
        hbaseTableName_ = Util.getHBaseTableName(getMetaStoreTable());
        // Warm up the connection and verify the table exists.
        Util.getHBaseTable(hbaseTableName_).close();
        columnFamilies_ = null;
        cols = Util.loadColumns(msTable_);
      } finally {
        storageMetadataLoadTime_ = storageLoadTimer.stop();
      } {code}


> HBaseTable still get loaded even if HBase is down
> -------------------------------------------------
>
>                 Key: IMPALA-12788
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12788
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 4.0.0, Impala 3.4.0, Impala 3.4.1, Impala 4.1.0, 
> Impala 4.2.0, Impala 4.1.1, Impala 4.1.2, Impala 4.3.0
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
