[
https://issues.apache.org/jira/browse/IMPALA-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Quanlong Huang updated IMPALA-12788:
------------------------------------
Affects Version/s: Impala 4.3.0
Impala 4.1.2
Impala 4.1.1
Impala 4.2.0
Impala 4.1.0
Impala 3.4.1
Impala 3.4.0
Impala 4.0.0
> HBaseTable still gets loaded even if HBase is down
> --------------------------------------------------
>
> Key: IMPALA-12788
> URL: https://issues.apache.org/jira/browse/IMPALA-12788
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Affects Versions: Impala 4.0.0, Impala 3.4.0, Impala 3.4.1, Impala 4.1.0,
> Impala 4.2.0, Impala 4.1.1, Impala 4.1.2, Impala 4.3.0
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
> Fix For: Impala 4.4.0
>
>
> This was identified by an internal S3 build that doesn't launch HBase. Some
> tests still run queries on HBase tables, e.g.
> TestDdlStatements::test_alter_set_column_stats, but they don't fail even if
> the table can't be loaded correctly. The catalogd logs show that the
> connection failure to HBase is ignored:
> {noformat}
> I0203 14:12:33.687620 20673 TableLoadingMgr.java:71] Loading metadata for
> table: functional_hbase.alltypes
> I0203 14:12:33.687674 24282 TableLoader.java:76] Loading metadata for:
> functional_hbase.alltypes (background load)
> I0203 14:12:33.687706 20673 TableLoadingMgr.java:73] Remaining items in
> queue: 0. Loads in progress: 1
> I0203 14:12:33.690941 26564 JniCatalog.java:257] execDdl request:
> DROP_DATABASE test_compute_stats_9c95c5d8 issued by jenkins
> I0203 14:12:33.691668 24282 Table.java:218] createEventId_ for table:
> functional_hbase.alltypes set to: -1
> ......
> W0203 14:13:06.941573 1978 ReadOnlyZKClient.java:193] 0x65bc7c50 to
> localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS,
> retries = 30, give up
> W0203 14:13:06.947460 24282 ConnectionImplementation.java:641] Retrieve
> cluster id failed
> Java exception follows:
> java.util.concurrent.ExecutionException:
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode
> = ConnectionLoss for /hbase/hbaseid
> at
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> at
> org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:639)
> at
> org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:325)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at
> org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$0(ConnectionFactory.java:231)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> at
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:325)
> at
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:230)
> at
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:130)
> at
> org.apache.impala.catalog.FeHBaseTable$Util$ConnectionHolder.getConnection(FeHBaseTable.java:722)
> at
> org.apache.impala.catalog.FeHBaseTable$Util.getHBaseTable(FeHBaseTable.java:126)
> at org.apache.impala.catalog.HBaseTable.load(HBaseTable.java:112)
> at org.apache.impala.catalog.TableLoader.load(TableLoader.java:144)
> at
> org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:245)
> at
> org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:242)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
> at
> org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:195)
> at
> org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:340)
> ... 1 more
> I0203 14:13:07.058998 24282 TableLoader.java:175] Loaded metadata for:
> functional_hbase.alltypes (33371ms)
> I0203 14:13:07.866829 21368 catalog-server.cc:403] A catalog update with 9
> entries is assembled. Catalog version: 6192 Last sent catalog version: 6181
> I0203 14:13:07.870369 21344 catalog-server.cc:816] Collected update:
> 1:TABLE:functional_hbase.alltypes, version=6193, original size=3855,
> compressed size=1471
> I0203 14:13:07.872047 21344 catalog-server.cc:816] Collected update:
> 1:CATALOG_SERVICE_ID, version=6193, original size=60, compressed
> size=58{noformat}
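> The log above shows the ZooKeeper connection loss being logged at WARN level
> while TableLoader still reports "Loaded metadata" for the table. A minimal
> sketch of this swallowed-failure pattern (a hypothetical simplification for
> illustration, not the actual Impala code) looks like:

```java
// Hypothetical simplification of the pattern seen in the logs: the HBase
// connection attempt fails, but the load path still reports success.
public class SwallowedLoadFailure {
  // Stand-in for the HBase connection attempt; fails like in the logs above.
  static Object getHBaseTable() throws Exception {
    throw new Exception("ConnectionLoss for /hbase/hbaseid");
  }

  // Buggy behavior: the exception is caught, only logged, and loading is
  // still reported as successful.
  static boolean load() {
    try {
      getHBaseTable();
      return true;
    } catch (Exception e) {
      System.err.println("Retrieve cluster id failed: " + e.getMessage());
      return true;  // bug: load still reported as successful
    }
  }

  public static void main(String[] args) {
    System.out.println(load() ? "Loaded metadata" : "Load failed");
  }
}
```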
> This is problematic: impalad thinks the table was loaded correctly and tries
> to load it again when applying the catalog update, which can block the
> statestore subscriber thread for a long time. Other DDL queries are then
> blocked as well since they can't acquire the catalog update lock.
> We've seen TestAsyncLoadData.test_async_load time out on S3 (IMPALA-11285),
> and this is the cause.
> Here are logs showing that impalad is blocked while applying the catalog
> update of the HBase table:
> {noformat}
> I0203 14:13:09.359010 3636 Frontend.java:1917]
> db4f57572baab787:ebdb853600000000] Analyzing query: load data inpath
> '/test-warehouse/test_load_staging_beeswax_True' into table
> test_async_load_898a2f19.test_load_nopart_beeswax_True db: functional
> ...
> I0203 14:13:42.188225 4881 ClientCnxn.java:1246] Socket error occurred:
> localhost/0:0:0:0:0:0:0:1:2181: Connection refused
> W0203 14:13:42.288529 4880 ReadOnlyZKClient.java:189] 0x43325be0 to
> localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS,
> retries = 29
> I0203 14:13:43.288617 4881 ClientCnxn.java:1111] Opening socket connection
> to server localhost/127.0.0.1:2181. Will not attempt to authenticate using
> SASL (unknown error)
> I0203 14:13:43.288892 4881 ClientCnxn.java:1246] Socket error occurred:
> localhost/127.0.0.1:2181: Connection refused
> W0203 14:13:43.389173 4880 ReadOnlyZKClient.java:189] 0x43325be0 to
> localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS,
> retries = 30
> I0203 14:13:44.389231 4881 ClientCnxn.java:1111] Opening socket connection
> to server localhost/127.0.0.1:2181. Will not attempt to authenticate using
> SASL (unknown error)
> I0203 14:13:44.389554 4881 ClientCnxn.java:1246] Socket error occurred:
> localhost/127.0.0.1:2181: Connection refused
> W0203 14:13:44.489856 4880 ReadOnlyZKClient.java:193] 0x43325be0 to
> localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS,
> retries = 30, give up
> W0203 14:13:44.500921 22023 ConnectionImplementation.java:641] Retrieve
> cluster id failed
> Java exception follows:
> java.util.concurrent.ExecutionException:
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode
> = ConnectionLoss for /hbase/hbaseid
> at
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> at
> org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:639)
> at
> org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:325)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at
> org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$0(ConnectionFactory.java:231)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> at
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:325)
> at
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:230)
> at
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:130)
> at
> org.apache.impala.catalog.FeHBaseTable$Util$ConnectionHolder.getConnection(FeHBaseTable.java:722)
> at
> org.apache.impala.catalog.FeHBaseTable$Util.getHBaseTable(FeHBaseTable.java:126)
> at
> org.apache.impala.catalog.HBaseTable.loadFromThrift(HBaseTable.java:139)
> at org.apache.impala.catalog.Table.fromThrift(Table.java:538)
> at
> org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:474)
> at
> org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:329)
> at
> org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:258)
> at
> org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114)
> at
> org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:513)
> at
> org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:185)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
> at
> org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:195)
> at
> org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:340)
> at java.lang.Thread.run(Thread.java:748)
> I0203 14:13:44.585079 22023 impala-server.cc:2060] Catalog topic update
> applied with version: 6193 new min catalog object version: 2
> ... // After this, the table test_load_nopart_beeswax_true from the LOAD
> DATA statement can be added
> I0203 14:13:44.586282 4723 ImpaladCatalog.java:228]
> db4f57572baab787:ebdb853600000000] Adding:
> TABLE:test_async_load_898a2f19.test_load_nopart_beeswax_true version: 6207
> size: 3866 {noformat}
> The bug is that loading an HBase table should fail if catalogd can't connect
> to HBase, instead of being silently reported as successful.
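> The fix could propagate the connection failure out of the load path so that
> catalogd marks the table as not loaded. A hedged sketch (the
> TableLoadingException class and method names here are illustrative
> stand-ins, not the actual Impala fix):

```java
// Hypothetical sketch of the proposed behavior: a connection failure makes
// the table load fail loudly instead of being logged and ignored.
public class FailFastLoad {
  // Illustrative stand-in for Impala's table-loading exception type.
  static class TableLoadingException extends Exception {
    TableLoadingException(String msg, Throwable cause) { super(msg, cause); }
  }

  // Stand-in for ConnectionFactory.createConnection(); fails when HBase is down.
  static Object getHBaseTable() throws Exception {
    throw new Exception("ConnectionLoss for /hbase/hbaseid");
  }

  // Fixed behavior: surface the failure to the caller.
  static void load(String tableName) throws TableLoadingException {
    try {
      getHBaseTable();
    } catch (Exception e) {
      throw new TableLoadingException(
          "Failed to load HBase table " + tableName, e);
    }
  }

  public static void main(String[] args) {
    try {
      load("functional_hbase.alltypes");
      System.out.println("loaded");
    } catch (TableLoadingException e) {
      System.out.println("load failed: " + e.getMessage());
    }
  }
}
```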
--
This message was sent by Atlassian Jira
(v8.20.10#820010)