[ 
https://issues.apache.org/jira/browse/IMPALA-12788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-12788:
------------------------------------
    Affects Version/s: Impala 4.3.0
                       Impala 4.1.2
                       Impala 4.1.1
                       Impala 4.2.0
                       Impala 4.1.0
                       Impala 3.4.1
                       Impala 3.4.0
                       Impala 4.0.0

> HBaseTable still get loaded even if HBase is down
> -------------------------------------------------
>
>                 Key: IMPALA-12788
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12788
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 4.0.0, Impala 3.4.0, Impala 3.4.1, Impala 4.1.0, 
> Impala 4.2.0, Impala 4.1.1, Impala 4.1.2, Impala 4.3.0
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>             Fix For: Impala 4.4.0
>
>
> This was identified by an internal S3 build that doesn't launch HBase. Some 
> tests still run queries on HBase tables, e.g. 
> TestDdlStatements::test_alter_set_column_stats, but they don't fail even 
> if the table can't be loaded correctly. Catalogd logs show that the 
> connection failure to HBase is ignored:
> {noformat}
> I0203 14:12:33.687620 20673 TableLoadingMgr.java:71] Loading metadata for 
> table: functional_hbase.alltypes
> I0203 14:12:33.687674 24282 TableLoader.java:76] Loading metadata for: 
> functional_hbase.alltypes (background load)
> I0203 14:12:33.687706 20673 TableLoadingMgr.java:73] Remaining items in 
> queue: 0. Loads in progress: 1
> I0203 14:12:33.690941 26564 JniCatalog.java:257] execDdl request: 
> DROP_DATABASE test_compute_stats_9c95c5d8 issued by jenkins
> I0203 14:12:33.691668 24282 Table.java:218] createEventId_ for table: 
> functional_hbase.alltypes set to: -1
> ......
> W0203 14:13:06.941573  1978 ReadOnlyZKClient.java:193] 0x65bc7c50 to 
> localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS, 
> retries = 30, give up
> W0203 14:13:06.947460 24282 ConnectionImplementation.java:641] Retrieve 
> cluster id failed
> Java exception follows:
> java.util.concurrent.ExecutionException: 
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for /hbase/hbaseid
>         at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>         at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>         at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:639)
>         at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:325)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
>         at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at 
> org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$0(ConnectionFactory.java:231)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>         at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:325)
>         at 
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:230)
>         at 
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:130)
>         at 
> org.apache.impala.catalog.FeHBaseTable$Util$ConnectionHolder.getConnection(FeHBaseTable.java:722)
>         at 
> org.apache.impala.catalog.FeHBaseTable$Util.getHBaseTable(FeHBaseTable.java:126)
>         at org.apache.impala.catalog.HBaseTable.load(HBaseTable.java:112)
>         at org.apache.impala.catalog.TableLoader.load(TableLoader.java:144)
>         at 
> org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:245)
>         at 
> org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:242)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
> KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
>         at 
> org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:195)
>         at 
> org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:340)
>         ... 1 more
> I0203 14:13:07.058998 24282 TableLoader.java:175] Loaded metadata for: 
> functional_hbase.alltypes (33371ms)
> I0203 14:13:07.866829 21368 catalog-server.cc:403] A catalog update with 9 
> entries is assembled. Catalog version: 6192 Last sent catalog version: 6181
> I0203 14:13:07.870369 21344 catalog-server.cc:816] Collected update: 
> 1:TABLE:functional_hbase.alltypes, version=6193, original size=3855, 
> compressed size=1471
> I0203 14:13:07.872047 21344 catalog-server.cc:816] Collected update: 
> 1:CATALOG_SERVICE_ID, version=6193, original size=60, compressed 
> size=58{noformat}
> This is problematic since impalad thinks the table is correctly loaded and 
> will try to load it again when applying the catalog update, which can block 
> the statestore subscriber thread for a long time. Other DDL queries are then 
> blocked as well since they can't acquire the catalog update lock.
> We've seen TestAsyncLoadData.test_async_load time out on S3 (IMPALA-11285), 
> and this is the cause.
> Here are logs showing impalad blocked while applying the catalog update of 
> the HBase table:
> {noformat}
> I0203 14:13:09.359010  3636 Frontend.java:1917] 
> db4f57572baab787:ebdb853600000000] Analyzing query: load data inpath 
> '/test-warehouse/test_load_staging_beeswax_True'           into table 
> test_async_load_898a2f19.test_load_nopart_beeswax_True db: functional
> ...
> I0203 14:13:42.188225  4881 ClientCnxn.java:1246] Socket error occurred: 
> localhost/0:0:0:0:0:0:0:1:2181: Connection refused
> W0203 14:13:42.288529  4880 ReadOnlyZKClient.java:189] 0x43325be0 to 
> localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS, 
> retries = 29
> I0203 14:13:43.288617  4881 ClientCnxn.java:1111] Opening socket connection 
> to server localhost/127.0.0.1:2181. Will not attempt to authenticate using 
> SASL (unknown error)
> I0203 14:13:43.288892  4881 ClientCnxn.java:1246] Socket error occurred: 
> localhost/127.0.0.1:2181: Connection refused
> W0203 14:13:43.389173  4880 ReadOnlyZKClient.java:189] 0x43325be0 to 
> localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS, 
> retries = 30
> I0203 14:13:44.389231  4881 ClientCnxn.java:1111] Opening socket connection 
> to server localhost/127.0.0.1:2181. Will not attempt to authenticate using 
> SASL (unknown error)
> I0203 14:13:44.389554  4881 ClientCnxn.java:1246] Socket error occurred: 
> localhost/127.0.0.1:2181: Connection refused
> W0203 14:13:44.489856  4880 ReadOnlyZKClient.java:193] 0x43325be0 to 
> localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS, 
> retries = 30, give up
> W0203 14:13:44.500921 22023 ConnectionImplementation.java:641] Retrieve 
> cluster id failed
> Java exception follows:
> java.util.concurrent.ExecutionException: 
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for /hbase/hbaseid
>         at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>         at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>         at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:639)
>         at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:325)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
>         at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at 
> org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$0(ConnectionFactory.java:231)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>         at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:325)
>         at 
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:230)
>         at 
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:130)
>         at 
> org.apache.impala.catalog.FeHBaseTable$Util$ConnectionHolder.getConnection(FeHBaseTable.java:722)
>         at 
> org.apache.impala.catalog.FeHBaseTable$Util.getHBaseTable(FeHBaseTable.java:126)
>         at 
> org.apache.impala.catalog.HBaseTable.loadFromThrift(HBaseTable.java:139)
>         at org.apache.impala.catalog.Table.fromThrift(Table.java:538)
>         at 
> org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:474)
>         at 
> org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:329)
>         at 
> org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:258)
>         at 
> org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114)
>         at 
> org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:513)
>         at 
> org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:185)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
> KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
>         at 
> org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:195)
>         at 
> org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:340)
>         at java.lang.Thread.run(Thread.java:748)
> I0203 14:13:44.585079 22023 impala-server.cc:2060] Catalog topic update 
> applied with version: 6193 new min catalog object version: 2
> ... // After this the table test_load_nopart_beeswax_true from LoadData 
> statement can be added
> I0203 14:13:44.586282  4723 ImpaladCatalog.java:228] 
> db4f57572baab787:ebdb853600000000] Adding: 
> TABLE:test_async_load_898a2f19.test_load_nopart_beeswax_true version: 6207 
> size: 3866 {noformat}
> The bug is that loading an HBase table should fail if catalogd can't 
> connect to HBase, instead of the load being silently treated as successful.
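> The failure mode above can be illustrated with a minimal, self-contained
> Java sketch (this is not Impala code; all class and method names here are
> hypothetical stand-ins for the pattern in HBaseTable.load): a loader that
> swallows the connection exception reports the table as loaded, while the
> intended fix propagates the failure so the table is marked as
> failed-to-load instead of being published in a catalog update.

```java
// Hypothetical reproduction of the bug pattern; names are illustrative,
// not taken from the Impala code base.
public class LoadFailureDemo {
    static class ConnectionLossException extends Exception {
        ConnectionLossException(String msg) { super(msg); }
    }

    static class TableLoadingException extends Exception {
        TableLoadingException(String msg, Throwable cause) { super(msg, cause); }
    }

    // Stand-in for the HBase connection attempt while HBase is down.
    static void connect() throws ConnectionLossException {
        throw new ConnectionLossException(
            "KeeperErrorCode = ConnectionLoss for /hbase/hbaseid");
    }

    // Buggy behavior: the connection failure is only logged, so the load
    // "succeeds" and the broken table would be shipped to impalads, which
    // then repeat the slow connection retries themselves.
    static boolean loadSwallowing() {
        try {
            connect();
        } catch (ConnectionLossException e) {
            System.err.println("Retrieve cluster id failed: " + e.getMessage());
        }
        return true; // table incorrectly reported as loaded
    }

    // Fixed behavior: propagate the failure so the caller marks the table
    // as failed-to-load rather than treating it as loaded.
    static boolean loadFailFast() throws TableLoadingException {
        try {
            connect();
        } catch (ConnectionLossException e) {
            throw new TableLoadingException("Failed to connect to HBase", e);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("swallowing load result: " + loadSwallowing());
        try {
            loadFailFast();
        } catch (TableLoadingException e) {
            System.out.println("fail-fast load threw: " + e.getMessage());
        }
    }
}
```

> With the fail-fast variant, the long ZooKeeper retry loop happens once in
> catalogd and surfaces as a load error, rather than being repeated inside
> impalad's statestore subscriber thread under the catalog update lock.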



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
