[ https://issues.apache.org/jira/browse/PHOENIX-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Jasani resolved PHOENIX-7233.
-----------------------------------
    Resolution: Won't Fix

With HBASE-28428 resolved, we should no longer need PHOENIX-7233.

> CQSI openConnection should timeout to unblock other connection threads
> ----------------------------------------------------------------------
>
>                 Key: PHOENIX-7233
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7233
>             Project: Phoenix
>          Issue Type: Improvement
>   Affects Versions: 5.1.3
>            Reporter: Viraj Jasani
>            Priority: Major
>
> PhoenixDriver initializes and caches ConnectionQueryServices objects in
> connectionQueryServicesCache. As part of CQSI initialization, a connection
> to the HBase server is opened through the ConnectionFactory provided by the
> HBase client, which returns a Connection object. This Connection object
> lets clients share the Zookeeper connection, the meta cache, and the remote
> connections to regionservers and master daemons. It is used for Table CRUD
> operations as well as administrative actions on the cluster.
> Initializing the HBase Connection object requires the ClusterId, which is
> maintained in Zookeeper, in the Master daemons, or in both. The client
> retrieves it according to whether it is configured to use
> ZKConnectionRegistry or MasterRegistry/RpcConnectionRegistry.
> For ZKConnectionRegistry, we have run into an edge case in which the
> connection to the Zookeeper server was stuck for more than 12 hours. When
> the client tried to create a connection to the Zookeeper quorum to retrieve
> the ClusterId, the Zookeeper leader was switched from one server to another.
> Why the leader switch resulted in a stuck connection still requires an RCA,
> but it is not appropriate for the Phoenix/HBase client to wait indefinitely
> for a response from Zookeeper without any connection timeout.
> For the Phoenix client, if one thread is stuck opening the connection
> during CQSI#init, all other threads trying to create connections also get
> stuck, because we take a class-level lock before opening the connection.
> This can lead to termination or degradation of the client JVM.
> While the HBase client should also use a timeout, not having a timeout on
> the Phoenix client side has far worse complications. As part of this Jira,
> we should introduce a way for CQSI#openConnection to time out, either by
> using the CompletableFuture API or by using our preconfigured thread-pool.
>
> Stacktrace for reference:
>
> {code:java}
> jdk.internal.misc.Unsafe.park
> java.util.concurrent.locks.LockSupport.park
> java.util.concurrent.CompletableFuture$Signaller.block
> java.util.concurrent.ForkJoinPool.managedBlock
> java.util.concurrent.CompletableFuture.waitingGet
> java.util.concurrent.CompletableFuture.get
> org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId
> org.apache.hadoop.hbase.client.ConnectionImplementation.<init>
> jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance?
> jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance
> jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance
> java.lang.reflect.Constructor.newInstance
> org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$?
> org.apache.hadoop.hbase.client.ConnectionFactory$$Lambda$?.run
> java.security.AccessController.doPrivileged
> javax.security.auth.Subject.doAs
> org.apache.hadoop.security.UserGroupInformation.doAs
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection
> org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection
> org.apache.phoenix.query.ConnectionQueryServicesImpl.access$?
> org.apache.phoenix.query.ConnectionQueryServicesImpl$?.call
> org.apache.phoenix.query.ConnectionQueryServicesImpl$?.call
> org.apache.phoenix.util.PhoenixContextExecutor.call
> org.apache.phoenix.query.ConnectionQueryServicesImpl.init
> org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices
> org.apache.phoenix.jdbc.HighAvailabilityGroup.connectToOneCluster
> org.apache.phoenix.jdbc.ParallelPhoenixConnection.getConnection
> org.apache.phoenix.jdbc.ParallelPhoenixConnection.lambda$new$?
> org.apache.phoenix.jdbc.ParallelPhoenixConnection$$Lambda$?.get
> org.apache.phoenix.jdbc.ParallelPhoenixContext.lambda$chainOnConnClusterContext$?
> org.apache.phoenix.jdbc.ParallelPhoenixContext$$Lambda$?.apply
> {code}
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
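
For readers of the archive, the timeout idea proposed in the description (bounding the blocking open call with a future and a thread pool) might look roughly like the sketch below. It is only an illustration under stated assumptions, not the Phoenix or HBase implementation: `BoundedOpen`, `callWithTimeout`, and the stand-in openers are hypothetical names, and a real change would wrap the actual connection creation instead of the placeholder lambdas.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration; none of these names exist in Phoenix or HBase.
final class BoundedOpen {

    // Runs a blocking "open" call on the given pool and waits at most
    // timeoutMs for it, so the calling thread cannot be parked indefinitely.
    static <T> T callWithTimeout(Callable<T> opener, long timeoutMs, ExecutorService pool)
            throws Exception {
        Future<T> future = pool.submit(opener);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Best effort: interrupt the worker. Whether the blocked call
            // actually stops depends on it honoring interruption.
            future.cancel(true);
            throw e;
        } catch (ExecutionException e) {
            // Surface the opener's own failure rather than the wrapper.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        // Daemon threads so a stuck opener cannot keep the JVM alive.
        ExecutorService pool = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "open-connection");
            t.setDaemon(true);
            return t;
        });
        try {
            // Stand-in for an opener that returns promptly.
            System.out.println(callWithTimeout(() -> "connected", 1000L, pool));
            // Stand-in for an opener that hangs, like the stuck ClusterId lookup.
            callWithTimeout(() -> { Thread.sleep(60_000L); return "late"; }, 200L, pool);
        } catch (TimeoutException e) {
            System.out.println("open timed out; caller is unblocked");
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In the scenario described in the ticket, the waiting thread holds a class-level lock, so returning from the wait on timeout would let it release that lock for the other connection threads. The stuck worker itself may linger if the underlying call ignores interruption, which is one reason a timeout inside the HBase client is still preferable.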