[
https://issues.apache.org/jira/browse/HIVE-26669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17625051#comment-17625051
]
Sandeep Gade commented on HIVE-26669:
-------------------------------------
Hi Chris,
Thanks for you response. I have uploaded the thread dump, Please take a look.
At the time of issue, we saw only 14 ESTABLISHED connections to backend
database from HikariCP as shown below. Ideally we see 400 ESTABLISHED
connections as per our config setup during normal working of HMS/DB.
[root@pwdmmipphmetastore_server1.com ~]# netstat -plant |grep <269633-metastore
ps id> | grep <1111- db port>
tcp 0 0 pwdmmipphmetastore_server1.com:11007
database-server1.com:1111 ESTABLISHED 269633/java
tcp 0 0 pwdmmipphmetastore_server1.com:12003
database-server1.com:1111 ESTABLISHED 269633/java
tcp 0 0 pwdmmipphmetastore_server1.com:14930
database-server1.com:1111 ESTABLISHED 269633/java
tcp 0 0 pwdmmipphmetastore_server1.com:58094
database-server1.com:1111 ESTABLISHED 269633/java
tcp 0 0 pwdmmipphmetastore_server1.com:11303
database-server1.com:1111 ESTABLISHED 269633/java
tcp 0 0 pwdmmipphmetastore_server1.com:13200
database-server2:1111 ESTABLISHED 269633/java
tcp 0 0 pwdmmipphmetastore_server1.com:58088
database-server1.com:1111 ESTABLISHED 269633/java
tcp 0 0 pwdmmipphmetastore_server1.com:11607
database-server1.com:1111 ESTABLISHED 269633/java
tcp 0 0 pwdmmipphmetastore_server1.com:58084
database-server1.com:1111 ESTABLISHED 269633/java
tcp 0 0 pwdmmipphmetastore_server1.com:58096
database-server1.com:1111 ESTABLISHED 269633/java
tcp 0 0 pwdmmipphmetastore_server1.com:11515
database-server1.com:1111 ESTABLISHED 269633/java
tcp 0 0 pwdmmipphmetastore_server1.com:11311
database-server1.com:1111 ESTABLISHED 269633/java
tcp 0 0 pwdmmipphmetastore_server1.com:11507
database-server1.com:1111 ESTABLISHED 269633/java
tcp 0 0 pwdmmipphmetastore_server1.com:11565
database-server1.com:1111 ESTABLISHED 269633/java
tcp 26 0 pwdmmipphmetastore_server1.com:9083
hiv008-server:61521 CLOSE_WAIT 269633/java
>From the Thread dump we also see there are 4 Hikari connection adder threads
>running, I have added full stack for one of those 4 here. Looks like it is
>waiting on 'java.net.SocketInputStream.socketRead0(Native Method)'.
"HikariPool-4 connection adder" #3567048 daemon prio=5 os_prio=0
tid=0x00007f992c309000 nid=0x27c93 runnable [0x00007f99a5d07000]
"HikariPool-3 connection adder" #3566942 daemon prio=5 os_prio=0
tid=0x0000558b9f60f000 nid=0x278c5 runnable [0x00007f99b1bc5000]
"HikariPool-2 connection adder" #3566835 daemon prio=5 os_prio=0
tid=0x00007f9978b81800 nid=0x276e2 runnable [0x00007f999c16a000]
"HikariPool-1 connection adder" #3566622 daemon prio=5 os_prio=0
tid=0x00007f99d26db000 nid=0x2702d runnable [0x00007f999ee97000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at oracle.net.ns.Packet.receive(Packet.java:311)
at oracle.net.ns.DataPacket.receive(DataPacket.java:105)
at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:305)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:249)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:171)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:89)
at
oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:123)
at
oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:79)
at
oracle.jdbc.driver.T4CMAREngineStream.unmarshalUB1(T4CMAREngineStream.java:429)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:397)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:257)
at
oracle.jdbc.driver.T4CTTIoauthenticate.doOAUTH(T4CTTIoauthenticate.java:437)
at
oracle.jdbc.driver.T4CTTIoauthenticate.doOAUTH(T4CTTIoauthenticate.java:954)
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:639)
at
oracle.jdbc.driver.PhysicalConnection.connect(PhysicalConnection.java:666)
at
oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:566)
at
com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:95)
at
com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:101)
at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:356)
at com.zaxxer.hikari.pool.PoolBase.newPoolEntry(PoolBase.java:199)
at
com.zaxxer.hikari.pool.HikariPool.createPoolEntry(HikariPool.java:444)
at com.zaxxer.hikari.pool.HikariPool.access$200(HikariPool.java:71)
at
com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:631)
at
com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:617)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Locked ownable synchronizers:
- <0x00007f9ba8051a68> (a
java.util.concurrent.ThreadPoolExecutor$Worker)
Looking at db logs, we do not see any issues there. So is it possibly due to
network or something else.
Thank you!
> Hive Metastore become unresponsive
> ----------------------------------
>
> Key: HIVE-26669
> URL: https://issues.apache.org/jira/browse/HIVE-26669
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 3.1.0
> Reporter: Sandeep Gade
> Priority: Critical
> Attachments: metastore-server1
>
>
> We are experiencing issues with Hive Metastore where it goes unresponsive.
> Initial investigation shows thousands of thread in WAITING (parking) state as
> shown below:
> 1 java.lang.Thread.State: BLOCKED (on object monitor)
> 772 java.lang.Thread.State: RUNNABLE
> 2 java.lang.Thread.State: TIMED_WAITING (on object monitor)
> 13 java.lang.Thread.State: TIMED_WAITING (parking)
> 5 java.lang.Thread.State: TIMED_WAITING (sleeping)
> 3 java.lang.Thread.State: WAITING (on object monitor)
> 14308 java.lang.Thread.State: WAITING (parking)
> ==============
> Almost all of the threads are stuck at 'parking to wait for
> <0x00007f9ad0795c48> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)'
>
> 15 - parking to wait for <0x00007f9ad06c9c10> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 14288 - parking to wait for <0x00007f9ad0795c48> (a
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> 1 - parking to wait for <0x00007f9ad0a161f8> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 1 - parking to wait for <0x00007f9ad0a39248> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 1 - parking to wait for <0x00007f9ad0adb0a0> (a
> java.util.concurrent.SynchronousQueue$TransferQueue)
> 5 - parking to wait for <0x00007f9ad0b12278> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 1 - parking to wait for <0x00007f9ad0b12518> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 1 - parking to wait for <0x00007f9ad0b44878> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 1 - parking to wait for <0x00007f9ad0cbe8f0> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 1 - parking to wait for <0x00007f9ad1318d60> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 1 - parking to wait for <0x00007f9ad1478c10> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 5 - parking to wait for <0x00007f9ad1494ff8> (a
> java.util.concurrent.SynchronousQueue$TransferQueue)
> ======================
> complete stack:
> "pool-8-thread-62238" #3582305 prio=5 os_prio=0 tid=0x00007f977bfc9800
> nid=0x62011 waiting on condition [0x00007f959d917000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00007f9ad0795c48> (a
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
> at
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
> at
> org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:351)
> at
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:77)
> at
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:137)
> at
> org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:59)
> at
> org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:750)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:718)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:712)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database_core(HiveMetaStore.java:1488)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:1470)
> at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
> at
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
> at com.sun.proxy.$Proxy30.get_database(Unknown Source)
> at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:15014)
> at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:14998)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:636)
> at
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:631)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:631)
> at
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750)
> Locked ownable synchronizers:
> - <0x00007fae9f0d8c20> (a
> java.util.concurrent.ThreadPoolExecutor$Worker)
> ======================
> Looking at linux process, Hive exhausts its 'max processes count' while the
> issue is happening
> set to:
> Max processes 16000 16000 processes
> As a workaround, we restart Metastores and it works fine for few days.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)