[ 
https://issues.apache.org/jira/browse/HIVE-26669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17625051#comment-17625051
 ] 

Sandeep Gade commented on HIVE-26669:
-------------------------------------

Hi Chris, 
Thanks for you response. I have uploaded the thread dump, Please take a look. 

At the time of issue, we saw only 14 ESTABLISHED connections to backend 
database from HikariCP as shown below. Ideally we see 400 ESTABLISHED 
connections as per our config setup during normal working of HMS/DB.

[root@pwdmmipphmetastore_server1.com ~]# netstat -plant |grep <269633-metastore 
ps id> | grep <1111- db port>
tcp        0      0 pwdmmipphmetastore_server1.com:11007      
database-server1.com:1111     ESTABLISHED 269633/java
tcp        0      0 pwdmmipphmetastore_server1.com:12003      
database-server1.com:1111     ESTABLISHED 269633/java
tcp        0      0 pwdmmipphmetastore_server1.com:14930      
database-server1.com:1111     ESTABLISHED 269633/java
tcp        0      0 pwdmmipphmetastore_server1.com:58094      
database-server1.com:1111     ESTABLISHED 269633/java
tcp        0      0 pwdmmipphmetastore_server1.com:11303      
database-server1.com:1111     ESTABLISHED 269633/java
tcp        0      0 pwdmmipphmetastore_server1.com:13200      
database-server2:1111     ESTABLISHED 269633/java
tcp        0      0 pwdmmipphmetastore_server1.com:58088      
database-server1.com:1111     ESTABLISHED 269633/java
tcp        0      0 pwdmmipphmetastore_server1.com:11607      
database-server1.com:1111     ESTABLISHED 269633/java
tcp        0      0 pwdmmipphmetastore_server1.com:58084      
database-server1.com:1111     ESTABLISHED 269633/java
tcp        0      0 pwdmmipphmetastore_server1.com:58096      
database-server1.com:1111     ESTABLISHED 269633/java
tcp        0      0 pwdmmipphmetastore_server1.com:11515      
database-server1.com:1111     ESTABLISHED 269633/java
tcp        0      0 pwdmmipphmetastore_server1.com:11311      
database-server1.com:1111     ESTABLISHED 269633/java
tcp        0      0 pwdmmipphmetastore_server1.com:11507      
database-server1.com:1111     ESTABLISHED 269633/java
tcp        0      0 pwdmmipphmetastore_server1.com:11565      
database-server1.com:1111     ESTABLISHED 269633/java
tcp       26      0 pwdmmipphmetastore_server1.com:9083      
hiv008-server:61521     CLOSE_WAIT  269633/java


>From the Thread dump we also see there are 4 Hikari connection adder threads 
>running, I have added full stack for one of those 4 here. Looks like it is 
>waiting on 'java.net.SocketInputStream.socketRead0(Native Method)'. 


"HikariPool-4 connection adder" #3567048 daemon prio=5 os_prio=0 
tid=0x00007f992c309000 nid=0x27c93 runnable [0x00007f99a5d07000]
"HikariPool-3 connection adder" #3566942 daemon prio=5 os_prio=0 
tid=0x0000558b9f60f000 nid=0x278c5 runnable [0x00007f99b1bc5000]
"HikariPool-2 connection adder" #3566835 daemon prio=5 os_prio=0 
tid=0x00007f9978b81800 nid=0x276e2 runnable [0x00007f999c16a000]
"HikariPool-1 connection adder" #3566622 daemon prio=5 os_prio=0 
tid=0x00007f99d26db000 nid=0x2702d runnable [0x00007f999ee97000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at oracle.net.ns.Packet.receive(Packet.java:311)
        at oracle.net.ns.DataPacket.receive(DataPacket.java:105)
        at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:305)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:249)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:171)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:89)
        at 
oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:123)
        at 
oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:79)
        at 
oracle.jdbc.driver.T4CMAREngineStream.unmarshalUB1(T4CMAREngineStream.java:429)
        at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:397)
        at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:257)
        at 
oracle.jdbc.driver.T4CTTIoauthenticate.doOAUTH(T4CTTIoauthenticate.java:437)
        at 
oracle.jdbc.driver.T4CTTIoauthenticate.doOAUTH(T4CTTIoauthenticate.java:954)
        at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:639)
        at 
oracle.jdbc.driver.PhysicalConnection.connect(PhysicalConnection.java:666)
        at 
oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
        at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:566)
        at 
com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:95)
        at 
com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:101)
        at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:356)
        at com.zaxxer.hikari.pool.PoolBase.newPoolEntry(PoolBase.java:199)
        at 
com.zaxxer.hikari.pool.HikariPool.createPoolEntry(HikariPool.java:444)
        at com.zaxxer.hikari.pool.HikariPool.access$200(HikariPool.java:71)
        at 
com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:631)
        at 
com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:617)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)

   Locked ownable synchronizers:
        - <0x00007f9ba8051a68> (a 
java.util.concurrent.ThreadPoolExecutor$Worker)


Looking at db logs, we do not see any issues there. So is it possibly due to 
network or something else.

Thank you!

> Hive Metastore become unresponsive
> ----------------------------------
>
>                 Key: HIVE-26669
>                 URL: https://issues.apache.org/jira/browse/HIVE-26669
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 3.1.0
>            Reporter: Sandeep Gade
>            Priority: Critical
>         Attachments: metastore-server1
>
>
> We are experiencing issues with Hive Metastore where it goes unresponsive. 
> Initial investigation shows thousands of thread in WAITING (parking) state as 
> shown below:
>     1    java.lang.Thread.State: BLOCKED (on object monitor)
>     772    java.lang.Thread.State: RUNNABLE
>       2    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>      13    java.lang.Thread.State: TIMED_WAITING (parking)
>       5    java.lang.Thread.State: TIMED_WAITING (sleeping)
>       3    java.lang.Thread.State: WAITING (on object monitor)
>   14308    java.lang.Thread.State: WAITING (parking)
> ==============
> Almost all of the threads are stuck at 'parking to wait for  
> <0x00007f9ad0795c48> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)'
>  
>      15         - parking to wait for  <0x00007f9ad06c9c10> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   14288         - parking to wait for  <0x00007f9ad0795c48> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>       1         - parking to wait for  <0x00007f9ad0a161f8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       1         - parking to wait for  <0x00007f9ad0a39248> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       1         - parking to wait for  <0x00007f9ad0adb0a0> (a 
> java.util.concurrent.SynchronousQueue$TransferQueue)
>       5         - parking to wait for  <0x00007f9ad0b12278> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       1         - parking to wait for  <0x00007f9ad0b12518> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       1         - parking to wait for  <0x00007f9ad0b44878> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       1         - parking to wait for  <0x00007f9ad0cbe8f0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       1         - parking to wait for  <0x00007f9ad1318d60> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       1         - parking to wait for  <0x00007f9ad1478c10> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       5         - parking to wait for  <0x00007f9ad1494ff8> (a 
> java.util.concurrent.SynchronousQueue$TransferQueue)
> ======================
> complete stack:
> "pool-8-thread-62238" #3582305 prio=5 os_prio=0 tid=0x00007f977bfc9800 
> nid=0x62011 waiting on condition [0x00007f959d917000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00007f9ad0795c48> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>         at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>         at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>         at 
> org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:351)
>         at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:77)
>         at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:137)
>         at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:59)
>         at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
>         at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:750)
>         at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:718)
>         at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:712)
>         at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database_core(HiveMetaStore.java:1488)
>         at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:1470)
>         at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>         at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
>         at com.sun.proxy.$Proxy30.get_database(Unknown Source)
>         at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:15014)
>         at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:14998)
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>         at 
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:636)
>         at 
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:631)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>         at 
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:631)
>         at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:750)
>    Locked ownable synchronizers:
>         - <0x00007fae9f0d8c20> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker)
> ======================
> Looking at linux process, Hive exhausts its 'max processes count' while the 
> issue is happening
> set to:
> Max processes             16000                16000                processes
> As a workaround, we restart Metastores and it works fine for few days.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to