[
https://issues.apache.org/jira/browse/HIVE-26669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sandeep Gade updated HIVE-26669:
--------------------------------
Description:
We are experiencing issues with the Hive Metastore becoming unresponsive.
Initial investigation shows thousands of threads in the WAITING (parking) state,
as shown below:
1 java.lang.Thread.State: BLOCKED (on object monitor)
772 java.lang.Thread.State: RUNNABLE
2 java.lang.Thread.State: TIMED_WAITING (on object monitor)
13 java.lang.Thread.State: TIMED_WAITING (parking)
5 java.lang.Thread.State: TIMED_WAITING (sleeping)
3 java.lang.Thread.State: WAITING (on object monitor)
14308 java.lang.Thread.State: WAITING (parking)
==============
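For reference, per-state counts like the above can be derived from a saved
jstack thread dump. A minimal sketch; the dump file name and its contents here
are hypothetical stand-ins (a real dump would come from something like
`jstack <metastore-pid> > threaddump.txt`):

```shell
#!/bin/sh
# Sketch: summarize thread states and park targets in a jstack-style dump.
DUMP=threaddump.txt

# Tiny sample dump so the script is self-contained; in practice this file
# would be a real jstack capture of the Metastore process.
cat > "$DUMP" <<'EOF'
   java.lang.Thread.State: WAITING (parking)
        - parking to wait for  <0x00007f9ad0795c48> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
   java.lang.Thread.State: WAITING (parking)
        - parking to wait for  <0x00007f9ad0795c48> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
   java.lang.Thread.State: RUNNABLE
EOF

# Count threads per state (produces a summary like the one above).
grep 'java.lang.Thread.State:' "$DUMP" | sort | uniq -c | sort -rn

# Count threads per lock/condition address being parked on.
grep 'parking to wait for' "$DUMP" | sed 's/^ *//' | sort | uniq -c | sort -rn
```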
Almost all of the threads are stuck at 'parking to wait for
<0x00007f9ad0795c48> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)':
15 - parking to wait for <0x00007f9ad06c9c10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
14288 - parking to wait for <0x00007f9ad0795c48> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
1 - parking to wait for <0x00007f9ad0a161f8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00007f9ad0a39248> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00007f9ad0adb0a0> (a java.util.concurrent.SynchronousQueue$TransferQueue)
5 - parking to wait for <0x00007f9ad0b12278> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00007f9ad0b12518> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00007f9ad0b44878> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00007f9ad0cbe8f0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00007f9ad1318d60> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00007f9ad1478c10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
5 - parking to wait for <0x00007f9ad1494ff8> (a java.util.concurrent.SynchronousQueue$TransferQueue)
======================
Complete stack of one of the waiting threads:
"pool-8-thread-62238" #3582305 prio=5 os_prio=0 tid=0x00007f977bfc9800 nid=0x62011 waiting on condition [0x00007f959d917000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00007f9ad0795c48> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
        at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:351)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:77)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:137)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:59)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:750)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:718)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:712)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database_core(HiveMetaStore.java:1488)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:1470)
        at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
        at com.sun.proxy.$Proxy30.get_database(Unknown Source)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:15014)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:14998)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:636)
        at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:631)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:631)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)

   Locked ownable synchronizers:
        - <0x00007fae9f0d8c20> (a java.util.concurrent.ThreadPoolExecutor$Worker)
======================
Looking at the Linux process limits, Hive exhausts its 'Max processes' limit
while the issue is happening. The limit is set to:
Max processes 16000 16000 processes
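On Linux, that limit and the live thread count can be cross-checked from
/proc. A minimal sketch; it inspects the current shell via $$ purely for
illustration, and for the Metastore you would substitute its actual PID:

```shell
#!/bin/sh
# Sketch: compare a process's thread count against its 'Max processes' limit.
# PID=$$ (this shell) is a placeholder for the Metastore's PID.
PID=$$

# 'Max processes' line from the kernel's per-process limits
# (the same source as the '16000 16000' figure above).
grep 'Max processes' "/proc/$PID/limits"

# Current number of threads in the process; as this approaches the limit,
# the JVM can no longer create new worker threads.
grep '^Threads:' "/proc/$PID/status"
```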
As a workaround, we restart the Metastores, after which they work fine for a few days.
> Hive Metastore become unresponsive
> ----------------------------------
>
> Key: HIVE-26669
> URL: https://issues.apache.org/jira/browse/HIVE-26669
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 3.1.0
> Reporter: Sandeep Gade
> Priority: Critical
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)