[
https://issues.apache.org/jira/browse/HIVE-16839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673816#comment-16673816
]
ASF GitHub Bot commented on HIVE-16839:
---------------------------------------
GitHub user guangyy opened a pull request:
https://github.com/apache/hive/pull/484
HIVE-16839: Fix a race condidtion during concurrent partition drops
We have seen a leaked lock on hive metastore DB which caused all
PARTITION insertion failed on timeout waiting for lock until the
metastore service is restarted.
A transaction dump on the DB shows there is a thread that is Sleep which
potentiall holds the the lock, like:
```
trx_id: 33603171058
trx_state: RUNNING
trx_started: 2018-10-23 06:43:22
trx_requested_lock_id: NULL
trx_wait_started: NULL
trx_weight: 70298
trx_mysql_thread_id: 275402202
trx_query: NULL
trx_operation_state: NULL
trx_tables_in_use: 0
trx_tables_locked: 0
trx_lock_structs: 21286
trx_lock_memory_bytes: 2881064
trx_rows_locked: 98810
trx_rows_modified: 49012
trx_concurrency_tickets: 0
trx_isolation_level: READ COMMITTED
trx_unique_checks: 1
trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
trx_adaptive_hash_latched: 0
trx_adaptive_hash_timeout: 0
trx_is_read_only: 0
trx_autocommit_non_locking: 0
ID: 275402202
USER: metastore_gold
HOST: 10.37.182.82:36684
DB: metastoregold
COMMAND: Sleep
TIME: 1
STATE:
INFO: NULL
duration: 1316
Given the HOST ip, we trace back to the hive metastore instance and found
the following exceptions:
No such database row
org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database
row
at
org.datanucleus.store.rdbms.request.FetchRequest.execute(FetchRequest.java:357)
at
org.datanucleus.store.rdbms.RDBMSPersistenceHandler.fetchObject(RDBMSPersistenceHandler.java:324)
at
org.datanucleus.state.AbstractStateManager.loadFieldsFromDatastore(AbstractStateManager.java:1120)
at
org.datanucleus.state.JDOStateManager.loadSpecifiedFields(JDOStateManager.java:2916)
at
org.datanucleus.state.JDOStateManager.isLoaded(JDOStateManager.java:3219)
```
The problem is that the caller expects a NULL if the partition does not
exist, however, the convertToPart function would throw
an exception which lead to the leak.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/guangyy/hive HIVE-16839
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/hive/pull/484.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #484
----
commit 5137027ee658990dd1503c09c13a73e2848d8deb
Author: Guang Yang <guang.yang@...>
Date: 2018-11-02T23:21:35Z
HIVE-16839: Fix a race condidtion during concurrent partition drops
----
> Unbalanced calls to openTransaction/commitTransaction when alter the same
> partition concurrently
> ------------------------------------------------------------------------------------------------
>
> Key: HIVE-16839
> URL: https://issues.apache.org/jira/browse/HIVE-16839
> Project: Hive
> Issue Type: Bug
> Affects Versions: 1.1.0
> Reporter: Nemon Lou
> Assignee: Vihang Karajgaonkar
> Priority: Major
> Labels: pull-request-available
>
> SQL to reproduce:
> prepare:
> {noformat}
> hdfs dfs -mkdir -p
> /hzsrc/external/writing_dc/ltgsm/16e7a9b2-21a1-3f4f-8061-bc3395281627
> 1,create external table tb_ltgsm_external (id int) PARTITIONED by (cp
> string,ld string);
> {noformat}
> open one beeline run these two sql many times
> {noformat} 2,ALTER TABLE tb_ltgsm_external ADD IF NOT EXISTS PARTITION
> (cp=2017060513,ld=2017060610);
> 3,ALTER TABLE tb_ltgsm_external PARTITION (cp=2017060513,ld=2017060610) SET
> LOCATION
> 'hdfs://hacluster/hzsrc/external/writing_dc/ltgsm/16e7a9b2-21a1-3f4f-8061-bc3395281627';
> {noformat}
> open another beeline to run this sql many times at the same time.
> {noformat}
> 4,ALTER TABLE tb_ltgsm_external DROP PARTITION (cp=2017060513,ld=2017060610);
> {noformat}
> MetaStore logs:
> {noformat}
> 2017-06-06 21:58:34,213 | ERROR | pool-6-thread-197 | Retrying HMSHandler
> after 2000 ms (attempt 1 of 10) with error:
> javax.jdo.JDOObjectNotFoundException: No such database row
> FailedObject:49[OID]org.apache.hadoop.hive.metastore.model.MStorageDescriptor
> at
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:475)
> at
> org.datanucleus.api.jdo.JDOAdapter.getApiExceptionForNucleusException(JDOAdapter.java:1158)
> at
> org.datanucleus.state.JDOStateManager.isLoaded(JDOStateManager.java:3231)
> at
> org.apache.hadoop.hive.metastore.model.MStorageDescriptor.jdoGetcd(MStorageDescriptor.java)
> at
> org.apache.hadoop.hive.metastore.model.MStorageDescriptor.getCD(MStorageDescriptor.java:184)
> at
> org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1282)
> at
> org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1299)
> at
> org.apache.hadoop.hive.metastore.ObjectStore.convertToPart(ObjectStore.java:1680)
> at
> org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:1586)
> at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
> at com.sun.proxy.$Proxy0.getPartition(Unknown Source)
> at
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:538)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions(HiveMetaStore.java:3317)
> at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
> at com.sun.proxy.$Proxy12.alter_partitions(Unknown Source)
> at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9963)
> at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9947)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
> at
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1673)
> at
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
> at
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> NestedThrowablesStackTrace:
> No such database row
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database
> row
> at
> org.datanucleus.store.rdbms.request.FetchRequest.execute(FetchRequest.java:357)
> at
> org.datanucleus.store.rdbms.RDBMSPersistenceHandler.fetchObject(RDBMSPersistenceHandler.java:324)
> at
> org.datanucleus.state.AbstractStateManager.loadFieldsFromDatastore(AbstractStateManager.java:1120)
> at
> org.datanucleus.state.JDOStateManager.loadSpecifiedFields(JDOStateManager.java:2916)
> at
> org.datanucleus.state.JDOStateManager.isLoaded(JDOStateManager.java:3219)
> at
> org.apache.hadoop.hive.metastore.model.MStorageDescriptor.jdoGetcd(MStorageDescriptor.java)
> at
> org.apache.hadoop.hive.metastore.model.MStorageDescriptor.getCD(MStorageDescriptor.java:184)
> at
> org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1282)
> at
> org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1299)
> at
> org.apache.hadoop.hive.metastore.ObjectStore.convertToPart(ObjectStore.java:1680)
> at
> org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:1586)
> at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
> at com.sun.proxy.$Proxy0.getPartition(Unknown Source)
> at
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:538)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions(HiveMetaStore.java:3317)
> at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
> at com.sun.proxy.$Proxy12.alter_partitions(Unknown Source)
> at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9963)
> at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9947)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
> at
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1673)
> at
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
> at
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> |
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:155)
> 2017-06-06 21:58:36,232 | ERROR | pool-6-thread-197 |
> InvalidOperationException(message:alter is not possible)
> at
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:563)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions(HiveMetaStore.java:3317)
> at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
> at com.sun.proxy.$Proxy12.alter_partitions(Unknown Source)
> at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9963)
> at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9947)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
> at
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1673)
> at
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
> at
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> |
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:141)
> 2017-06-06 21:58:36,251 | INFO | pool-6-thread-197 | 39: Shutting down the
> object store... |
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logInfo(HiveMetaStore.java:794)
> 2017-06-06 21:58:36,252 | ERROR | pool-6-thread-197 | Retrying HMSHandler
> after 2000 ms (attempt 1 of 10) with error: javax.jdo.JDOUserException:
> Transaction is still active. You should always close your transactions
> correctly using commit() or rollback().
> FailedObject:org.datanucleus.api.jdo.JDOPersistenceManager@215e7d59
> at
> org.datanucleus.api.jdo.JDOPersistenceManager.internalClose(JDOPersistenceManager.java:177)
> at
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory.releasePersistenceManager(JDOPersistenceManagerFactory.java:1221)
> at
> org.datanucleus.api.jdo.JDOPersistenceManager.close(JDOPersistenceManager.java:366)
> at
> org.apache.hadoop.hive.metastore.ObjectStore.shutdown(ObjectStore.java:427)
> at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
> at com.sun.proxy.$Proxy0.shutdown(Unknown Source)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.shutdown(HiveMetaStore.java:866)
> at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
> at com.sun.proxy.$Proxy12.shutdown(Unknown Source)
> at
> com.facebook.fb303.FacebookService$Processor$shutdown.getResult(FacebookService.java:1131)
> at
> com.facebook.fb303.FacebookService$Processor$shutdown.getResult(FacebookService.java:1117)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
> at
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1673)
> at
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
> at
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> |
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:155)
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)