[ 
https://issues.apache.org/jira/browse/HIVE-16839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673816#comment-16673816
 ] 

ASF GitHub Bot commented on HIVE-16839:
---------------------------------------

GitHub user guangyy opened a pull request:

    https://github.com/apache/hive/pull/484

    HIVE-16839: Fix a race condidtion during concurrent partition drops

    We have seen a leaked lock on hive metastore DB which caused all
    PARTITION insertion failed on timeout waiting for lock until the
    metastore service is restarted.
    
    A transaction dump on the DB shows there is a thread that is Sleep which
    potentiall holds the the lock, like:
    ```
      trx_id: 33603171058
                     trx_state: RUNNING
                   trx_started: 2018-10-23 06:43:22
         trx_requested_lock_id: NULL
              trx_wait_started: NULL
                    trx_weight: 70298
           trx_mysql_thread_id: 275402202
                     trx_query: NULL
           trx_operation_state: NULL
             trx_tables_in_use: 0
             trx_tables_locked: 0
              trx_lock_structs: 21286
         trx_lock_memory_bytes: 2881064
               trx_rows_locked: 98810
             trx_rows_modified: 49012
       trx_concurrency_tickets: 0
           trx_isolation_level: READ COMMITTED
             trx_unique_checks: 1
        trx_foreign_key_checks: 1
    trx_last_foreign_key_error: NULL
     trx_adaptive_hash_latched: 0
     trx_adaptive_hash_timeout: 0
              trx_is_read_only: 0
    trx_autocommit_non_locking: 0
                            ID: 275402202
                          USER: metastore_gold
                          HOST: 10.37.182.82:36684
                            DB: metastoregold
                       COMMAND: Sleep
                          TIME: 1
                         STATE:
                          INFO: NULL
                      duration: 1316
    Given the HOST ip, we trace back to the hive metastore instance and found 
the following exceptions:
    
    No such database row
    org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
row
            at 
org.datanucleus.store.rdbms.request.FetchRequest.execute(FetchRequest.java:357)
            at 
org.datanucleus.store.rdbms.RDBMSPersistenceHandler.fetchObject(RDBMSPersistenceHandler.java:324)
            at 
org.datanucleus.state.AbstractStateManager.loadFieldsFromDatastore(AbstractStateManager.java:1120)
            at 
org.datanucleus.state.JDOStateManager.loadSpecifiedFields(JDOStateManager.java:2916)
            at 
org.datanucleus.state.JDOStateManager.isLoaded(JDOStateManager.java:3219)
    ```
    The problem is that the caller expects a NULL if the partition does not 
exist, however, the convertToPart function would throw
    an exception which lead to the leak.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/guangyy/hive HIVE-16839

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/484.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #484
    
----
commit 5137027ee658990dd1503c09c13a73e2848d8deb
Author: Guang Yang <guang.yang@...>
Date:   2018-11-02T23:21:35Z

    HIVE-16839: Fix a race condidtion during concurrent partition drops

----


> Unbalanced calls to openTransaction/commitTransaction when alter the same 
> partition concurrently
> ------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16839
>                 URL: https://issues.apache.org/jira/browse/HIVE-16839
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>            Reporter: Nemon Lou
>            Assignee: Vihang Karajgaonkar
>            Priority: Major
>              Labels: pull-request-available
>
> SQL to reproduce:
> prepare:
> {noformat}
>  hdfs dfs -mkdir -p 
> /hzsrc/external/writing_dc/ltgsm/16e7a9b2-21a1-3f4f-8061-bc3395281627
>  1,create external table tb_ltgsm_external (id int) PARTITIONED by (cp 
> string,ld string);
> {noformat}
> open one beeline run these two sql many times 
> {noformat} 2,ALTER TABLE tb_ltgsm_external ADD IF NOT EXISTS PARTITION 
> (cp=2017060513,ld=2017060610);
>  3,ALTER TABLE tb_ltgsm_external PARTITION (cp=2017060513,ld=2017060610) SET 
> LOCATION 
> 'hdfs://hacluster/hzsrc/external/writing_dc/ltgsm/16e7a9b2-21a1-3f4f-8061-bc3395281627';
> {noformat}
> open another beeline to run this sql many times at the same time.
> {noformat}
>  4,ALTER TABLE tb_ltgsm_external DROP PARTITION (cp=2017060513,ld=2017060610);
> {noformat}
> MetaStore logs:
> {noformat}
> 2017-06-06 21:58:34,213 | ERROR | pool-6-thread-197 | Retrying HMSHandler 
> after 2000 ms (attempt 1 of 10) with error: 
> javax.jdo.JDOObjectNotFoundException: No such database row
> FailedObject:49[OID]org.apache.hadoop.hive.metastore.model.MStorageDescriptor
>       at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:475)
>       at 
> org.datanucleus.api.jdo.JDOAdapter.getApiExceptionForNucleusException(JDOAdapter.java:1158)
>       at 
> org.datanucleus.state.JDOStateManager.isLoaded(JDOStateManager.java:3231)
>       at 
> org.apache.hadoop.hive.metastore.model.MStorageDescriptor.jdoGetcd(MStorageDescriptor.java)
>       at 
> org.apache.hadoop.hive.metastore.model.MStorageDescriptor.getCD(MStorageDescriptor.java:184)
>       at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1282)
>       at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1299)
>       at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToPart(ObjectStore.java:1680)
>       at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:1586)
>       at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:497)
>       at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
>       at com.sun.proxy.$Proxy0.getPartition(Unknown Source)
>       at 
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:538)
>       at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions(HiveMetaStore.java:3317)
>       at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:497)
>       at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
>       at com.sun.proxy.$Proxy12.alter_partitions(Unknown Source)
>       at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9963)
>       at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9947)
>       at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>       at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
>       at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1673)
>       at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
>       at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> NestedThrowablesStackTrace:
> No such database row
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row
>       at 
> org.datanucleus.store.rdbms.request.FetchRequest.execute(FetchRequest.java:357)
>       at 
> org.datanucleus.store.rdbms.RDBMSPersistenceHandler.fetchObject(RDBMSPersistenceHandler.java:324)
>       at 
> org.datanucleus.state.AbstractStateManager.loadFieldsFromDatastore(AbstractStateManager.java:1120)
>       at 
> org.datanucleus.state.JDOStateManager.loadSpecifiedFields(JDOStateManager.java:2916)
>       at 
> org.datanucleus.state.JDOStateManager.isLoaded(JDOStateManager.java:3219)
>       at 
> org.apache.hadoop.hive.metastore.model.MStorageDescriptor.jdoGetcd(MStorageDescriptor.java)
>       at 
> org.apache.hadoop.hive.metastore.model.MStorageDescriptor.getCD(MStorageDescriptor.java:184)
>       at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1282)
>       at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1299)
>       at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToPart(ObjectStore.java:1680)
>       at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:1586)
>       at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:497)
>       at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
>       at com.sun.proxy.$Proxy0.getPartition(Unknown Source)
>       at 
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:538)
>       at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions(HiveMetaStore.java:3317)
>       at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:497)
>       at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
>       at com.sun.proxy.$Proxy12.alter_partitions(Unknown Source)
>       at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9963)
>       at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9947)
>       at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>       at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
>       at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1673)
>       at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
>       at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
>  | 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:155)
> 2017-06-06 21:58:36,232 | ERROR | pool-6-thread-197 | 
> InvalidOperationException(message:alter is not possible)
>       at 
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:563)
>       at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions(HiveMetaStore.java:3317)
>       at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:497)
>       at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
>       at com.sun.proxy.$Proxy12.alter_partitions(Unknown Source)
>       at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9963)
>       at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9947)
>       at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>       at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
>       at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1673)
>       at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
>       at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
>  | 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:141)
> 2017-06-06 21:58:36,251 | INFO  | pool-6-thread-197 | 39: Shutting down the 
> object store... | 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logInfo(HiveMetaStore.java:794)
> 2017-06-06 21:58:36,252 | ERROR | pool-6-thread-197 | Retrying HMSHandler 
> after 2000 ms (attempt 1 of 10) with error: javax.jdo.JDOUserException: 
> Transaction is still active. You should always close your transactions 
> correctly using commit() or rollback().
> FailedObject:org.datanucleus.api.jdo.JDOPersistenceManager@215e7d59
>       at 
> org.datanucleus.api.jdo.JDOPersistenceManager.internalClose(JDOPersistenceManager.java:177)
>       at 
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory.releasePersistenceManager(JDOPersistenceManagerFactory.java:1221)
>       at 
> org.datanucleus.api.jdo.JDOPersistenceManager.close(JDOPersistenceManager.java:366)
>       at 
> org.apache.hadoop.hive.metastore.ObjectStore.shutdown(ObjectStore.java:427)
>       at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:497)
>       at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
>       at com.sun.proxy.$Proxy0.shutdown(Unknown Source)
>       at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.shutdown(HiveMetaStore.java:866)
>       at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:497)
>       at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
>       at com.sun.proxy.$Proxy12.shutdown(Unknown Source)
>       at 
> com.facebook.fb303.FacebookService$Processor$shutdown.getResult(FacebookService.java:1131)
>       at 
> com.facebook.fb303.FacebookService$Processor$shutdown.getResult(FacebookService.java:1117)
>       at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>       at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
>       at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1673)
>       at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
>       at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
>  | 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:155)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to