GitHub user guangyy opened a pull request: https://github.com/apache/hive/pull/484
HIVE-16839: Fix a race condidtion during concurrent partition drops We have seen a leaked lock on hive metastore DB which caused all PARTITION insertion failed on timeout waiting for lock until the metastore service is restarted. A transaction dump on the DB shows there is a thread that is Sleep which potentiall holds the the lock, like: ``` trx_id: 33603171058 trx_state: RUNNING trx_started: 2018-10-23 06:43:22 trx_requested_lock_id: NULL trx_wait_started: NULL trx_weight: 70298 trx_mysql_thread_id: 275402202 trx_query: NULL trx_operation_state: NULL trx_tables_in_use: 0 trx_tables_locked: 0 trx_lock_structs: 21286 trx_lock_memory_bytes: 2881064 trx_rows_locked: 98810 trx_rows_modified: 49012 trx_concurrency_tickets: 0 trx_isolation_level: READ COMMITTED trx_unique_checks: 1 trx_foreign_key_checks: 1 trx_last_foreign_key_error: NULL trx_adaptive_hash_latched: 0 trx_adaptive_hash_timeout: 0 trx_is_read_only: 0 trx_autocommit_non_locking: 0 ID: 275402202 USER: metastore_gold HOST: 10.37.182.82:36684 DB: metastoregold COMMAND: Sleep TIME: 1 STATE: INFO: NULL duration: 1316 Given the HOST ip, we trace back to the hive metastore instance and found the following exceptions: No such database row org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database row at org.datanucleus.store.rdbms.request.FetchRequest.execute(FetchRequest.java:357) at org.datanucleus.store.rdbms.RDBMSPersistenceHandler.fetchObject(RDBMSPersistenceHandler.java:324) at org.datanucleus.state.AbstractStateManager.loadFieldsFromDatastore(AbstractStateManager.java:1120) at org.datanucleus.state.JDOStateManager.loadSpecifiedFields(JDOStateManager.java:2916) at org.datanucleus.state.JDOStateManager.isLoaded(JDOStateManager.java:3219) ``` The problem is that the caller expects a NULL if the partition does not exist, however, the convertToPart function would throw an exception which lead to the leak. You can merge this pull request into a Git repository by running: $ git pull https://github.com/guangyy/hive HIVE-16839 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/484.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #484 ---- commit 5137027ee658990dd1503c09c13a73e2848d8deb Author: Guang Yang <guang.yang@...> Date: 2018-11-02T23:21:35Z HIVE-16839: Fix a race condidtion during concurrent partition drops ---- ---