[ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16321:
----------------------------------
    Description: 
TxnStore.MutexAPI is a mechanism how different Metastore instances can 
coordinate their operations.  It uses a JDBCConnection to achieve it.

In some cases this may lead to deadlock.  TxnHandler uses a connection pool of 
fixed size.  Suppose you have X simultaneous calls to  TxnHandler.lock(), where 
X is >= size of the pool.  This take all connections form the pool, so when
{noformat}
handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
{noformat} 
is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
pool is empty and the system is deadlocked.

MutexAPI can't use the same connection as the operation it's protecting.  
(TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).

We could make MutexAPI use a separate connection pool (size > 'primary' conn 
pool).

Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
enqueueing the lock with the expectation that the caller will always follow up 
with a call to checkLock(CheckLockRequest rqst).

cc [~f1sherox]



  was:
TxnStore.MutexAPI is a mechanism how different Metastore instances can 
coordinate their operations.  It uses a JDBCConnection to achieve it.

In some cases this may lead to deadlock.  TxnHandler uses a connection pool of 
fixed size.  Suppose you X simultaneous calls to  TxnHandler.lock(), where X is 
>= size of the pool.  This take all connections form the pool, so when
{noformat}
handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
{noformat} 
is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
pool is empty and the system is deadlocked.

MutexAPI can't use the same connection as the operation it's protecting.  
(TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).

We could make MutexAPI use a separate connection pool (size > 'primary' conn 
pool).

Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
enqueueing the lock with the expectation that the caller will always follow up 
with a call to checkLock(CheckLockRequest rqst).

cc [~f1sherox]




> Possible deadlock in metastore with Acid enabled
> ------------------------------------------------
>
>                 Key: HIVE-16321
>                 URL: https://issues.apache.org/jira/browse/HIVE-16321
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 1.3.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>
> TxnStore.MutexAPI is a mechanism how different Metastore instances can 
> coordinate their operations.  It uses a JDBCConnection to achieve it.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to  TxnHandler.lock(), 
> where X is >= size of the pool.  This take all connections form the pool, so 
> when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to