[ 
https://issues.apache.org/jira/browse/HIVE-12258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furcy Pin updated HIVE-12258:
-----------------------------
    Description: 
When hive.support.concurrency is enabled, a query that reads data from one
partition and writes data into another partition of the same table
creates a deadlock.
The worst part is that once the deadlock is active, you can't query the table
until it times out.

* How to reproduce:


CREATE TABLE test_table (id INT) 
PARTITIONED BY (part STRING)
;

INSERT INTO TABLE test_table PARTITION (part="test")
VALUES (1), (2), (3), (4) 
;

INSERT OVERWRITE TABLE test_table PARTITION (part="test2")
SELECT id FROM test_table WHERE part="test";

The query hangs, and running SHOW LOCKS in another terminal shows:

| lockid | database | table      | partition  | lock_state | lock_type   | transaction_id | last_heartbeat | acquired_at |
| 3765   | default  | test_table | NULL       | WAITING    | SHARED_READ | NULL           | 1440603633148  | NULL        |
| 3765   | default  | test_table | part=test2 | WAITING    | EXCLUSIVE   | NULL           | 1440603633148  | NULL        |
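
Until this is fixed, one possible workaround (a sketch, not verified against this setup; the staging table name is illustrative) is to route the data through an intermediate table, so that the reading query and the writing query never need conflicting locks on the same table at the same time:

```sql
-- Step 1: stage the read into a scratch table; this statement only needs
-- a SHARED_READ lock on test_table.
CREATE TABLE test_table_staging AS
SELECT id FROM test_table WHERE part="test";

-- Step 2: write from the staging table; the EXCLUSIVE lock on the target
-- partition no longer has to coexist with a read lock on test_table.
INSERT OVERWRITE TABLE test_table PARTITION (part="test2")
SELECT id FROM test_table_staging;

-- Step 3: clean up the scratch table.
DROP TABLE test_table_staging;
```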

This was tested on Hive 1.1.0-cdh5.4.2, but I believe the bug is still present
in 1.2.1.
I could not reproduce it easily locally because it requires a
pseudo-distributed setup with ZooKeeper to have concurrency enabled.

From looking at the code, I believe the problem comes from the
EmbeddedLockManager method
`public List<HiveLock> lock(List<HiveLockObj> objs, int numRetriesForLock, long sleepTime)`
which keeps trying to acquire two incompatible locks, and ends up failing after
hive.lock.numretries * hive.lock.sleep.between.retries, which by default is
100 * 60s = 100 minutes.
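
Given that retry window, lowering the retry settings should at least shorten how long the table stays blocked (the values below are illustrative, and this only mitigates the symptom, not the underlying lock conflict):

```sql
-- Give up on the conflicting lock after roughly 10 * 10s instead of 100 * 60s.
SET hive.lock.numretries=10;
SET hive.lock.sleep.between.retries=10s;
```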





> read/write into same partitioned table + concurrency = deadlock
> ---------------------------------------------------------------
>
>                 Key: HIVE-12258
>                 URL: https://issues.apache.org/jira/browse/HIVE-12258
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Furcy Pin
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
