[ https://issues.apache.org/jira/browse/HIVE-12258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Furcy Pin updated HIVE-12258:
-----------------------------
Description:
When hive.support.concurrency is enabled, a query that reads data from one
partition and writes data into another partition of the same table deadlocks
against itself.
The worst part is that once the deadlock is active, you can't query the table
at all until the lock attempt times out.
* How to reproduce:
```sql
CREATE TABLE test_table (id INT)
PARTITIONED BY (part STRING)
;
INSERT INTO TABLE test_table PARTITION (part="test")
VALUES (1), (2), (3), (4)
;
INSERT OVERWRITE TABLE test_table PARTITION (part="test2")
SELECT id FROM test_table WHERE part="test";
```
Nothing happens, and running SHOW LOCKS in another terminal shows:
```
SHOW LOCKS;
+--------+----------+------------+------------+------------+-------------+----------------+----------------+-------------+
| lockid | database | table      | partition  | lock_state | lock_type   | transaction_id | last_heartbeat | acquired_at |
+--------+----------+------------+------------+------------+-------------+----------------+----------------+-------------+
| 3765   | default  | test_table | NULL       | WAITING    | SHARED_READ | NULL           | 1440603633148  | NULL        |
| 3765   | default  | test_table | part=test2 | WAITING    | EXCLUSIVE   | NULL           | 1440603633148  | NULL        |
+--------+----------+------------+------------+------------+-------------+----------------+----------------+-------------+
```
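Until this is fixed, a possible session-level mitigation (a sketch only, assuming the two lock settings named in this report behave as described in this Hive version; the values below are illustrative) is to shrink the retry window so a wedged query gives up in seconds instead of holding the table for the full default window:

```
-- Illustrative values; the exact value syntax for the sleep setting
-- (plain seconds vs. a "5s"-style duration) varies by Hive version.
SET hive.lock.numretries=10;
SET hive.lock.sleep.between.retries=1;
```

This does not remove the self-deadlock; it only makes the doomed query fail fast instead of blocking other readers of the table for ~100 minutes.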
This was tested on Hive 1.1.0-cdh5.4.2, but I believe the bug is still present
in 1.2.1.
I could not reproduce it easily locally, because concurrency requires a
pseudo-distributed setup with ZooKeeper.
From looking at the code, I believe the problem comes from the
EmbeddedLockManager method
`public List<HiveLock> lock(List<HiveLockObj> objs, int numRetriesForLock, long
sleepTime)`,
which keeps retrying two mutually incompatible locks and only gives up after
hive.lock.numretries * hive.lock.sleep.between.retries, which by default is
100 * 60s = 100 minutes.
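To illustrate the failure mode, here is a minimal, hypothetical Java sketch (the class, record, and method names are mine, not Hive's; only the shape of the `lock(objs, numRetriesForLock, sleepTime)` retry loop is taken from the report): the single call requests both a SHARED_READ on the table and an EXCLUSIVE on one of its partitions, the two can never be granted together, so the loop can only spin until the retries are exhausted.

```java
import java.util.List;

// Hypothetical sketch (names are mine, not Hive's) of the failure mode:
// one query submits a SHARED_READ request on the table and an EXCLUSIVE
// request on one of its partitions in the same lock() call. The two are
// mutually incompatible, so no amount of retrying can ever succeed.
public class LockRetrySketch {

    enum Mode { SHARED_READ, EXCLUSIVE }

    record LockReq(String object, Mode mode) {}

    // Two requests conflict if they target the same object hierarchy
    // (a partition lock is nested under its table lock) and at least
    // one of them is EXCLUSIVE.
    static boolean conflicts(LockReq a, LockReq b) {
        boolean related = a.object().startsWith(b.object())
                       || b.object().startsWith(a.object());
        return related && (a.mode() == Mode.EXCLUSIVE || b.mode() == Mode.EXCLUSIVE);
    }

    // Mirrors the shape of EmbeddedLockManager.lock(objs, numRetriesForLock,
    // sleepTime): retry the whole batch until it succeeds or retries run out.
    static boolean lockAll(List<LockReq> objs, int numRetries, long sleepMs)
            throws InterruptedException {
        for (int attempt = 0; attempt <= numRetries; attempt++) {
            boolean ok = true;
            for (int i = 0; i < objs.size() && ok; i++)
                for (int j = i + 1; j < objs.size() && ok; j++)
                    if (conflicts(objs.get(i), objs.get(j)))
                        ok = false;
            if (ok) return true;
            Thread.sleep(sleepMs); // hive.lock.sleep.between.retries
        }
        return false; // fails only after numRetries * sleepMs
    }

    public static void main(String[] args) throws InterruptedException {
        // The same query asks for both locks at once, so every retry fails.
        List<LockReq> objs = List.of(
            new LockReq("default/test_table", Mode.SHARED_READ),
            new LockReq("default/test_table/part=test2", Mode.EXCLUSIVE));
        System.out.println(lockAll(objs, 3, 1)); // prints false
    }
}
```

With the defaults from the report (100 retries, 60s sleep), this loop is exactly the ~100-minute hang before the query finally fails.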
> read/write into same partitioned table + concurrency = deadlock
> ---------------------------------------------------------------
>
> Key: HIVE-12258
> URL: https://issues.apache.org/jira/browse/HIVE-12258
> Project: Hive
> Issue Type: Bug
> Reporter: Furcy Pin
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)