Young-Seok Kim created ASTERIXDB-1138:
-----------------------------------------
Summary: Abort request triggered by the deadlock from the Feed job is not handled.
Key: ASTERIXDB-1138
URL: https://issues.apache.org/jira/browse/ASTERIXDB-1138
Project: Apache AsterixDB
Issue Type: Bug
Reporter: Young-Seok Kim
Assignee: Abdullah Alamoudi
Priority: Critical
I'm observing a deadlock during my spatial index experiment.
(The deadlock is probably caused by a PK hash value collision
between a reader (query) and a writer (feed job).)
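To make the suspected mechanism concrete, here is a minimal sketch, assuming a lock table keyed only by the hash of the primary key and a simple wait-for check. The class `HashLockDeadlockDemo`, its methods, and the use of `String.hashCode()` as the hash function are hypothetical stand-ins, not AsterixDB's `ConcurrentLockManager`. Because `"Aa"` and `"BB"` have the same `String.hashCode()`, the writer's insert of `"BB"` waits on the entry the reader holds for `"Aa"`, closing a cycle that would not exist if locks were taken per record.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: locks are keyed by the hash of the primary key, so two
// distinct keys with the same hash share one lock entry.
public class HashLockDeadlockDemo {
    // lock entry (hash value) -> job currently holding it
    private final Map<Integer, String> holders = new HashMap<>();
    // waiting job -> job it waits for (at most one outstanding request per job)
    private final Map<String, String> waitsFor = new HashMap<>();

    // Stand-in hash function; the real system uses its own PK hashing.
    public static int lockKey(String pk) {
        return pk.hashCode();
    }

    /** Returns true if granted, false if the job must wait; throws if waiting would deadlock. */
    boolean lock(String job, String pk) {
        int key = lockKey(pk);
        String holder = holders.get(key);
        if (holder == null || holder.equals(job)) {
            holders.put(key, job);
            return true;
        }
        waitsFor.put(job, holder);
        if (hasCycleFrom(job)) {
            waitsFor.remove(job);
            throw new IllegalStateException("deadlock: " + job + " should abort");
        }
        return false;
    }

    // Follow the wait-for chain from start; a cycle exists if it leads back to start.
    private boolean hasCycleFrom(String start) {
        Set<String> seen = new HashSet<>();
        String cur = waitsFor.get(start);
        while (cur != null && seen.add(cur)) {
            if (cur.equals(start)) {
                return true;
            }
            cur = waitsFor.get(cur);
        }
        return false;
    }

    public static String scenario() {
        HashLockDeadlockDemo lm = new HashLockDeadlockDemo();
        lm.lock("reader", "Aa"); // query locks record "Aa"
        lm.lock("writer", "Cc"); // feed job locks record "Cc"
        lm.lock("reader", "Cc"); // query now waits for the feed job
        try {
            // A different record, but "BB".hashCode() == "Aa".hashCode()
            lm.lock("writer", "BB");
            return "no deadlock";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(lockKey("Aa") == lockKey("BB")); // true
        System.out.println(scenario()); // deadlock: writer should abort
    }
}
```

With per-record locks the writer's request for `"BB"` would simply be granted; the collision alone turns it into a wait on the reader.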
When the lock manager detects the deadlock, it throws an ACIDException
requesting an abort, but the exception is swallowed by FeedExceptionHandler
without any proper handling.
After this, every subsequently inserted record requests an abort again by
throwing the same exception, and each of these exceptions is also swallowed by
FeedExceptionHandler.
As a result, queries hang waiting for locks to be released: these locks were
acquired during the record insertion that caused the initial deadlock, and
they stay held because the transaction is never actually aborted.
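A fix needs the feed's exception handling to tell an abort request apart from an ordinary per-record failure. The following is a minimal sketch of that distinction, assuming hypothetical stand-in types (`FeedHandlerSketch`, `AbortRequestedException`, `handle`) rather than the real `FeedExceptionHandler` and `ACIDException`. The handler walks the cause chain, because in the trace the `ACIDException` arrives wrapped inside a `HyracksDataException`, and it rethrows instead of skipping the record so the job can abort and release its locks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these types are stand-ins, not AsterixDB classes.
public class FeedHandlerSketch {

    /** Stand-in for an exception meaning "the lock manager asked this transaction to abort". */
    public static class AbortRequestedException extends Exception {
        public AbortRequestedException(String msg) {
            super(msg);
        }
    }

    /** Returns true if e, or any exception in its cause chain, requests an abort. */
    public static boolean requestsAbort(Throwable e) {
        for (Throwable t = e; t != null; t = t.getCause()) {
            if (t instanceof AbortRequestedException) {
                return true;
            }
        }
        return false;
    }

    /**
     * A plain per-record error skips the record and keeps ingesting;
     * an abort request is rethrown so the job aborts and releases its locks.
     */
    public static void handle(Exception e, List<String> skipped, String record) throws Exception {
        if (requestsAbort(e)) {
            throw e;
        }
        skipped.add(record);
    }

    public static String demo() {
        List<String> skipped = new ArrayList<>();
        try {
            handle(new IllegalArgumentException("bad field"), skipped, "r1");
            Exception wrapped = new RuntimeException("wrapper",
                    new AbortRequestedException("Transaction JID:21 should abort"));
            handle(wrapped, skipped, "r2");
            return "swallowed";
        } catch (Exception e) {
            return "aborted after skipping " + skipped;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // aborted after skipping [r1]
    }
}
```

The malformed record `r1` is skipped as before, while the wrapped abort request escapes the handler instead of being swallowed.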
The following shows the exception thrown:
push Job 0
push Resource 1970324836977414
push Request 562949953421987
push Job 2814749767106561
pop Job 2814749767106561
pop Request 562949953421987
push Request 4222124650663985
push Job 281474976710657
push Resource 281474976711711
push Request 281474976715806
push Job 0
Oct 13, 2015 8:25:10 AM org.apache.asterix.transaction.management.service.locking.ConcurrentLockManager requestAbort
INFO: Exception: Transaction JID:21 should abort (requested by the Lock Manager):
Job 0:0:0
Resource 7:0:b06
Request f:0:1031
Job 1:0:1
Resource 1:0:41f
Request 1:0:141e
Job 0:0:0
Exception: Transaction JID:21 should abort (requested by the Lock Manager):
Job 0:0:0
Resource 7:0:b06
Request f:0:1031
Job 1:0:1
Resource 1:0:41f
Request 1:0:141e
Job 0:0:0
Oct 13, 2015 8:25:10 AM org.apache.asterix.transaction.management.service.locking.ConcurrentLockManager requestAbort
INFO: Exception: Transaction JID:21 should abort (requested by the Lock Manager):
timeout
Exception: Transaction JID:21 should abort (requested by the Lock Manager):
timeout
org.apache.hyracks.api.exceptions.HyracksDataException: org.apache.asterix.common.exceptions.ACIDException: Transaction JID:21 should abort (requested by the Lock Manager):
Job 0:0:0
Resource 7:0:b06
Request f:0:1031
Job 1:0:1
Resource 1:0:41f
Request 1:0:141e
Job 0:0:0
	at org.apache.asterix.transaction.management.opcallbacks.PrimaryIndexModificationOperationCallback.before(PrimaryIndexModificationOperationCallback.java:62)
	at org.apache.hyracks.storage.am.btree.impls.BTree.upsert(BTree.java:336)
	at org.apache.hyracks.storage.am.btree.impls.BTree.access$400(BTree.java:74)
	at org.apache.hyracks.storage.am.btree.impls.BTree$BTreeAccessor.upsertIfConditionElseInsert(BTree.java:938)
	at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTree.insert(LSMBTree.java:441)
	at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTree.modify(LSMBTree.java:379)
	at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.modify(LSMHarness.java:351)
	at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.forceModify(LSMHarness.java:334)
	at org.apache.hyracks.storage.am.lsm.common.impls.LSMTreeIndexAccessor.forceInsert(LSMTreeIndexAccessor.java:157)
	at org.apache.asterix.common.dataflow.AsterixLSMInsertDeleteOperatorNodePushable.nextFrame(AsterixLSMInsertDeleteOperatorNodePushable.java:107)
	at org.apache.asterix.common.feeds.MonitoredBuffer.processMessage(MonitoredBuffer.java:322)
	at org.apache.asterix.common.feeds.MonitoredBuffer.processMessage(MonitoredBuffer.java:44)
	at org.apache.asterix.common.feeds.MessageReceiver$MessageReceiverRunnable.run(MessageReceiver.java:83)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.asterix.common.exceptions.ACIDException: Transaction JID:21 should abort (requested by the Lock Manager):
Job 0:0:0
Resource 7:0:b06
Request f:0:1031
Job 1:0:1
Resource 1:0:41f
Request 1:0:141e
Job 0:0:0
	at org.apache.asterix.transaction.management.service.locking.ConcurrentLockManager.requestAbort(ConcurrentLockManager.java:925)
	at org.apache.asterix.transaction.management.service.locking.ConcurrentLockManager.enqueueWaiter(ConcurrentLockManager.java:180)
	at org.apache.asterix.transaction.management.service.locking.ConcurrentLockManager.lock(ConcurrentLockManager.java:155)
	at org.apache.asterix.transaction.management.opcallbacks.PrimaryIndexModificationOperationCallback.before(PrimaryIndexModificationOperationCallback.java:53)
	... 15 more