[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14956239#comment-14956239
 ] 

Young-Seok Kim commented on ASTERIXDB-1138:
-------------------------------------------

The proposed solution is to use tryLock() instead of lock(), so that a feed 
job avoids deadlock in the first place (instead of relying on the 
detect-and-abort approach).

The steps are as follows:

1. For each entry in a frame:
2. returnValue = tryLock() for the entry
3. If returnValue == false:
    3-1. Flush all entries that have acquired locks so far to the next operator.
       : This makes all those entries reach the commit operator, so that the 
         corresponding commit logs are created.
    3-2. Create a WAIT_LOG and wait until the logFlusher thread flushes the log 
         and sends a notification.
       : This notification guarantees that all locks acquired by this 
         transactor have been released.
    3-3. Acquire the lock for the failed entry using lock() instead of tryLock().
       : We know for sure that this lock() call will not cause deadlock, since 
         the transactor no longer holds any other locks.
4. Create an update log and insert the entry.
5. When all entries in the frame have been processed, push the frame to the 
   next operator, skipping the entries that were already flushed.

This effectively lets the feed job avoid deadlock altogether.
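
The steps above can be sketched in Java as follows. This is only a minimal 
illustration of the control flow, not the actual AsterixDB code: the Locks and 
Pipeline interfaces, their method names, and the integer keys are hypothetical 
stand-ins for the lock manager, the next operator, and the log. The demo() 
method runs one frame against fakes in which a single key is contended and 
records the order of events.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class TryLockFrameSketch {
    /** Hypothetical lock-manager surface; not the real AsterixDB interface. */
    interface Locks {
        boolean tryLock(int key);   // non-blocking; false if the lock is held elsewhere
        void lock(int key);         // blocking
    }

    /** Hypothetical hooks for the next operator and the transaction log. */
    interface Pipeline {
        void flush(List<Integer> entries);   // push entries towards the commit operator
        void waitLogAndAwaitFlush();         // step 3-2: returns once our locks are released
        void logUpdateAndInsert(int key);    // step 4
    }

    static void processFrame(List<Integer> frame, Locks locks, Pipeline out) {
        List<Integer> pending = new ArrayList<>();   // inserted, not yet pushed downstream
        for (int key : frame) {
            if (!locks.tryLock(key)) {
                out.flush(new ArrayList<>(pending));   // 3-1: flush what we hold locks for
                pending.clear();
                out.waitLogAndAwaitFlush();            // 3-2: wait until those locks are gone
                locks.lock(key);                       // 3-3: safe, we hold no other locks
            }
            out.logUpdateAndInsert(key);               // 4
            pending.add(key);
        }
        out.flush(pending);                            // 5: only entries not flushed earlier
    }

    /** Runs one frame against fakes in which key 3 is contended; returns the event trace. */
    static List<String> demo() {
        List<String> trace = new ArrayList<>();
        Set<Integer> contended = Set.of(3);
        Locks locks = new Locks() {
            public boolean tryLock(int key) { return !contended.contains(key); }
            public void lock(int key) { trace.add("lock " + key); }
        };
        Pipeline out = new Pipeline() {
            public void flush(List<Integer> entries) { trace.add("flush " + entries); }
            public void waitLogAndAwaitFlush() { trace.add("wait"); }
            public void logUpdateAndInsert(int key) { trace.add("insert " + key); }
        };
        processFrame(List.of(1, 2, 3, 4), locks, out);
        return trace;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In this sketch the blocking lock() is only ever called right after the wait in 
step 3-2, i.e. at a point where the transactor holds no locks, which is the 
property the proposal relies on to rule out deadlock.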

> Abort request triggered by the deadlock from the Feed job is not handled.
> -------------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1138
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1138
>             Project: Apache AsterixDB
>          Issue Type: Bug
>            Reporter: Young-Seok Kim
>            Assignee: Young-Seok Kim
>            Priority: Critical
>
> I'm observing a deadlock during my spatial index experiment.
> (The deadlock is probably caused by a PK hash value collision between a 
> reader (query) and a writer (feed job).)
> When the lock manager declares a deadlock and throws an ACIDException 
> requesting an abort, the exception is swallowed by FeedExceptionHandler 
> without any proper handling. 
> After this, every incoming inserted record keeps requesting an abort by 
> throwing the exception again, and each of these is also swallowed by 
> FeedExceptionHandler. 
> As a result, queries hang waiting for locks to be released; those locks 
> were acquired during the record insertion that caused the initial deadlock. 
> The following shows the exception thrown:
> push Job 0
> push Resource 1970324836977414
> push Request 562949953421987
> push Job 2814749767106561
> pop Job 2814749767106561
> pop Request 562949953421987
> push Request 4222124650663985
> push Job 281474976710657
> push Resource 281474976711711
> push Request 281474976715806
> push Job 0
> Oct 13, 2015 8:25:10 AM 
> org.apache.asterix.transaction.management.service.locking.ConcurrentLockManager
>  requestAbort
> INFO: Exception: Transaction JID:21 should abort (requested by the Lock 
> Manager):
> Job 0:0:0
> Resource 7:0:b06
> Request f:0:1031
> Job 1:0:1
> Resource 1:0:41f
> Request 1:0:141e
> Job 0:0:0
> Exception: Transaction JID:21 should abort (requested by the Lock Manager):
> Job 0:0:0
> Resource 7:0:b06
> Request f:0:1031
> Job 1:0:1
> Resource 1:0:41f
> Request 1:0:141e
> Job 0:0:0
> Oct 13, 2015 8:25:10 AM 
> org.apache.asterix.transaction.management.service.locking.ConcurrentLockManager
>  requestAbort
> INFO: Exception: Transaction JID:21 should abort (requested by the Lock 
> Manager):
> timeout
> Exception: Transaction JID:21 should abort (requested by the Lock Manager):
> timeout
> org.apache.hyracks.api.exceptions.HyracksDataException: 
> org.apache.asterix.common.exceptions.ACIDException: Transaction JID:21 should 
> abort (requested by the Lock Manager):
> Job 0:0:0
> Resource 7:0:b06
> Request f:0:1031
> Job 1:0:1
> Resource 1:0:41f
> Request 1:0:141e
> Job 0:0:0
>         at 
> org.apache.asterix.transaction.management.opcallbacks.PrimaryIndexModificationOperationCallback.before(PrimaryIndexModificationOperationCallback.java:62)
>         at 
> org.apache.hyracks.storage.am.btree.impls.BTree.upsert(BTree.java:336)
>         at 
> org.apache.hyracks.storage.am.btree.impls.BTree.access$400(BTree.java:74)
>         at 
> org.apache.hyracks.storage.am.btree.impls.BTree$BTreeAccessor.upsertIfConditionElseInsert(BTree.java:938)
>         at 
> org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTree.insert(LSMBTree.java:441)
>         at 
> org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTree.modify(LSMBTree.java:379)
>         at 
> org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.modify(LSMHarness.java:351)
>         at 
> org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.forceModify(LSMHarness.java:334)
>         at 
> org.apache.hyracks.storage.am.lsm.common.impls.LSMTreeIndexAccessor.forceInsert(LSMTreeIndexAccessor.java:157)
>         at 
> org.apache.asterix.common.dataflow.AsterixLSMInsertDeleteOperatorNodePushable.nextFrame(AsterixLSMInsertDeleteOperatorNodePushable.java:107)
>         at 
> org.apache.asterix.common.feeds.MonitoredBuffer.processMessage(MonitoredBuffer.java:322)
>         at 
> org.apache.asterix.common.feeds.MonitoredBuffer.processMessage(MonitoredBuffer.java:44)
>         at 
> org.apache.asterix.common.feeds.MessageReceiver$MessageReceiverRunnable.run(MessageReceiver.java:83)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.asterix.common.exceptions.ACIDException: Transaction 
> JID:21 should abort (requested by the Lock Manager):
> Job 0:0:0
> Resource 7:0:b06
> Request f:0:1031
> Job 1:0:1
> Resource 1:0:41f
> Request 1:0:141e
> Job 0:0:0
>         at 
> org.apache.asterix.transaction.management.service.locking.ConcurrentLockManager.requestAbort(ConcurrentLockManager.java:925)
>         at 
> org.apache.asterix.transaction.management.service.locking.ConcurrentLockManager.enqueueWaiter(ConcurrentLockManager.java:180)
>         at 
> org.apache.asterix.transaction.management.service.locking.ConcurrentLockManager.lock(ConcurrentLockManager.java:155)
>         at 
> org.apache.asterix.transaction.management.opcallbacks.PrimaryIndexModificationOperationCallback.before(PrimaryIndexModificationOperationCallback.java:53)
>         ... 15 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
