Young-Seok Kim has posted comments on this change. Change subject: Deadlock-free locking protocol is enabled Change-Id: Ie58ae2f519baa53599e99b51bd61ea5f8366dafd ......................................................................
Patch Set 1: (17 comments) I addressed comments. The upsert case will be revisited as Yingyi pointed out and I roughly explained how it should be modified in the place where Yingyi put his comment. For the high-level questions from Yingyi, I gave the answers below: For Q1: This is for correctness since a transactor is a thread as described in the design document. For Q2: Since it's very hard to make lock conflict situation happen deterministically, I have manually tested the following case in a debugging mode in order to see the case that a thread/transactor fails to acquire tryLock in PrimaryIndexModificationOperationCallback which subsequently cause flushing partial frames to the next operator followed by creating WAIT log and so on. S1 has 10 records whose rids are from 0 to 4. Thread1 deletes 3 records (whose rids are 0, 1, 4) from a partition among 2 partitions of a dataset S1. Thread2 upsert a record whose rid is 1. Sequence of event between the two threads: (The order of events was dictated by putting/removing breaking points between threads in debugging mode in order to make those events happen deterministically.) 1. Thread1 deleted a record r0 among three records in the delivered frame into a primary index. 2. Thread2 acquired a x-lock on r1 by calling LockThenSearchOperationCallback.before() 3. Thread1 tries to acquire tryLock on r1 by calling PrimaryindexModificaitonOperationCallback.before(), but tryLock() request fails and flushes partial frames. Then creates a WAIT log which makes thread1 waits until LogFlusher thread flushes the WAIT log to disk and gives notification. 4. LogFlusher thread flushes the WAIT log and gives a notification to Thread1. 5. thread1 calls lock() on r1 which makes thread1 waits until r1's x lock is acquired. 6. Thread2 finishes deleting and inserting r1 and generates a commit log 7. LogFlusher thread flushes the commit log generated by thread2. 8. thread1 grabs the lock on r1 and continues deleting records r1 and r2. https://asterix-gerrit.ics.uci.edu/#/c/825/1/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/base/RuleCollections.java File asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/base/RuleCollections.java: Line 294: //physicalRewritesAllLevels.add(new IntroduceInstantLockSearchCallbackRule()); > Remove this line instead of commenting out. It's removed. https://asterix-gerrit.ics.uci.edu/#/c/825/1/asterixdb/asterix-common/src/main/java/org/apache/asterix/common/dataflow/AsterixLSMInsertDeleteOperatorNodePushable.java File asterixdb/asterix-common/src/main/java/org/apache/asterix/common/dataflow/AsterixLSMInsertDeleteOperatorNodePushable.java: Line 48: private int lastFlushedTupleIdx; > annotate the meaning of each: Done Line 93: flushedPartialTuples = false; > annotate why they need to be reset to 0/false for each frame. Explained in the above where the variables declared. https://asterix-gerrit.ics.uci.edu/#/c/825/1/asterixdb/asterix-common/src/main/java/org/apache/asterix/common/transactions/LogRecord.java File asterixdb/asterix-common/src/main/java/org/apache/asterix/common/transactions/LogRecord.java: Line 69: * 5) WAIT_LOG_SIZE: 14 bytes (Header1(6) + Tail(8)) > WS Done Line 70: * --> WAIT_LOG only requires LogType Field, but in order to conform the log reader protocol > WS Done https://asterix-gerrit.ics.uci.edu/#/c/825/1/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/operators/AsterixLSMPrimaryUpsertOperatorNodePushable.java File asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/operators/AsterixLSMPrimaryUpsertOperatorNodePushable.java: Line 260: FrameUtils.appendToWriter(writer, appender, accessor, i); > This method of this class is never called. You're right. It's never called as is. PrimaryIndexModificationOperationCallback is not used for upsert. Instead, LockThenSearchOperationCallback() is used. This means that the LockThenSearchOperationCallback object needs to have the object of AsterixLSMPrimaryUpsertOperatorNodePushable when the contructor is called. I will reflect this to code and resubmit code review. https://asterix-gerrit.ics.uci.edu/#/c/825/1/asterixdb/asterix-transactions/src/main/java/org/apache/asterix/transaction/management/opcallbacks/PrimaryIndexModificationOperationCallback.java File asterixdb/asterix-transactions/src/main/java/org/apache/asterix/transaction/management/opcallbacks/PrimaryIndexModificationOperationCallback.java: Line 76: //release all locks held by this actor (which is a thread) by flushing partial frame. > WS Done Line 77: boolean retVal = lockManager.tryLock(jobThreadId, datasetId, pkHash, LockMode.X, txnCtx); > rename variable Done https://asterix-gerrit.ics.uci.edu/#/c/825/1/asterixdb/asterix-transactions/src/main/java/org/apache/asterix/transaction/management/service/locking/ConcurrentLockManager.java File asterixdb/asterix-transactions/src/main/java/org/apache/asterix/transaction/management/service/locking/ConcurrentLockManager.java: Line 51: public static final int NIL = -1; > Can you let the two variable names have more difference? Even though I literally moved Till's code change into my commit, the variable names made sense to me since NIL for integer -1 and NILL for long -1l. Based on this naming, all the places that you put (NIL?) are for long type, so NILL is correct one. Also, it will not matter even though they are used with a wrong one. So, I didn't think twice. If you still think they must be renamed, please recommend the names. I will use them. Line 561: jobSlot = new Long(jobArenaMgr.allocate()); > Will this lead to many object creations back-and-forth? We have been creating an object per job. Now, from this code change, that has been changed into the number of threads/transactors per job. The main goal of lock manager in terms of object creation is to avoid object creations as many as the number of entities/records. Line 952: try { > per-tuple-based objection creation? Yes, but this is for debugging/profiling method which is only enabled when CHECK_CONSISTENCY is set to true. So, it is not called for normal processing. https://asterix-gerrit.ics.uci.edu/#/c/825/1/asterixdb/asterix-transactions/src/main/java/org/apache/asterix/transaction/management/service/locking/LockManager.java File asterixdb/asterix-transactions/src/main/java/org/apache/asterix/transaction/management/service/locking/LockManager.java: Line 54: > Why should we still keep/maintain this class? it's removed. https://asterix-gerrit.ics.uci.edu/#/c/825/1/asterixdb/asterix-transactions/src/main/java/org/apache/asterix/transaction/management/service/logging/LogManagerWithReplication.java File asterixdb/asterix-transactions/src/main/java/org/apache/asterix/transaction/management/service/logging/LogManagerWithReplication.java: Line 44: //only locally generated logs should be replicated > add here && logRecord.getLogType() == LogType.WAIT to avoid replicating WAI Done Line 76: //wait for job Commit/Abort ACK from replicas > Add here if(Record.getLogType() == LogType.JOB_COMMIT || logRecord.getLogTy Done https://asterix-gerrit.ics.uci.edu/#/c/825/1/asterixdb/asterix-transactions/src/main/java/org/apache/asterix/transaction/management/service/transaction/TransactionSubsystem.java File asterixdb/asterix-transactions/src/main/java/org/apache/asterix/transaction/management/service/transaction/TransactionSubsystem.java: Line 57: public long profilerEntityCommitLogCount = 0; > Don't make this public, add a method to increment it Done Line 132: * > WS Done Line 133: * @author kisskys > Remove author Done -- To view, visit https://asterix-gerrit.ics.uci.edu/825 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie58ae2f519baa53599e99b51bd61ea5f8366dafd Gerrit-PatchSet: 1 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: Young-Seok Kim <kiss...@gmail.com> Gerrit-Reviewer: Ian Maxon <ima...@apache.org> Gerrit-Reviewer: Jenkins <jenk...@fulliautomatix.ics.uci.edu> Gerrit-Reviewer: Murtadha Hubail <hubail...@gmail.com> Gerrit-Reviewer: Till Westmann <ti...@apache.org> Gerrit-Reviewer: Yingyi Bu <buyin...@gmail.com> Gerrit-Reviewer: Young-Seok Kim <kiss...@gmail.com> Gerrit-Reviewer: abdullah alamoudi <bamou...@gmail.com> Gerrit-HasComments: Yes