keith-turner commented on a change in pull request #2329:
URL: https://github.com/apache/accumulo/pull/2329#discussion_r740272026



##########
File path: core/src/main/java/org/apache/accumulo/fate/zookeeper/DistributedReadWriteLock.java
##########
@@ -218,22 +237,54 @@ public boolean tryLock() {
       }
       SortedMap<Long,byte[]> entries = qlock.getEarlierEntries(entry);
       Iterator<Entry<Long,byte[]>> iterator = entries.entrySet().iterator();
-      if (!iterator.hasNext())
+      if (!iterator.hasNext()) {
         throw new IllegalStateException("Did not find our own lock in the queue: " + this.entry
             + " userData " + new String(this.userData, UTF_8) + " lockType " + lockType());
-      return iterator.next().getKey().equals(entry);
+      }
+      if (!failBlockers) {
+        return iterator.next().getKey().equals(entry);
+      } else {
+        ZooStore<DistributedReadWriteLock> zs;
+        try {
+          zs = new ZooStore<>(zooPath, zrw);
+        } catch (KeeperException | InterruptedException e1) {
+          log.error("Error creating zoo store", e1);
+          return false;
+        }
+        final AdminUtil<DistributedReadWriteLock> util = new AdminUtil<>();
+        boolean result = true;
+        while (iterator.hasNext()) {
+          Entry<Long,byte[]> e = iterator.next();
+          if (!e.getKey().equals(entry)) {
+            result &= util.prepFail(zs, zrw, zooManagerPath, Long.toString(e.getKey(), 16));

Review comment:
       I think as long as the manager and this code use the same ZooStore
object, the call to reserve will address the second problem I mentioned
above. I see a third problem: prepFail transitions FATE ops to
FAILED_IN_PROGRESS. Those FATE ops will then unwind and execute their
undo() operations, which could modify the metadata table and ZooKeeper. So
FATE ops that no longer hold the lock could still be modifying persisted
state related to the table. We would probably need to wait for these FATE
ops to transition from FAILED_IN_PROGRESS to FAILED before getting the lock.
We have to be careful about how this wait is done. If all threads in the
FATE thread pool are waiting for other operations to transition from
FAILED_IN_PROGRESS to FAILED, no threads would be left to transition those
operations, leading to deadlock.
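
To make the deadlock concern concrete, here is a rough sketch of one way the
wait could avoid pinning a pool thread. It is not Accumulo code: the `Status`
enum, the in-memory `statuses` map, and `awaitFailed` are hypothetical
stand-ins for the FATE store and its status checks. The waiter checks the
blockers and, if any is not yet FAILED, re-queues itself and returns instead
of blocking, so even a single-thread pool can still run the undo work.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class NonBlockingFailWait {
  // Hypothetical stand-in for FATE transaction states; only the two discussed are modeled.
  enum Status { FAILED_IN_PROGRESS, FAILED }

  // Hypothetical in-memory status table keyed by transaction id (not the real ZooStore).
  static final Map<Long, Status> statuses = new ConcurrentHashMap<>();

  // Completes 'done' once every blocker is FAILED. Otherwise it re-queues itself and
  // returns, which frees the pool thread to run the blockers' undo work.
  static void awaitFailed(ScheduledExecutorService pool, List<Long> blockers,
      CompletableFuture<Boolean> done) {
    for (Long id : blockers) {
      if (statuses.get(id) != Status.FAILED) {
        pool.schedule(() -> awaitFailed(pool, blockers, done), 10, TimeUnit.MILLISECONDS);
        return;
      }
    }
    done.complete(true);
  }

  // Runs the scenario on a single-thread pool, where a blocking wait would never finish.
  static boolean demo() {
    ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
    try {
      statuses.put(1L, Status.FAILED_IN_PROGRESS);
      statuses.put(2L, Status.FAILED_IN_PROGRESS);
      CompletableFuture<Boolean> done = new CompletableFuture<>();
      // The waiter is queued first, ahead of the work it depends on.
      pool.execute(() -> awaitFailed(pool, List.of(1L, 2L), done));
      // Simulated undo() completion for each blocker, queued behind the waiter.
      pool.execute(() -> statuses.put(1L, Status.FAILED));
      pool.execute(() -> statuses.put(2L, Status.FAILED));
      return done.get(5, TimeUnit.SECONDS);
    } catch (Exception e) {
      return false;
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("lock may proceed: " + demo());
  }
}
```

If `awaitFailed` instead looped or slept on the pool thread until the statuses
changed, the two status updates queued behind it would never run on this
single-thread pool, which is the same starvation described above at a smaller
scale.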



