[ https://issues.apache.org/jira/browse/ZOOKEEPER-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andor Molnar resolved ZOOKEEPER-3162. ------------------------------------- Resolution: Fixed Fix Version/s: 3.5.5 3.4.14 3.6.0 Issue resolved by pull request 699 [https://github.com/apache/zookeeper/pull/699] > Broken lock semantics in C client lock-recipe > --------------------------------------------- > > Key: ZOOKEEPER-3162 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3162 > Project: ZooKeeper > Issue Type: Bug > Components: c client > Affects Versions: 3.0.0, 3.4.13 > Reporter: Andrea Reale > Priority: Major > Labels: pull-request-available > Fix For: 3.6.0, 3.4.14, 3.5.5 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > As reported (but never fixed) in the past by ZOOKEEPER-2409, ZOOKEEPER-2038 > and (partly) ZOOKEEPER-2878, the C client lock-recipe implementation is > broken. > I identified three issues. > The main one (as also reported in the aforementioned reports) is that the > logic that goes through the lock waiting list is broken. child_floor uses > strcmp and compares the full node name (i.e., sessionID-sequence) rather than > only comparing the sequence number. This makes it possible for two different > clients to hold the lock at the same time: assume two clients, one associated > with session A, the other with session B, with A < B lexicographically. Now > assume that at some point a thread in B holds a lock and a thread in A tries > to acquire the same lock. A will manage to get the lock because of the wrong > comparison function, so now two guys hold the lock. > The second issue is a possible deadlock inside zkr_lock_operation. > zkr_lock_operation is always called by holding the mutex associated to the > client lock. In some cases, zkr_lock_operaton may decide to give-up locking > and call zkr_lock_unlock to release the lock. When this happens, it will try > to acquire again the same phtread mutex, which will lead to a deadlock. > The third issue relates to the return value of zkr_lock_lock. According to > the API docs, the functions returns 0 when no errors. Then it is up to the > invoker to check when the lock is held by calling zkr_lock_isowner. However, > the implementation, in case of no error, returns zkr_lock_isowner. This is > wrong because it becomes impossible to distinguish an error condition from a > success (but not ownerhsip). Instead the API (as described in the docs, btw) > should return always 0 when no errors occur. > Shortly I will add the link to a PR fixing the issues. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)