+1
On Fri, Aug 8, 2014 at 4:58 PM, Cameron McKenzie <[email protected]> wrote: > The proposed alternative is pretty much the same thing, but moving the > logic to the LockInternals class. The added complexity of this is that the > LockInternals class doesn't know what the name of the zNode that was going > to be created is called. We'd need to expose the generated UUID on the > CreateBuilder. > > My preference is to fix this in the protected() code. We can either just > handle the InterruptedException, or handle any non KeeperException (other > than ConnectionLoss). The InterruptedException is what's happening in this > case, but if we get an NPE or something in that code, the same thing would > occur, so I think that perhaps we should just have a blanket catch all. > > I have coded this up already and all of the unit tests pass ok. > cheers > > > On Fri, Aug 8, 2014 at 11:05 PM, Mike Drob <[email protected]> wrote: > > > Explicitly coding for a possible InterruptedException sounds good to me, > > but we already know that I prefer (more) diverse Exception types. > > > > I'm not sure I understand what the proposed alternative is? > > > > Mike > > > > > > On Fri, Aug 8, 2014 at 1:02 AM, Cameron McKenzie <[email protected] > > > > wrote: > > > > > Guys, > > > I've been looking into a fix for CURATOR-79 ( > > > https://issues.apache.org/jira/browse/CURATOR-79) and have found it to > > be > > > slightly more complicated than initially expected. > > > > > > The locking recipes are using protected zNodes (i.e the zNode name > > contains > > > a random UUID that is tied to a particular builder instance) for locks, > > > which is sensible, but there seems to be an issue with this. > > > > > > The protected logic basically looks for the cause of failure on a > create, > > > and if it's connection loss, then it does an ensured deleted on the > path > > it > > > was trying to create to ensure that it's removed if it did get created. > > > > > > For CURATOR-79, and InterruptedException is causing this call to fail > > when > > > waiting for the response from ZK. This means that the protected logic > > does > > > not fire and we end up with an orphaned node. > > > > > > It's possible with some ugliness to handle this in the > InterprocesMutex, > > > but I think that maybe it's better fixed in the protected logic. Maybe > > the > > > protected logic could be modified so that it will occur on > ConnectionLoss > > > or on any non-KeeperException (i.e. InterruptedException). This would > > cause > > > the zNode to be removed if it was created, and would fix this deadlock > > > issue. > > > > > > I would welcome anyone's opinion on the way forward. > > > cheers > > > Cam > > > > > >
