keith-turner commented on issue #1086:
URL: https://github.com/apache/accumulo/issues/1086#issuecomment-670084908
@milleruntime I researched this a bit more, looking into the following two
concerns.
1. ZooReaderWriter retries on session expired exceptions.
2. ZooReaderWriter automatically gets a new zookeeper when one is closed.
For issue 1, looking at the code zoolock code it deals mainly deals with
session expired in watcher. As far as I can tell ZooReaderWriter does not
alter events from watcher at the moment. For other code in ZooLock, its
difficult to analyze the impact of ZooReaderWriter retrying on session expired.
There were a few places I was concerned about that seemed like they may work
by accident. So for concern 1 I have yet to identify an actual bug through
code analysis, but I suspect there could be potential bugs.
For concern 2, I did find an actual race condition that seems like it could
result in erroneous being placed in the metadata table.
In the code below, ZooLock calls
ZooReaderWriter.getZooKeeper().getSessionId() to compute its LockId
https://github.com/apache/accumulo/blob/rel/1.9.3/fate/src/main/java/org/apache/accumulo/fate/zookeeper/ZooLock.java#L358-L363
Below is the code for ZooReaderWriter.getZooKeeper(). If you follow this
code you will see that it creates a new Zookeeper if the current one is closed.
For the above code that is problematic, because a new ZK object would have a
different session id.
https://github.com/apache/accumulo/blob/rel/1.9.3/fate/src/main/java/org/apache/accumulo/fate/zookeeper/ZooReaderWriter.java#L48-L54
The code below is called by the tablet server and based on the previous code
may put the wrong id in the metadata table.
https://github.com/apache/accumulo/blob/rel/1.9.3/server/base/src/main/java/org/apache/accumulo/server/util/MetadataTableUtil.java#L135-L138
For this particular case a fix may be to make ZooLock.getLockID call
ZooLock.getSessionId() instead of
ZooLock.zooKeeper.getZooKeeper().getSessionId(). But I am not completely sure.
Anyway this is one example of how the zookeeper silently switching under
zoolock could be problematic, there may be other cases.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]