[ https://issues.apache.org/jira/browse/CURATOR-535?focusedWorklogId=776067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776067 ]
ASF GitHub Bot logged work on CURATOR-535:
------------------------------------------

                Author: ASF GitHub Bot
            Created on: 31/May/22 01:37
            Start Date: 31/May/22 01:37
    Worklog Time Spent: 10m
      Work Description:
paul8263 commented on PR #406:
URL: https://github.com/apache/curator/pull/406#issuecomment-1141583133

   Hi @tisonkun @eolivelli and @Randgalt,

   Thank you for your reply.

   This problem is unusual. I ran into it while running unit tests in another
   project that relies on ZooKeeper. The unit tests run in parallel, so
   TestServer creation can hit a race condition when allocating unused ports.

   My current workaround is the following:
   1. Get a random unused port.
   2. Acquire a file lock.
   3. Assign the port to TestServer.
   4. Check whether TestServer starts properly. If it starts successfully,
      release the file lock.

   I would like to move these steps into TestServer's creation and start
   process. However, I worry that introducing a file lock might not be the
   best solution, as it solves an unusual problem at the cost of performance
   degradation.

   Do you have any better ideas? I think a file lock should be considered a
   last resort. Correct me if I am wrong.

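
The four steps above could be sketched roughly as follows. This is only an illustration of the idea, not code from PR #406 or from Curator: the class LockedPortAllocator, the PortConsumer callback and the lock file location are all hypothetical names. It also deviates from the listed order in one respect: it takes the lock before probing for a port, on the assumption that port selection and server start need to sit inside the same critical section. Since a java.nio FileLock is held on behalf of the whole JVM, the sketch adds an in-process monitor for threads of the same JVM.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Curator.
final class LockedPortAllocator {

    // Hypothetical callback standing in for "start TestingServer on this port".
    interface PortConsumer {
        void startOn(int port) throws Exception;
    }

    // FileLock is per-JVM, so threads in the same JVM need their own mutex;
    // without it a second thread would get OverlappingFileLockException.
    private static final Object JVM_LOCK = new Object();

    // Ask the OS for a free ephemeral port. The port is released again as
    // soon as the probe socket closes, which is where the race comes from.
    static int findFreePort() throws IOException {
        try (ServerSocket probe = new ServerSocket(0)) {
            return probe.getLocalPort();
        }
    }

    // Hold the file lock from port selection until the starter returns, so
    // cooperating processes using the same lock file cannot interleave.
    static int startWithLock(Path lockFile, PortConsumer starter) throws Exception {
        synchronized (JVM_LOCK) {
            try (FileChannel channel = FileChannel.open(
                         lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
                 FileLock lock = channel.lock()) {
                int port = findFreePort();
                starter.startOn(port); // assumed to return only once the server is bound
                return port;
            } // lock is released first, then the channel is closed
        }
    }

    public static void main(String[] args) throws Exception {
        Path lockFile = Files.createTempFile("test-port", ".lock");
        int port = startWithLock(lockFile,
                p -> System.out.println("would start server on port " + p));
        System.out.println("allocated port " + port);
    }
}
```

A lock like this only helps against processes that use the same lock file. An unrelated process can still bind the port between the probe and the start, so it narrows the window rather than closing it.
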
Issue Time Tracking
-------------------

    Worklog Id:     (was: 776067)
    Time Spent: 50m  (was: 40m)

> TestServer random port selection has a race condition
> -----------------------------------------------------
>
>                 Key: CURATOR-535
>                 URL: https://issues.apache.org/jira/browse/CURATOR-535
>             Project: Apache Curator
>          Issue Type: Bug
>    Affects Versions: 4.2.0
>         Environment: Operating System:
> Fedora 30 (amd64)
> JVM:
> openjdk version "1.8.0_212"
> OpenJDK Runtime Environment (build 1.8.0_212-b04)
> OpenJDK 64-Bit Server VM (build 25.212-b04, mixed mode)
>            Reporter: Laverne Schrock
>            Priority: Minor
>         Attachments: BugReproducer.java, log4j.properties
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> When using one of the constructors for org.apache.curator.test.TestingServer
> that doesn't take a port number, the org.apache.curator.test.InstanceSpec
> that is constructed will choose random available ports to use. However,
> InstanceSpec only binds those ports during construction and then unbinds them
> so that they can be used when TestingServer.start() is called.
> This gap between port selection and port use creates a race condition where
> some other process (or thread) could bind the port before TestingServer is
> started.
> I've seen this very rarely in our integration test suite that spins up and
> tears down TestingServer many times. I've attached a simple class for
> reproducing the issue.
> If you run it in an environment with log4j loaded and
> the attached log4j.properties, you should see output like the following
> (though it sometimes takes more iterations):
> {{completed iteration: 0}}
> {{completed iteration: 500}}
> {{2019-08-02 09:47:06 ERROR TestingZooKeeperServer:162 - From testing server (random state: false) for instance: InstanceSpec\{dataDirectory=/tmp/1564753624792-1, port=34707, electionPort=33621, quorumPort=45995, deleteDataDirectoryOnClose=true, serverId=1286, tickTime=-1, maxClientCnxns=-1, customProperties={}, hostname=127.0.0.1} org.apache.curator.test.InstanceSpec@59c43d10}}
> {{java.net.BindException: Address already in use}}
> {{ at sun.nio.ch.Net.bind0(Native Method)}}
> {{ at sun.nio.ch.Net.bind(Net.java:433)}}
> {{ at sun.nio.ch.Net.bind(Net.java:425)}}
> {{ at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)}}
> {{ at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)}}
> {{ at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)}}
> {{ at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:687)}}
> {{ at org.apache.zookeeper.server.ServerCnxnFactory.configure(ServerCnxnFactory.java:76)}}
> {{ at org.apache.curator.test.TestingZooKeeperMain.internalRunFromConfig(TestingZooKeeperMain.java:239)}}
> {{ at org.apache.curator.test.TestingZooKeeperMain.runFromConfig(TestingZooKeeperMain.java:132)}}
> {{ at org.apache.curator.test.TestingZooKeeperServer$1.run(TestingZooKeeperServer.java:158)}}
> {{ at java.lang.Thread.run(Thread.java:748)}}
> {{java.lang.IllegalStateException: Timed out waiting for watch removal}}
> {{ at org.apache.curator.test.TestingZooKeeperMain.blockUntilStarted(TestingZooKeeperMain.java:146)}}
> {{ at org.apache.curator.test.TestingZooKeeperServer.start(TestingZooKeeperServer.java:167)}}
> {{ at org.apache.curator.test.TestingServer.start(TestingServer.java:148)}}
> {{ at BugReproducer.main(BugReproducer.java:15)}}

--
This message was sent by Atlassian Jira
(v8.20.7#820007)