To circle back to the original test failure that prompted this discussion: the failing test was getting intermittent bind exceptions on subsequent server restarts.
I believe it's quite likely that a process's ports will remain unavailable even after it is gone (I'm not sure if we create listening sockets with SO_REUSEADDR). So, regarding John's comment that gfsh is already synchronous, I don't think that adding extra functionality to gfsh, ultimately just to wait longer before exiting, really solves the problem. I'd suggest you adjust the tests to always start servers with `--server-port=0` so that the OS picks a free port and there are no port conflicts.

--Jens

On Wed, Sep 11, 2019 at 8:17 AM Bruce Schuchardt <bschucha...@pivotal.io> wrote:

> Blocking or non-blocking, I don't have a strong opinion. What I'd
> really like to have gfsh ensure, though, is that no-one is able to start
> a new instance of a server while the old process is still around. Maybe
> the PID file is the way to do that.
>
> On 9/10/19 3:08 PM, Mark Hanson wrote:
> > Hello All,
> >
> > I would like to propose that we make the gfsh “stop server” command
> > synchronous. It is causing some issues with some tests as the rest of the
> > calls are blocking. Stop on the other hand immediately returns by
> > comparison.
> > This causes issues as shown in GEODE-7017 specifically.
> >
> > GEODE:7017 CI failure:
> > org.apache.geode.launchers.ServerStartupValueRecoveryNotificationTest
> > startupReportsOnlineOnlyAfterRedundancyRestored
> > https://issues.apache.org/jira/browse/GEODE-7017
> >
> > What do people think?
> >
> > Thanks,
> > Mark
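
P.S. To illustrate the `--server-port=0` suggestion above, here is a minimal Java sketch of what "let the OS handle it" means at the socket level. This is not Geode's actual socket-creation code (as noted, I'm not sure what options we set there); the class name `EphemeralPortSketch` and the helper `bindToEphemeralPort` are made up for this example. It also shows where SO_REUSEADDR would be set in plain Java, if we wanted it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical illustration only; not taken from the Geode code base.
public class EphemeralPortSketch {

    // Opens a listening socket on a port chosen by the OS (port 0 means "any free port").
    static ServerSocket bindToEphemeralPort() throws IOException {
        ServerSocket socket = new ServerSocket(); // created unbound so options can be set first
        socket.setReuseAddress(true);             // SO_REUSEADDR has to be set before bind()
        socket.bind(new InetSocketAddress(0));    // 0 asks the OS to assign a free port
        return socket;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket listener = bindToEphemeralPort()) {
            // The port actually assigned is only known after the bind.
            System.out.println("listening on port " + listener.getLocalPort());
        }
    }
}
```

The trade-off is that a test can no longer hard-code the port; it has to read the assigned port back after startup, as `getLocalPort()` does here.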