> The idea I am working with at the moment that Kirk pointed me at was to
> use the pid file in the directory as indicator. Once that file disappears
> the server is stopped.

How will this work if stop server --member is invoked from a different
machine than the one running the member that is being stopped?
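
For concreteness, here is a minimal sketch of the pid-file wait as I
understand the idea. The class and method names are made up for
illustration and are not existing Geode code, and the pid file path is
assumed to be whatever the launcher writes into the member's working
directory. Note that it polls the local filesystem, so it can only run on
the machine that hosts the member.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

// Hypothetical helper, not part of gfsh or Geode.
class PidFileWait {
  /**
   * Polls until pidFile no longer exists or the timeout elapses.
   * Returns true if the file disappeared in time, false otherwise.
   */
  static boolean awaitPidFileRemoval(Path pidFile, Duration timeout)
      throws InterruptedException {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (Files.exists(pidFile)) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // file still present, give up
      }
      Thread.sleep(100); // assumed polling interval
    }
    return true;
  }
}
```

A remote caller has no access to that directory, which is the case I am
asking about.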

-Dan

On Wed, Sep 11, 2019 at 10:28 AM Mark Hanson <mhan...@pivotal.io> wrote:

> The idea I am working with at the moment that Kirk pointed me at was to
> use the pid file in the directory as indicator. Once that file disappears
> the server is stopped.
>
> That seems to work in my testing.
>
> Thoughts?
>
> Thanks,
> Mark
>
> > On Sep 11, 2019, at 10:23 AM, Dan Smith <dsm...@pivotal.io> wrote:
> >
> > It does seem like we should make stop synchronous, or at least make start
> > wait for the old process to die as Bruce suggested. Otherwise it is
> > difficult for someone to script the restart of a server.
> >
> > Looking at the code, it does look like gfsh stop is asynchronous. There are
> > multiple ways to stop a server:
> > * gfsh stop --dir - it looks like we write out some stop file and return
> > immediately. Or, if we can connect over JMX, we invoke the
> > MemberMBean.shutDownMember method, which launches a thread to close the
> > cache, which is also asynchronous.
> > * gfsh stop --pid - this seems to be similar to --dir
> > * With a member name - this appears to go to the
> > MemberMBean.shutDownMember method as well.
> >
> > I think one issue is that with the JMX methods for stopping the server it
> > may be hard to ensure the process is really gone, because they can be
> > invoked remotely. That may be why they are asynchronous - they need to
> > return something to the caller before shutting down. So maybe Bruce's
> > suggestion is better.
> >
> > As Jens pointed out - tests should generally just use port 0 for servers.
> >
> > -Dan
> >
> > On Wed, Sep 11, 2019 at 8:46 AM Jens Deppe <jensde...@apache.org> wrote:
> >
> >> To circle back to the original test failure that prompted this
> >> discussion - the failing test was getting intermittent bind exceptions on
> >> subsequent server restarts.
> >>
> >> I believe it's quite likely that a process' ports will remain unavailable
> >> even after it is gone (I'm not sure if we create listening sockets with
> >> SO_REUSEADDR). So, as to John's comment that gfsh is already synchronous, I
> >> don't think that adding extra functionality to gfsh, to ultimately just
> >> wait longer before exiting, is really solving the problem. I'd suggest you
> >> adjust the tests to always start servers with `--server-port=0` so that
> >> there are no port conflicts and let the OS handle it.
> >>
> >> --Jens
> >>
> >> On Wed, Sep 11, 2019 at 8:17 AM Bruce Schuchardt <bschucha...@pivotal.io> wrote:
> >>
> >>> Blocking or non-blocking, I don't have a strong opinion.  What I'd
> >>> really like to have gfsh ensure, though, is that no-one is able to start
> >>> a new instance of a server while the old process is still around. Maybe
> >>> the PID file is the way to do that.
> >>>
> >>> On 9/10/19 3:08 PM, Mark Hanson wrote:
> >>>> Hello All,
> >>>>
> >>>> I would like to propose that we make the gfsh “stop server” command
> >>>> synchronous. It is causing some issues with some tests, as the rest of
> >>>> the calls are blocking. Stop, on the other hand, returns immediately.
> >>>> This causes issues as shown in GEODE-7017 specifically.
> >>>>
> >>>> GEODE-7017 CI failure:
> >>>> org.apache.geode.launchers.ServerStartupValueRecoveryNotificationTest >
> >>>> startupReportsOnlineOnlyAfterRedundancyRestored
> >>>> https://issues.apache.org/jira/browse/GEODE-7017
> >>>>
> >>>>
> >>>> What do people think?
> >>>>
> >>>> Thanks,
> >>>> Mark
> >>>
> >>
>
>
