[ 
https://issues.apache.org/jira/browse/GEODE-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jared Stewart updated GEODE-2125:
---------------------------------
    Description: 
If the Locator is started from GFSH and the cluster's only server is killed, 
network partition detection will initiate forceDisconnect in the Locator and 
leave it in reconnect mode. To the User it will appear that the Locator crashed 
and GFSH lost connection:
{noformat}
gfsh>
No longer connected to 192.168.1.72[1099].
{noformat}
During the time in which the Locator is in reconnect mode, the User cannot 
connect via GFSH, nor can they issue status or stop commands against it:
{noformat}
$ cd locator1
$ cat vf.gf.locator.pid 
33959
$ ps 33959
  PID   TT  STAT      TIME COMMAND
33959 s001  S      0:19.97 /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Co
gfsh>connect --locator=localhost[10334]
Connecting to Locator at [host=localhost, port=10334] ..
Connection refused
gfsh>status locator --pid=33959
null
gfsh>status locator --dir=locator1
null
gfsh>stop locator --dir=locator1
Locator in /Users/klund/dev/geode/locator1 on null is currently not responding.
gfsh>stop locator --pid=33959
Locator in /Users/klund/dev/geode on null is currently not responding.
{noformat}
If a Locator has GFSH connected then it should notify GFSH that it is going to 
forceDisconnect and go into reconnect mode. Then GFSH can notify the User so 
the User is not surprised.

In addition, GFSH status and stop commands should be modified to be able to 
talk to a Locator in reconnect mode. GFSH start could also be modified to 
report that the Locator is running in reconnect mode instead of reporting a 
hung process in the Locator's directory.
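
Until GFSH can report this state itself, the ps check in the session above can 
be scripted to tell a Locator that has exited from one whose JVM is still 
running. The following is a minimal sketch, not part of Geode or GFSH: the 
function name locator_process_state is hypothetical, and the PID file name is 
simply the one shown in the session above.

```shell
#!/bin/sh
# Hypothetical helper, not part of Geode or GFSH.
# Reports whether the process recorded in a Locator PID file is still alive.
# A live process does NOT prove the Locator is in reconnect mode; it only
# shows that the JVM has not exited.
# Assumption: the caller runs as the user that owns the Locator process,
# otherwise kill -0 can fail even though the process exists.
locator_process_state() {
  pid_file="$1/vf.gf.locator.pid"   # file name taken from the session above
  if [ ! -r "$pid_file" ]; then
    echo "no-pid-file"
    return 2
  fi
  pid=$(cat "$pid_file")
  # kill -0 delivers no signal; it only checks that the process exists
  if kill -0 "$pid" 2>/dev/null; then
    echo "running"
    return 0
  fi
  echo "not-running"
  return 1
}
```

For example, `locator_process_state locator1` prints "running" while the 
Locator JVM from the session above is still alive, even though the GFSH status 
command prints null.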

Attachments:
* The Locator log file is attached as locator_failure-logs.txt
* The Locator thread dump (via jstack) AFTER it has shut down due to 
forceDisconnect is attached as thread_dump.txt

  was:
If the Locator is started from GFSH and the cluster's only server is killed, 
network partition detection will initiate forceDisconnect in the Locator and 
leave it in reconnect mode. To the User it will appear that the Locator crashed 
and GFSH lost connection:
{noformat}
gfsh>
No longer connected to 192.168.1.72[1099].
{noformat}
During the time in which the Locator is in reconnect mode, the User cannot 
connect via GFSH, nor can they issue status or stop commands against it:
{noformat}
$ cd locator1
$ cat vf.gf.locator.pid 
33959
$ ps 33959
  PID   TT  STAT      TIME COMMAND
33959 s001  S      0:19.97 /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Co
$ gfsh
    _________________________     __
   / _____/ ______/ ______/ /____/ /
  / /  __/ /___  /_____  / _____  / 
 / /__/ / ____/  _____/ / /    / /  
/______/_/      /______/_/    /_/    1.1.0-incubating-SNAPSHOT

Monitor and Manage Apache Geode (incubating)
gfsh>connect --locator=localhost[10334]
Connecting to Locator at [host=localhost, port=10334] ..
Connection refused
gfsh>status locator --pid=33959
null
gfsh>status locator --dir=locator1
null
gfsh>stop locator --dir=locator1
Locator in /Users/klund/dev/geode/locator1 on null is currently not responding.
gfsh>stop locator --pid=33959
Locator in /Users/klund/dev/geode on null is currently not responding.
{noformat}
If a Locator has GFSH connected then it should notify GFSH that it is going to 
forceDisconnect and go into reconnect mode. Then GFSH can notify the User so 
the User is not surprised.

In addition, GFSH status and stop commands should be modified to be able to 
talk to a Locator in reconnect mode. GFSH start could also be modified to 
report that the Locator is running in reconnect mode instead of reporting a 
hung process in the Locator's directory.

Attachments:
* The Locator log file is attached as locator_failure-logs.txt
* The Locator thread dump (via jstack) AFTER it has shut down due to 
forceDisconnect is attached as thread_dump.txt


> GFSH should provide information about Locators that go into reconnect mode
> --------------------------------------------------------------------------
>
>                 Key: GEODE-2125
>                 URL: https://issues.apache.org/jira/browse/GEODE-2125
>             Project: Geode
>          Issue Type: Improvement
>          Components: management
>    Affects Versions: 1.0.0-incubating
>            Reporter: Kirk Lund
>            Assignee: Kirk Lund
>         Attachments: locator_failure-logs.txt, thread_dump.txt
>
>
> If the Locator is started from GFSH and the cluster's only server is killed, 
> network partition detection will initiate forceDisconnect in the Locator and 
> leave it in reconnect mode. To the User it will appear that the Locator 
> crashed and GFSH lost connection:
> {noformat}
> gfsh>
> No longer connected to 192.168.1.72[1099].
> {noformat}
> During the time in which the Locator is in reconnect mode, the User cannot 
> connect via GFSH, nor can they issue status or stop commands against it:
> {noformat}
> $ cd locator1
> $ cat vf.gf.locator.pid 
> 33959
> $ ps 33959
>   PID   TT  STAT      TIME COMMAND
> 33959 s001  S      0:19.97 /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Co
> gfsh>connect --locator=localhost[10334]
> Connecting to Locator at [host=localhost, port=10334] ..
> Connection refused
> gfsh>status locator --pid=33959
> null
> gfsh>status locator --dir=locator1
> null
> gfsh>stop locator --dir=locator1
> Locator in /Users/klund/dev/geode/locator1 on null is currently not responding.
> gfsh>stop locator --pid=33959
> Locator in /Users/klund/dev/geode on null is currently not responding.
> {noformat}
> If a Locator has GFSH connected then it should notify GFSH that it is going 
> to forceDisconnect and go into reconnect mode. Then GFSH can notify the User 
> so the User is not surprised.
>
> In addition, GFSH status and stop commands should be modified to be able to 
> talk to a Locator in reconnect mode. GFSH start could also be modified to 
> report that the Locator is running in reconnect mode instead of reporting a 
> hung process in the Locator's directory.
>
> Attachments:
> * The Locator log file is attached as locator_failure-logs.txt
> * The Locator thread dump (via jstack) AFTER it has shut down due to 
> forceDisconnect is attached as thread_dump.txt


