[
https://issues.apache.org/jira/browse/GEODE-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dan Smith resolved GEODE-1088.
------------------------------
Resolution: Won't Fix
It's not possible to skip the checks on restart without risking data loss or
ConflictingData errors, for the reasons I've outlined in the comments.
> shutdown-all should skip member dependency checks when restarted
> ----------------------------------------------------------------
>
> Key: GEODE-1088
> URL: https://issues.apache.org/jira/browse/GEODE-1088
> Project: Geode
> Issue Type: Improvement
> Components: management
> Reporter: Soubhik Chakraborty
>
> Right now, when a Geode cluster is started, each member waits for other
> members to start (for persistent regions only). These members are recorded
> when this member is stopped, either via an individual stop or as part of
> shutdown-all.
> Because {code}shutdown-all{code} indicates that the entire cluster is going
> down, all cluster members can be guaranteed to be in a consistent state while
> the cluster is stopped, provided incoming traffic is stopped first. Therefore,
> members stopped cleanly using shutdown-all can skip member dependency checks
> while starting up.
> A more detailed proposal is in the following ticket:
> https://snappydata.atlassian.net/browse/SNAP-586
> I need the team's help (esp. [~upthewaterspout], [~bschuchardt]) to share any
> insights on, or pitfalls they see in, the proposal. The proposed sequence
> of steps is listed here for reference.
> There are two main cases we need to tackle.
> # Make shutdown-all two-phase (assuming all members are healthy)
> #* Phase-I : stop network interfaces of all servers (via p2p messaging)
> #* wait for in-flight operations to complete, viz.
> #*# ongoing commits ? (note: due to the n/w stop, the user will already see a failure)
> #*# restrict new commits (n/w stopped already, so new commits won't arrive)
> #*# roll back existing transactions (as a new commit/rollback won't come from the user)
> #*# introduce an op counter and monitor it for zero on each member for non-tx operations (distribution stats counter can be used ?)
> #*# invoke disk sync procedure ?
> #* Phase-II : trigger shutdown on each of the VMs (via p2p messaging)
> #** right now during shutdown-all there is a lot of chatter at the jgroups level, with members suspecting each other. Should we attempt to avoid it ?
> #* skip member dependency check during restart by reading a recorded entry somewhere (data dictionary ?)
> # If one or more members are unreachable (hung members), the only way remaining is
> to shut down via script.
> #* Need to think more on how to recognize hung members and what should be
> done before "kill -9", like recording the list of those members.
> #* these recorded members should be started last, after starting all
> the members that did shut down cleanly.
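
The op-counter step in Phase-I of the quoted proposal could be sketched roughly as below. This is only an illustration, not Geode code: `OperationGate` and all of its methods are hypothetical names, and the sketch assumes every non-tx operation on a member is wrapped in a `tryBegin()` / `end()` pair. It covers only the in-process part (reject new operations, then wait for the in-flight count to reach zero) and says nothing about whether skipping the dependency checks afterwards is safe.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of Geode: tracks in-flight non-tx operations
// on one member so that a Phase-I shutdown step can wait for them to drain.
final class OperationGate {
    private final AtomicLong inFlight = new AtomicLong();
    private final AtomicBoolean closed = new AtomicBoolean();

    // Called at the start of every operation. Returns false if the gate is
    // closed, in which case the caller must reject the operation.
    boolean tryBegin() {
        // Increment before checking the flag, so that an operation which
        // passes the check is always visible to a later awaitQuiescent().
        inFlight.incrementAndGet();
        if (closed.get()) {
            inFlight.decrementAndGet();
            return false;
        }
        return true;
    }

    // Called when an operation that passed tryBegin() finishes.
    void end() {
        inFlight.decrementAndGet();
    }

    // Phase-I: stop admitting new operations.
    void close() {
        closed.set(true);
    }

    // Polls the counter until it reaches zero or the timeout expires.
    // Returns true only if the member became quiescent in time.
    boolean awaitQuiescent(long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (inFlight.get() != 0) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    long inFlightCount() {
        return inFlight.get();
    }
}
```

A coordinator would call `close()` on every member, then `awaitQuiescent(...)`, and proceed to Phase-II only if all members report true; a member that times out would fall under the second (hung member) case.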