Repository: incubator-geode
Updated Branches:
  refs/heads/develop 3bdd10497 -> 3822c9053
GEODE-2047 Document change to enable-network-partition-detection

- This is a subtask of GEODE-762.
- The default value of property enable-network-partition-detection changed from false to true, enabling partition detection by default, so all documentation that discusses partition detection also needs to change.
- Fixed a minor typo or two encountered in the files that were being updated.


Project: http://git-wip-us.apache.org/repos/asf/incubator-geode/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-geode/commit/8f14a744
Tree: http://git-wip-us.apache.org/repos/asf/incubator-geode/tree/8f14a744
Diff: http://git-wip-us.apache.org/repos/asf/incubator-geode/diff/8f14a744

Branch: refs/heads/develop
Commit: 8f14a744c6bc51c422e4f292dc67219f740dc7ba
Parents: 820f33e
Author: Karen Miller <[email protected]>
Authored: Mon Oct 31 16:45:29 2016 -0700
Committer: Karen Miller <[email protected]>
Committed: Tue Nov 1 13:52:22 2016 -0700

----------------------------------------------------------------------
 .../handling_network_partitioning.html.md.erb   | 28 +++++++++++---------
 ...rk_partitioning_management_works.html.md.erb |  7 +++--
 ...ring_conflicting_data_exceptions.html.md.erb |  4 +--
 .../recovering_from_network_outages.html.md.erb | 11 ++------
 .../system_failure_and_recovery.html.md.erb     |  6 ++---
 .../topics/gemfire_properties.html.md.erb       |  4 +--
 6 files changed, 27 insertions(+), 33 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/8f14a744/geode-docs/managing/network_partitioning/handling_network_partitioning.html.md.erb
----------------------------------------------------------------------
diff --git a/geode-docs/managing/network_partitioning/handling_network_partitioning.html.md.erb b/geode-docs/managing/network_partitioning/handling_network_partitioning.html.md.erb
index 61a2576..a227597 100644
--- a/geode-docs/managing/network_partitioning/handling_network_partitioning.html.md.erb
+++ b/geode-docs/managing/network_partitioning/handling_network_partitioning.html.md.erb
@@ -19,23 +19,24 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
-This section lists the configuration steps for network partition detection.
+This section lists configuration considerations relating to network partition detection.
 <a id="handling_network_partitioning__section_EAF1957B6446491A938DEFB06481740F"></a>
 The system uses a combination of member coordinators and system members, designated as lead members, to detect and resolve network partitioning problems.
-1. Network partition detection works in all environments. Using multiple locators mitigates the effect of network partitioning. See [Configuring Peer-to-Peer Discovery](../../topologies_and_comm/p2p_configuration/setting_up_a_p2p_system.html).
-2. Enable partition detection consistently in all system members by setting this in their `gemfire.properties` file:
+- Network partition detection works in all environments. Using multiple locators mitigates the effect of network partitioning. See [Configuring Peer-to-Peer Discovery](../../topologies_and_comm/p2p_configuration/setting_up_a_p2p_system.html).
+
+- Network partition detection is enabled by default. The default setting in the `gemfire.properties` file is
 ``` pre
 enable-network-partition-detection=true
 ```
- Enable network partition detection in all locators and in any other process that should be sensitive to network partitioning. Processes that do not have network partition detection enabled are not eligible to be the lead member, so their failure will not trigger declaration of a network partition.
+ Processes that do not have network partition detection enabled are not eligible to be the lead member, so their failure will not trigger declaration of a network partition.
- All system members should have the same setting for `enable-network-partition-detection`. If they don’t, the system throws a `GemFireConfigException` upon startup.
+ All system members should have the same setting for `enable-network-partition-detection`. If they do not, the system throws a `GemFireConfigException` upon startup.
-3. You must set `enable-network-partition-detection` to true if you are using persistent partitioned regions. You **must** set `enable-network-partition-detection` to true if you are using persistent regions (partitioned or replicated). If you create a persistent region and `enable-network-partition-detection` to set to false, you will receive the following warning message:
+- The property `enable-network-partition-detection` must be true if you are using either partitioned or persistent regions. If you create a persistent region and `enable-network-partition-detection` is set to false, you will receive the following warning message:
 ``` pre
 Creating persistent region {0}, but enable-network-partition-detection is set to false.
@@ -43,9 +44,9 @@ The system uses a combination of member coordinators and system members, designa
 event of a network split."
 ```
-4. Configure regions you want to protect from network partitioning with `DISTRIBUTED_ACK` or `GLOBAL` `scope`. Do not use `DISTRIBUTED_NO_ACK` `scope`. The region configurations provided in the region shortcut settings use `DISTRIBUTED_ACK` scope. This setting prevents operations from performed throughout the distributed system before a network partition is detected.
+- Configure regions you want to protect from network partitioning with a scope setting of `DISTRIBUTED_ACK` or `GLOBAL`. Do not use `DISTRIBUTED_NO_ACK` scope. This prevents operations from being performed throughout the distributed system before a network partition is detected.
 **Note:**
- GemFire issues an alert if it detects distributed-no-ack regions when network partition detection is enabled:
+ GemFire issues an alert if it detects `DISTRIBUTED_NO_ACK` regions when network partition detection is enabled:
 ``` pre
 Region {0} is being created with scope {1} but enable-network-partition-detection is enabled in the distributed system.
@@ -53,11 +54,12 @@ The system uses a combination of member coordinators and system members, designa
 ```
-5. These other configuration parameters affect or interact with network partitioning detection. Check whether they are appropriate for your installation and modify as needed.
- - If you have network partition detection enabled, the threshold percentage value for allowed membership weight loss is automatically configured to 51. You cannot modify this value. (**Note:** The weight loss calculation uses standard rounding. Therefore, a value of 50.51 is rounded to 51 and will cause a network partition.)
- - Failure detection is initiated if a member's `gemfire.properties` `ack-wait-threshold` (default is 15 seconds) and `ack-severe-alert-threshold` (15 seconds) elapses before receiving a response to a message. If you modify the `ack-wait-threshold` configuration value, you should modify `ack-severe-alert-threshold` to match the other configuration value.
- - If the system has clients connecting to it, the clients' `cache.xml` `<cache> <pool> read-timeout` should be set to at least three times the `member-timeout` setting in the server's `gemfire.properties`. The default `<cache> <pool> read-timeout` setting is 10000 milliseconds.
+- These other configuration parameters affect or interact with network partitioning detection. Check whether they are appropriate for your installation and modify as needed.
+ - If you have network partition detection enabled, the threshold percentage value for allowed membership weight loss is automatically configured to 51. You cannot modify this value. **Note:** The weight loss calculation uses round to nearest. Therefore, a value of 50.51 is rounded to 51 and will cause a network partition.
+ - Failure detection is initiated if a member's `ack-wait-threshold` (default is 15 seconds) and `ack-severe-alert-threshold` (15 seconds) properties elapse before receiving a response to a message. If you modify the `ack-wait-threshold` configuration value, you should modify `ack-severe-alert-threshold` to match the other configuration value.
+ - If the system has clients connecting to it, the clients' `cache.xml` pool `read-timeout` should be set to at least three times the `member-timeout` setting in the server's `gemfire.properties` file. The default pool `read-timeout` setting is 10000 milliseconds.
 - You can adjust the default weights of members by specifying the system property `gemfire.member-weight` upon startup. For example, if you have some VMs that host a needed service, you could assign them a higher weight upon startup.
- - By default, members that are forced out of the distributed system by a network partition event will automatically restart and attempt to reconnect. Data members will attempt to reinitialize the cache. See [Handling Forced Cache Disconnection Using Autoreconnect](../autoreconnect/member-reconnect.html).
+
+- By default, members that are forced out of the distributed system by a network partition event will automatically restart and attempt to reconnect. Data members will attempt to reinitialize the cache. See [Handling Forced Cache Disconnection Using Autoreconnect](../autoreconnect/member-reconnect.html).


http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/8f14a744/geode-docs/managing/network_partitioning/how_network_partitioning_management_works.html.md.erb
----------------------------------------------------------------------
diff --git a/geode-docs/managing/network_partitioning/how_network_partitioning_management_works.html.md.erb b/geode-docs/managing/network_partitioning/how_network_partitioning_management_works.html.md.erb
index e971634..93a14ac 100644
--- a/geode-docs/managing/network_partitioning/how_network_partitioning_management_works.html.md.erb
+++ b/geode-docs/managing/network_partitioning/how_network_partitioning_management_works.html.md.erb
@@ -24,10 +24,9 @@ Geode handles network outages by using a weighting system to determine whether t
 <a id="how_network_partitioning_management_works__section_548146BB8C24412CB7B43E6640272882"></a>
 Individual members are each assigned a weight, and the quorum is determined by comparing the total weight of currently responsive members to the previous total weight of responsive members.
-Your distributed system can split into separate running systems when members lose the ability to see each other. The typical cause of this problem is a failure in the network. When a partitioned system is detected, Apache Geode only one side of the system keeps running and the other side automatically shuts down.
+Your distributed system can split into separate running systems when members lose the ability to see each other. The typical cause of this problem is a failure in the network. When a partitioned system is detected, only one side of the system keeps running and the other side automatically shuts down.
-**Note:**
-The network partitioning detection feature is only enabled when `enable-network-partition-detection` is set to true in `gemfire.properties`. By default, this property is set to false. See [Configure Apache Geode to Handle Network Partitioning](handling_network_partitioning.html#handling_network_partitioning) for details. Quorum weight calculations are always performed and logged regardless of this configuration setting.
+The network partitioning detection feature is enabled by default with a true value for the `enable-network-partition-detection` property. See [Configure Apache Geode to Handle Network Partitioning](handling_network_partitioning.html#handling_network_partitioning) for details. Quorum weight calculations are always performed and logged regardless of this configuration setting.
 The overall process for detecting a network partition is as follows:
@@ -52,7 +51,7 @@ The overall process for detecting a network partition is as follows:
 - A new coordinator may have a stale view of membership if it did not see the last membership view sent by the previous (failed) coordinator. If new members were added during that failure, then the new members may be ignored when the first new view is sent out.
 - If members were removed during the fail over to the new coordinator, then the new coordinator will have to determine these losses during the view preparation step.
-6. With `enable-network-partition-detection` set to true, any member that detects that the total membership weight has dropped below 51% within a single membership view change (loss of quorum) declares a network partition event. The coordinator sends a network-partitioned-detected UDP message to all members (even to the non-responsive ones) and then closes the distributed system with a `ForcedDisconnectException`. If a member fails to receive the message before the coordinator closes the system, the member is responsible for detecting the event on its own.
+6. With a default value of `enable-network-partition-detection`, any member that detects that the total membership weight has dropped below 51% within a single membership view change (loss of quorum) declares a network partition event. The coordinator sends a network-partitioned-detected UDP message to all members (even to the non-responsive ones) and then closes the distributed system with a `ForcedDisconnectException`. If a member fails to receive the message before the coordinator closes the system, the member is responsible for detecting the event on its own.
 The presumption is that when a network partition is declared, the members that comprise a quorum will continue operations. The surviving members elect a new coordinator, designate a lead member, and so on.
http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/8f14a744/geode-docs/managing/troubleshooting/recovering_conflicting_data_exceptions.html.md.erb
----------------------------------------------------------------------
diff --git a/geode-docs/managing/troubleshooting/recovering_conflicting_data_exceptions.html.md.erb b/geode-docs/managing/troubleshooting/recovering_conflicting_data_exceptions.html.md.erb
index 38375ae..4eade62 100644
--- a/geode-docs/managing/troubleshooting/recovering_conflicting_data_exceptions.html.md.erb
+++ b/geode-docs/managing/troubleshooting/recovering_conflicting_data_exceptions.html.md.erb
@@ -46,7 +46,7 @@ In this case the fix is simply to move aside or delete the persistent files for
 ## A Network Failure Occurs and Network Partitioning Detection is Disabled
-When `enable-network-partition-detection` is set to true, Geode will detect a network partition and shut down unreachable members to prevent a network partition ("split brain") from occurring. No conflicts should occur when the system is healed.
+When `enable-network-partition-detection` is set to the default value of true, Geode will detect a network partition and shut down unreachable members to prevent a network partition ("split brain") from occurring. No conflicts should occur when the system is healed.
 However if `enable-network-partition-detection` is false, Geode will not detect the network partition. Instead, each side of the network partition will end up recording that the other side of the partition has stale data. When the partition is healed and persistent members are restarted, the members will report a conflict because both sides of the partition think the other members are stale.
@@ -54,7 +54,7 @@ In some cases it may be possible to choose between sides of the network partitio
 ## Salvaging Data
-If you receive a ConflictingPersistentDataException, you will not be able to start all of your members and have them join the same distributed system. You have some members with conflicting data.
+If you receive a `ConflictingPersistentDataException`, you will not be able to start all of your members and have them join the same distributed system. You have some members with conflicting data.
 First, see if there is part of the system that you can recover. For example if you just added some new members to the system, try to start up without including those members.


http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/8f14a744/geode-docs/managing/troubleshooting/recovering_from_network_outages.html.md.erb
----------------------------------------------------------------------
diff --git a/geode-docs/managing/troubleshooting/recovering_from_network_outages.html.md.erb b/geode-docs/managing/troubleshooting/recovering_from_network_outages.html.md.erb
index 8c23bea..f798b2b 100644
--- a/geode-docs/managing/troubleshooting/recovering_from_network_outages.html.md.erb
+++ b/geode-docs/managing/troubleshooting/recovering_from_network_outages.html.md.erb
@@ -23,16 +23,9 @@ The safest response to a network outage is to restart all the processes and brin
 However, if you know the architecture of your system well, and you are sure you won’t be resurrecting old data, you can do a selective restart. At the very least, you must restart all the members on one side of the network failure, because a network outage causes separate distributed systems that can’t rejoin automatically.
-- [What Happens During a Network Outage](recovering_from_network_outages.html#rec_network_crash__section_900657018DC048EE9BE6A8064FAE48FD)
-- [Recovery Procedure](recovering_from_network_outages.html#rec_network_crash__section_F9A0C31AE25C4E7185DF3B1A8486BDFA)
-- [Effect of Network Failure on Partitioned Regions](recovering_from_network_outages.html#rec_network_crash__section_9914A63673E64EA1ADB6B6767879F0FF)
-- [Effect of Network Failure on Distributed Regions](recovering_from_network_outages.html#rec_network_crash__section_7AD5624F3CD748C0BC163562B26B2DCE)
-- [Effect of Network Failure on Persistent Regions](#rec_network_crash__section_arm_pnr_3q)
-- [Effect of Network Failure on Client/Server Installations](recovering_from_network_outages.html#rec_network_crash__section_18AEEB6CC8004C3388CCB01F988B0422)
-
 ## <a id="rec_network_crash__section_900657018DC048EE9BE6A8064FAE48FD" class="no-quick-link"></a>What Happens During a Network Outage
-When the network connecting members of a distributed system goes down, system members treat this like a machine crash. Members on each side of the network failure respond by removing the members on the other side from the membership list. If network partitioning detection is enabled, the partition that contains sufficient quorum (> 51% based on member weight) will continue to operate, while the other partition with insufficient quorum will shut down. See [Network Partitioning](../network_partitioning/chapter_overview.html#network_partitioning) for a detailed explanation on how this detection system operates.
+When the network connecting members of a distributed system goes down, system members treat this like a machine crash. Members on each side of the network failure respond by removing the members on the other side from the membership list. If network partitioning detection is enabled (the default), the partition that contains sufficient quorum (> 51% based on member weight) will continue to operate, while the other partition with insufficient quorum will shut down. See [Network Partitioning](../network_partitioning/chapter_overview.html#network_partitioning) for a detailed explanation on how this detection system operates.
 In addition, members that have been disconnected either via network partition or due to unresponsiveness will automatically try to reconnect to the distributed system unless configured otherwise. See [Handling Forced Cache Disconnection Using Autoreconnect](../autoreconnect/member-reconnect.html).
@@ -62,7 +55,7 @@ When the network recovers, the members may be able to see each other again, but
 A network failure when using persistent regions can cause conflicts in your persisted data. When you recover your system, you will likely encounter `ConflictingPersistentDataException`s when members start up.
-For this reason, you must configure `enable-network-partition-detection` to `true` if you are using persistent regions.
+For this reason, `enable-network-partition-detection` must be set to true if you are using persistent regions.
 For information on how to recover from `ConflictingPersistentDataException` errors should they occur, see [Recovering from ConfictingPersistentDataExceptions](recovering_conflicting_data_exceptions.html#topic_ghw_z2m_jq).
http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/8f14a744/geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb
----------------------------------------------------------------------
diff --git a/geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb b/geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb
index d94ea60..cce80d0 100644
--- a/geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb
+++ b/geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb
@@ -181,7 +181,7 @@ There are no processes eligible to be group membership coordinator
 Description:
-Network partition detection is enabled (enable-network-partition-detection is set to true), and there are locator problems.
+Network partition detection is enabled, and there are locator problems.
 Response:
@@ -197,7 +197,7 @@ There are no processes eligible to be group membership coordinator
 Description:
-Network partition detection is enabled (enable-network-partition-detection is set to true), and there are locator problems.
+Network partition detection is enabled, and there are locator problems.
 Response:
@@ -212,7 +212,7 @@ Unable to contact any locators and network partition detection is enabled
 Description:
-Network partition detection is enabled (enable-network-partition-detection is set to true), and there are locator problems.
+Network partition detection is enabled, and there are locator problems.
 Response:


http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/8f14a744/geode-docs/reference/topics/gemfire_properties.html.md.erb
----------------------------------------------------------------------
diff --git a/geode-docs/reference/topics/gemfire_properties.html.md.erb b/geode-docs/reference/topics/gemfire_properties.html.md.erb
index 9882568..ae0f198 100644
--- a/geode-docs/reference/topics/gemfire_properties.html.md.erb
+++ b/geode-docs/reference/topics/gemfire_properties.html.md.erb
@@ -160,8 +160,8 @@ See <a href="../../managing/autoreconnect/member-reconnect.html">Handling Forced
 </tr>
 <tr class="odd">
 <td>enable-network-partition-detection</td>
-<td>Boolean instructing the system to detect and handle splits in the distributed system, typically caused by a partitioning of the network (split brain) where the distributed system is running. We recommend setting this property to <code class="ph codeph">true</code>. You must set this property to the same value across all your distributed system members. In addition, you must set this property to <code class="ph codeph">true</code> if you are using persistent regions and configure your regions to use DISTRIBUTED_ACK or GLOBAL scope to avoid potential data conflicts.</td>
-<td>false</td>
+<td>Boolean instructing the system to detect and handle splits in the distributed system, typically caused by a partitioning of the network (split brain) where the distributed system is running. You must set this property to the same value across all your distributed system members. In addition, this property must be set to <code class="ph codeph">true</code> if you are using persistent regions and configure your regions to use DISTRIBUTED_ACK or GLOBAL scope to avoid potential data conflicts.</td>
+<td>true</td>
 </tr>
 <tr class="even">
 <td>enable-cluster-configuration</td>
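The pages touched by this commit revolve around a handful of interacting settings. As an illustration only, and not part of the commit, here is a minimal sketch of a `gemfire.properties` fragment that follows the updated guidance; the 5000 millisecond `member-timeout` is an assumed example value rather than something taken from the diff above.

``` pre
# Hypothetical gemfire.properties sketch (illustration only, not part of this commit).
# All members of the distributed system must use the same value here;
# after GEODE-2047 the default is already true.
enable-network-partition-detection=true
# Both thresholds are in seconds and should be modified together (each defaults to 15).
ack-wait-threshold=15
ack-severe-alert-threshold=15
# Assumed example value, in milliseconds.
member-timeout=5000
```

With that assumed `member-timeout` of 5000 milliseconds, the three-times guideline in handling_network_partitioning.html.md.erb would put a client pool `read-timeout` at 15000 milliseconds or higher.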

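The same handling_network_partitioning.html.md.erb page mentions raising the weight of members that host a needed service by setting the `gemfire.member-weight` system property at startup. A hypothetical way to pass it through gfsh's `--J` option, with the member name and the weight of 20 invented for the example, is:

``` pre
gfsh>start server --name=serviceHost1 --J=-Dgemfire.member-weight=20
```

Because member weights feed the quorum calculation described in how_network_partitioning_management_works.html.md.erb, a partition that contains the higher-weight members is more likely to retain the required share of the total weight and keep running after a split.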