[geode] branch support/1.13 updated: GEODE-9656: Document Async disk writer exit behavior (#7062)

dbarnes Thu, 28 Oct 2021 16:10:39 -0700

This is an automated email from the ASF dual-hosted git repository.

dbarnes pushed a commit to branch support/1.13
in repository https://gitbox.apache.org/repos/asf/geode.git



The following commit(s) were added to refs/heads/support/1.13 by this push:
     new 6d8d267  GEODE-9656: Document Async disk writer exit behavior (#7062)
6d8d267 is described below

commit 6d8d26795d8f4d7090a3ce20196797b005e13a37
Author: Dave Barnes <[email protected]>
AuthorDate: Thu Oct 28 16:04:24 2021 -0700

    GEODE-9656: Document Async disk writer exit behavior (#7062)
---
 .../disk_storage/how_disk_stores_work.html.md.erb  |  2 +-
 .../system_failure_and_recovery.html.md.erb        | 51 ++++++++++++++++++----
 2 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/geode-docs/managing/disk_storage/how_disk_stores_work.html.md.erb b/geode-docs/managing/disk_storage/how_disk_stores_work.html.md.erb
index 8d8b8e0..3328394 100644
--- a/geode-docs/managing/disk_storage/how_disk_stores_work.html.md.erb
+++ b/geode-docs/managing/disk_storage/how_disk_stores_work.html.md.erb
@@ -57,5 +57,5 @@ While a member is running, its disk stores are online. When the member exits and
 
 -   Online, a disk store is owned and managed by its member process. To run operations on an online disk store, use API calls in the member process, or use the `gfsh` command-line interface.
 -   Offline, the disk store is just a collection of files in the host file system. The files are accessible based on file system permissions. You can copy the files for backup or to move the member’s disk store location. You can also run some maintenance operations, such as file compaction and validation, by using the `gfsh` command-line interface. When offline, the disk store's information is unavailable to the cluster. 
-For partitioned regions, region data is split between multiple members, and therefore the start up of a member is dependent onall members, and must wait for all members to be online. An attempt to access an entry that is stored on disk by an offline member results in a `PartitionOfflineException`.
+For partitioned regions, region data is split between multiple members, and therefore the start up of a member is dependent on all members, and must wait for all members to be online. An attempt to access an entry that is stored on disk by an offline member results in a `PartitionOfflineException`.
 
diff --git a/geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb b/geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb
index c638dfd..45a7737 100644
--- a/geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb
+++ b/geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb
@@ -23,7 +23,7 @@ This section describes alerts for and appropriate responses to various kinds of
 
 If a system member withdraws from the cluster involuntarily because the member, host, or network fails, the other members automatically adapt to the loss and continue to operate. The cluster does not experience any disturbance such as timeouts.
 
-## <a id="sys_failure__section_846B00118184487FB8F1E0CD1DC3A81B" class="no-quick-link"></a>Planning for Data Recovery
+## <a id="sys_failure__section_846B00118184487FB8F1E0CD1DC3A81B"></a>Planning for Data Recovery
 
 In planning a strategy for data recovery, consider these factors:
 
@@ -37,7 +37,7 @@ In planning a strategy for data recovery, consider these factors:
 
 The rest of this section provides recovery instructions for various kinds system failures.
 
-## <a id="sys_failure__section_2C390F0783724048A6E12F7F369EB8DC" class="no-quick-link"></a>Network Partitioning, Slow Response, and Member Removal Alerts
+## <a id="sys_failure__section_2C390F0783724048A6E12F7F369EB8DC"></a>Network Partitioning, Slow Response, and Member Removal Alerts
 
 When a network partition detection or slow responses occur, these alerts are generated:
 
@@ -49,7 +49,7 @@ When a network partition detection or slow responses occur, these alerts are gen
 
 For information on configuring system members to help avoid a network partition configuration condition in the presence of a network failure or when members lose the ability to communicate to each other, refer to [Understanding and Recovering from Network Outages](recovering_from_network_outages.html#rec_network_crash).
 
-### <a id="sys_failure__section_D52D902E665F4F038DA4B8298E3F8681" class="no-quick-link"></a>Network Partitioning Detected
+### <a id="sys_failure__section_D52D902E665F4F038DA4B8298E3F8681"></a>Network Partitioning Detected
 
 Alert:
 
@@ -71,7 +71,7 @@ Response:
 
 Check the network connectivity and health of the listed cache processes.
 
-### <a id="sys_failure__section_2C5E8A37733D4B31A12F22B9155796FD" class="no-quick-link"></a>Member Taking Too Long to Respond
+### <a id="sys_failure__section_2C5E8A37733D4B31A12F22B9155796FD"></a>Member Taking Too Long to Respond
 
 Alert:
 
@@ -167,7 +167,7 @@ Response:
 
 None.
 
-### <a id="sys_failure__section_AF4F913C244044E7A541D89EC6BCB961" class="no-quick-link"></a>No Locators Can Be Found
+### <a id="sys_failure__section_AF4F913C244044E7A541D89EC6BCB961"></a>No Locators Can Be Found
 
 **Note:**
 It is likely that all processes using the locators will exit with the same message.
@@ -234,7 +234,7 @@ Response:
 
 The operator should examine and restart the disconnected process.
 
-### <a id="sys_failure__section_77BDB0886A944F87BDA4C5408D9C2FC4" class="no-quick-link"></a>Warning Notifications Before Removal
+### <a id="sys_failure__section_77BDB0886A944F87BDA4C5408D9C2FC4"></a>Warning Notifications Before Removal
 
 Alert:
 
@@ -265,7 +265,7 @@ Response:
 
 The operator can turn this off by setting the system property gemfire.disable-same-machine-warnings to true. However, it is best to run locator processes, which act as membership coordinators when network partition detection is enabled, on separate machines from cache processes.
 
-### <a id="sys_failure__section_E777C6EC8DEC4FE692AC5863C4420238" class="no-quick-link"></a>Member Is Forced Out
+### <a id="sys_failure__section_E777C6EC8DEC4FE692AC5863C4420238"></a>Member Is Forced Out
 
 Alert:
 
@@ -285,7 +285,41 @@ Response:
 
 The operator should examine the locator processes and logs.
 
-## How Data is Recovered From Persistent Regions
+### <a id="sys_failure__section_disk_access_exceptions"></a>Disk Access Exceptions
+
+Alert:
+
+``` pre
+A DiskAccessException has occurred while writing to the disk for region <region-name>.
+The cache will be closed.  For Region: <region-name>: Failed writing key
+to <disk-store-name>
+```
+
+or
+
+``` pre
+A DiskAccessException has occurred while writing to the disk for region <region-name>.
+The cache will be closed.
+For DiskStore: <disk-store-name>: Could not schedule asynchronous write because
+the flusher thread had been terminated
+```
+
+Description:
+
+A write was prevented by an underlying disk issue, such as a full disk.
+
+The first alert form is reported when disk writes are synchronous (`disk-synchronous=true`),
+and the second form is reported when disk writes are asynchronous (`disk-synchronous=false`).
+
+In either case, the member shuts down when an operation attempts to update the disk store.
+
+Response:
+
+You must address the underlying disk issue and restart the server.
+See [Preventing and Recovering from Disk Full Errors](prevent_and_recover_disk_full_errors.html) for suggestions.
+
+
+## <a id="sys_failure__section_how_data_is_recovered"></a>How Data is Recovered From Persistent Regions
 
 A persistent region is one whose contents (keys and values) can be restored from disk.  Upon
 restart, data recovery of a persistent region always recovers keys.  Under the default behavior, the
@@ -338,4 +372,3 @@ properties allow the developer to modify the recovery behavior for persistent re
   When `true`, prolongs restart time, but ensures that when available for use, the cache is fully
   populated and data retrieval times will be optimal.
 
-
