This is an automated email from the ASF dual-hosted git repository.
dbarnes pushed a commit to branch support/1.13
in repository https://gitbox.apache.org/repos/asf/geode.git
The following commit(s) were added to refs/heads/support/1.13 by this push:
new 6d8d267 GEODE-9656: Document Async disk writer exit behavior (#7062)
6d8d267 is described below
commit 6d8d26795d8f4d7090a3ce20196797b005e13a37
Author: Dave Barnes <[email protected]>
AuthorDate: Thu Oct 28 16:04:24 2021 -0700
GEODE-9656: Document Async disk writer exit behavior (#7062)
---
.../disk_storage/how_disk_stores_work.html.md.erb | 2 +-
.../system_failure_and_recovery.html.md.erb | 51 ++++++++++++++++++----
2 files changed, 43 insertions(+), 10 deletions(-)
diff --git a/geode-docs/managing/disk_storage/how_disk_stores_work.html.md.erb b/geode-docs/managing/disk_storage/how_disk_stores_work.html.md.erb
index 8d8b8e0..3328394 100644
--- a/geode-docs/managing/disk_storage/how_disk_stores_work.html.md.erb
+++ b/geode-docs/managing/disk_storage/how_disk_stores_work.html.md.erb
@@ -57,5 +57,5 @@ While a member is running, its disk stores are online. When the member exits and
- Online, a disk store is owned and managed by its member process. To run
operations on an online disk store, use API calls in the member process, or use
the `gfsh` command-line interface.
- Offline, the disk store is just a collection of files in the host file
system. The files are accessible based on file system permissions. You can copy
the files for backup or to move the member’s disk store location. You can also
run some maintenance operations, such as file compaction and validation, by
using the `gfsh` command-line interface. When offline, the disk store's
information is unavailable to the cluster.
-For partitioned regions, region data is split between multiple members, and therefore the start up of a member is dependent onall members, and must wait for all members to be online. An attempt to access an entry that is stored on disk by an offline member results in a `PartitionOfflineException`.
+For partitioned regions, region data is split between multiple members, and therefore the start up of a member is dependent on all members, and must wait for all members to be online. An attempt to access an entry that is stored on disk by an offline member results in a `PartitionOfflineException`.
diff --git a/geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb b/geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb
index c638dfd..45a7737 100644
--- a/geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb
+++ b/geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb
@@ -23,7 +23,7 @@ This section describes alerts for and appropriate responses to various kinds of
If a system member withdraws from the cluster involuntarily because the
member, host, or network fails, the other members automatically adapt to the
loss and continue to operate. The cluster does not experience any disturbance
such as timeouts.
-## <a id="sys_failure__section_846B00118184487FB8F1E0CD1DC3A81B" class="no-quick-link"></a>Planning for Data Recovery
+## <a id="sys_failure__section_846B00118184487FB8F1E0CD1DC3A81B"></a>Planning for Data Recovery
In planning a strategy for data recovery, consider these factors:
@@ -37,7 +37,7 @@ In planning a strategy for data recovery, consider these factors:
The rest of this section provides recovery instructions for various kinds
system failures.
-## <a id="sys_failure__section_2C390F0783724048A6E12F7F369EB8DC" class="no-quick-link"></a>Network Partitioning, Slow Response, and Member Removal Alerts
+## <a id="sys_failure__section_2C390F0783724048A6E12F7F369EB8DC"></a>Network Partitioning, Slow Response, and Member Removal Alerts
When a network partition detection or slow responses occur, these alerts are
generated:
@@ -49,7 +49,7 @@ When a network partition detection or slow responses occur, these alerts are gen
For information on configuring system members to help avoid a network
partition configuration condition in the presence of a network failure or when
members lose the ability to communicate to each other, refer to [Understanding
and Recovering from Network
Outages](recovering_from_network_outages.html#rec_network_crash).
-### <a id="sys_failure__section_D52D902E665F4F038DA4B8298E3F8681" class="no-quick-link"></a>Network Partitioning Detected
+### <a id="sys_failure__section_D52D902E665F4F038DA4B8298E3F8681"></a>Network Partitioning Detected
Alert:
@@ -71,7 +71,7 @@ Response:
Check the network connectivity and health of the listed cache processes.
-### <a id="sys_failure__section_2C5E8A37733D4B31A12F22B9155796FD" class="no-quick-link"></a>Member Taking Too Long to Respond
+### <a id="sys_failure__section_2C5E8A37733D4B31A12F22B9155796FD"></a>Member Taking Too Long to Respond
Alert:
@@ -167,7 +167,7 @@ Response:
None.
-### <a id="sys_failure__section_AF4F913C244044E7A541D89EC6BCB961" class="no-quick-link"></a>No Locators Can Be Found
+### <a id="sys_failure__section_AF4F913C244044E7A541D89EC6BCB961"></a>No Locators Can Be Found
**Note:**
It is likely that all processes using the locators will exit with the same
message.
@@ -234,7 +234,7 @@ Response:
The operator should examine and restart the disconnected process.
-### <a id="sys_failure__section_77BDB0886A944F87BDA4C5408D9C2FC4" class="no-quick-link"></a>Warning Notifications Before Removal
+### <a id="sys_failure__section_77BDB0886A944F87BDA4C5408D9C2FC4"></a>Warning Notifications Before Removal
Alert:
@@ -265,7 +265,7 @@ Response:
The operator can turn this off by setting the system property
gemfire.disable-same-machine-warnings to true. However, it is best to run
locator processes, which act as membership coordinators when network partition
detection is enabled, on separate machines from cache processes.
-### <a id="sys_failure__section_E777C6EC8DEC4FE692AC5863C4420238" class="no-quick-link"></a>Member Is Forced Out
+### <a id="sys_failure__section_E777C6EC8DEC4FE692AC5863C4420238"></a>Member Is Forced Out
Alert:
@@ -285,7 +285,41 @@ Response:
The operator should examine the locator processes and logs.
-## How Data is Recovered From Persistent Regions
+### <a id="sys_failure__section_disk_access_exceptions"></a>Disk Access Exceptions
+
+Alert:
+
+``` pre
+A DiskAccessException has occurred while writing to the disk for region <region-name>.
+The cache will be closed. For Region: <region-name>: Failed writing key
+to <disk-store-name>
+```
+
+or
+
+``` pre
+A DiskAccessException has occurred while writing to the disk for region <region-name>.
+The cache will be closed.
+For DiskStore: <disk-store-name>: Could not schedule asynchronous write because
+the flusher thread had been terminated
+```
+
+Description:
+
+A write was prevented by an underlying disk issue, such as a full disk.
+
+The first alert form is reported when disk writes are synchronous (`disk-synchronous=true`),
+and the second form is reported when disk writes are asynchronous (`disk-synchronous=false`).
+
+In either case, the member shuts down when an operation attempts to update the disk store.
+
+Response:
+
+You must address the underlying disk issue and restart the server.
+See [Preventing and Recovering from Disk Full Errors](prevent_and_recover_disk_full_errors.html) for suggestions.
+
+
+## <a id="sys_failure__section_how_data_is_recovered"></a>How Data is Recovered From Persistent Regions
A persistent region is one whose contents (keys and values) can be restored
from disk. Upon
restart, data recovery of a persistent region always recovers keys. Under the
default behavior, the
@@ -338,4 +372,3 @@ properties allow the developer to modify the recovery behavior for persistent re
When `true`, prolongs restart time, but ensures that when available for use,
the cache is fully
populated and data retrieval times will be optimal.
-
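
The new "Disk Access Exceptions" section ties each alert form to a `disk-synchronous` setting, so a log excerpt can be mapped back to that setting mechanically. The following is a minimal sketch of such a triage helper, not part of Geode: the function name `classify_disk_alert` and its return values are hypothetical, and only the marker strings are copied from the alert text quoted in the diff.

```python
# Hypothetical triage helper (not a Geode API). The marker strings below are
# copied from the two alert forms in the documentation change; the function
# name and return values are assumptions made for illustration.
ALERT_HEADER = "A DiskAccessException has occurred while writing to the disk for region"
SYNC_MARKER = "Failed writing key"
ASYNC_MARKER = "Could not schedule asynchronous write"


def classify_disk_alert(log_text):
    """Map a log excerpt to the disk-synchronous setting its alert form implies.

    Returns "disk-synchronous=true", "disk-synchronous=false", or None when the
    excerpt contains neither documented alert form.
    """
    # Collapse whitespace so an alert wrapped across several lines still matches.
    normalized = " ".join(log_text.split())
    if ALERT_HEADER not in normalized:
        return None
    if SYNC_MARKER in normalized:
        return "disk-synchronous=true"
    if ASYNC_MARKER in normalized:
        return "disk-synchronous=false"
    return None


if __name__ == "__main__":
    excerpt = (
        "A DiskAccessException has occurred while writing to the disk for region /orders.\n"
        "The cache will be closed.\n"
        "For DiskStore: store1: Could not schedule asynchronous write because\n"
        "the flusher thread had been terminated"
    )
    print(classify_disk_alert(excerpt))
```

Either result calls for the same response the section gives: address the underlying disk issue and restart the server. The region path `/orders` and store name `store1` in the usage example are placeholders.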