This is an automated email from the ASF dual-hosted git repository.
alexpl pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/ignite.git
The following commit(s) were added to refs/heads/master by this push:
new 6d6e261d2b4 IGNITE-26157 Enhance the documentation with information
about Maintenance Mode (#12727)
6d6e261d2b4 is described below
commit 6d6e261d2b4b353ee9ace3f488f6e9155edefe20
Author: Denis <[email protected]>
AuthorDate: Wed Mar 4 00:40:34 2026 +1000
IGNITE-26157 Enhance the documentation with information about Maintenance
Mode (#12727)
---
docs/_docs/maintenance-mode.adoc | 102 ++++++++++
.../native-persistence-defragmentation.adoc | 2 +-
docs/_docs/tools/control-script.adoc | 219 ++++++++++++++++++++-
3 files changed, 321 insertions(+), 2 deletions(-)
diff --git a/docs/_docs/maintenance-mode.adoc b/docs/_docs/maintenance-mode.adoc
new file mode 100644
index 00000000000..af9b7e40c7a
--- /dev/null
+++ b/docs/_docs/maintenance-mode.adoc
@@ -0,0 +1,102 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements. See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+= Maintenance Mode
+
+== Overview
+
+Maintenance mode is a special state of a node in which its functionality is
limited.
+Nodes in this mode do not join the cluster and remain isolated until the
maintenance task has been completed.
+
+A node can enter maintenance mode on restart in two cases: when data corruption
is possible, or when a required action could affect the functioning of the
cluster if the node remained part of it.
+Entering maintenance mode always requires a node restart. More details are
provided below in the section
"<<Reasons for Transitioning into Maintenance Mode>>".
+
+When a node enters maintenance mode, it becomes isolated from the cluster and
does not receive data updates.
+Depending on the task at hand, manual intervention by an administrator might
be necessary, or the node will resolve issues automatically (for example,
repairing problems with data and indexes).
+
+After all tasks associated with maintenance mode have been completed, the
administrator must manually restart the node.
+After this restart, the node exits maintenance mode and rejoins the cluster.
+
+During operation in maintenance mode, the node is considered offline within
the cluster topology.
+Before making changes to the baseline topology, ensure that the node is no
longer in maintenance mode.
+
+== Maintenance Task Processing
+
+When a node receives a command to enter maintenance mode, it creates a
`maintenance_tasks.mntc` file in the working directory.
+If this file exists after a restart, the node automatically enters maintenance
mode and attempts to perform the necessary tasks.
+
+Task list:
+
+[cols="1,4,1",opts="header"]
+|===
+| Task | Description | Performed automatically at startup
+| `defragmentationMaintenanceTask` | Node defragmentation is scheduled | Yes
+
+| `indexRebuildMaintenanceTask` | A rebuild of data indexes is scheduled | Yes
+|===
+
+Once the tasks are completed, the `maintenance_tasks.mntc` file is removed.
+The node continues operating in maintenance mode until a manual restart occurs.
+
+Maintenance mode can also be scheduled manually.
+For more information, see the section "<<Scheduled Maintenance Mode>>" below.
+
+== Reasons for Transitioning into Maintenance Mode
+
+=== Possible data corruption
+
+If a node with persistence enabled and write-ahead logging disabled terminates
abnormally during checkpointing, it cannot reliably determine whether any data
corruption occurred.
+In this case the node detects possible data damage on subsequent startup and
shuts down.
+Upon the next restart, the node enters maintenance mode and waits for
administrative action.
+
+To solve the problem:
+
+- Restart the node and it will enter maintenance mode.
+- Use the control script to execute the following command to remove
potentially corrupted data:
++
+`control.sh --persistence clean corrupted`.
++
+You can also create backups using the following command:
++
+`control.sh --persistence backup corrupted`
++
+Command examples:
++
+[source, shell]
+----
+ control.sh|bat --host {host} --port {port} --persistence backup corrupted
+ control.sh|bat --host {host} --port {port} --persistence clean corrupted
+----
+A node's IP address and port can be found in its logs.
+- After completing the task, restart the node — it will resume the
checkpointing process.
+
+The node remains in maintenance mode until the potentially corrupted data is
cleared.
+The data can also be deleted manually, followed by a node restart.
+Afterward, the node recovers the lost data from backups stored on other cluster
nodes through the rebalancing process.
+More detailed information about this procedure can be found in
link:data-rebalancing[Data Rebalancing].
+
+== Scheduled Maintenance Mode
+
+Some tasks require isolating the node so their execution doesn't impact the
cluster.
+Once a command that schedules such a task is executed, the node enters
maintenance mode on the next restart and completes the required tasks.
+Another restart will then be needed to bring the node back into the cluster.
+
+Commands that trigger maintenance mode on the next restart:
+
+- `control.sh --defragmentation schedule`: schedules node defragmentation;
+- `control.sh --cache schedule_indexes_rebuild`: schedules a rebuild of cache
data indexes in Maintenance Mode.
+
+More details about these commands can be found in the "Control Script" section
under subsections "link:tools/control-script#defragmentation[Defragmentation]"
and "link:tools/control-script#rebuild_index[Rebuild index]".
+
+To exit maintenance mode and return the node to the cluster, restart the node.
\ No newline at end of file
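As an illustration of the "Possible data corruption" procedure documented in maintenance-mode.adoc above, the sequence of control script calls can be sketched as a small shell helper. This is a minimal sketch and not part of the commit: `recover_corrupted`, `run`, and the `CONTROL` and `DRY_RUN` variables are hypothetical names, and by default the script only prints the commands instead of executing them. The host and port have to be taken from the node's logs.

```shell
#!/bin/sh
# Sketch only: prints the control script invocations unless DRY_RUN=0 is set.
# CONTROL and DRY_RUN are hypothetical variables introduced for this example.
CONTROL="${CONTROL:-control.sh}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# recover_corrupted HOST PORT (hypothetical helper, not part of Ignite).
# Assumes the node has already been restarted and is in maintenance mode.
recover_corrupted() {
    host="$1"
    port="$2"
    # 1. Keep a copy of the suspect files for later analysis.
    run "$CONTROL" --host "$host" --port "$port" --persistence backup corrupted || return 1
    # 2. Remove the potentially corrupted data.
    run "$CONTROL" --host "$host" --port "$port" --persistence clean corrupted || return 1
    # 3. The node restart that follows is a manual step and is not scripted here.
}
```

Calling `recover_corrupted {host} {port}` prints the backup command followed by the clean command. With `DRY_RUN=0` the same calls are executed against the node, and the restart that returns the node to the cluster remains a manual step.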
diff --git a/docs/_docs/persistence/native-persistence-defragmentation.adoc
b/docs/_docs/persistence/native-persistence-defragmentation.adoc
index 24650d2631e..71787fc704a 100644
--- a/docs/_docs/persistence/native-persistence-defragmentation.adoc
+++ b/docs/_docs/persistence/native-persistence-defragmentation.adoc
@@ -39,7 +39,7 @@ To request defragmentation, use the following command:
control.(sh|bat) --defragmentation schedule --nodes <consistentIds> [--caches <cacheNames>]
----
-After the manual restart, the node with the requested defragmentation enters a
special mode called maintenance mode. The node in maintenance mode does not
join the rest of the cluster but remains isolated until defragmentation is
completed (or canceled by explicit user request). After that, the user has to
restart the node one more time: it exits maintenance mode and returns to normal
operations (joining the cluster and starting to serve regular workload).
+After the manual restart, the node with the requested defragmentation enters a
special mode called link:maintenance-mode[Maintenance Mode]. The node in
maintenance mode does not join the rest of the cluster but remains isolated
until defragmentation is completed (or canceled by explicit user request).
After that, the user has to restart the node one more time: it exits
maintenance mode and returns to normal operations (joining the cluster and
starting to serve regular workload).
[NOTE]
====
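The defragmentation cycle described in the hunk above (schedule, restart into maintenance mode, check status, restart again) can be sketched as follows. This is an illustrative sketch under stated assumptions, not part of the commit: `schedule_defrag`, `defrag_status`, `run`, `CONTROL`, and `DRY_RUN` are hypothetical names, and the script prints the commands by default rather than running them.

```shell
#!/bin/sh
# Sketch only: prints the control script invocations unless DRY_RUN=0 is set.
# CONTROL and DRY_RUN are hypothetical variables introduced for this example.
CONTROL="${CONTROL:-control.sh}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# schedule_defrag NODES [CACHES] (hypothetical helper).
# NODES and CACHES are comma-separated consistent IDs and cache names.
schedule_defrag() {
    nodes="$1"
    caches="${2:-}"
    if [ -n "$caches" ]; then
        run "$CONTROL" --defragmentation schedule --nodes "$nodes" --caches "$caches"
    else
        run "$CONTROL" --defragmentation schedule --nodes "$nodes"
    fi
}

# defrag_status (hypothetical helper).
# Only meaningful after the node has been restarted into maintenance mode.
defrag_status() {
    run "$CONTROL" --defragmentation status
}
```

A typical order would be `schedule_defrag consistentId0,consistentId1`, a manual node restart, `defrag_status` until defragmentation has finished, and a second manual restart so that the node rejoins the cluster.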
diff --git a/docs/_docs/tools/control-script.adoc
b/docs/_docs/tools/control-script.adoc
index 18e7368fca3..bda22745541 100644
--- a/docs/_docs/tools/control-script.adoc
+++ b/docs/_docs/tools/control-script.adoc
@@ -1255,6 +1255,223 @@ TASKS SYS Running
compute tasks
Command [SYSTEM-VIEW] finished with code: 0
+== Working With Persistence Data
+
+[WARNING]
+====
+All `--persistence` commands below work only in
link:maintenance-mode[Maintenance Mode].
+====
+
+=== Displaying Information About Damaged Caches
+
+Use the `--persistence info` option to display information about potentially
damaged caches on the local node:
+
+[tabs]
+--
+tab:Unix[]
+[source,shell]
+----
+control.sh --persistence info
+----
+tab:Window[]
+[source,shell]
+----
+control.bat --persistence info
+----
+--
+
+=== Cleaning Up Damaged Caches
+
+Use the `--persistence clean corrupted` option to clear directories containing
caches with corrupted data files:
+
+[tabs]
+--
+tab:Unix[]
+[source,shell]
+----
+control.sh --persistence clean corrupted
+----
+tab:Window[]
+[source,shell]
+----
+control.bat --persistence clean corrupted
+----
+--
+
+=== Clearing All Caches
+
+Use the `--persistence clean all` option to delete all cache directories:
+
+[tabs]
+--
+tab:Unix[]
+[source,shell]
+----
+control.sh --persistence clean all
+----
+tab:Window[]
+[source,shell]
+----
+control.bat --persistence clean all
+----
+--
+
+=== Clearing Specific Caches
+
+Use the `--persistence clean caches` option to delete the listed caches:
+
+[tabs]
+--
+tab:Unix[]
+[source,shell]
+----
+control.sh --persistence clean caches cache1,cache2,cache3
+----
+tab:Window[]
+[source,shell]
+----
+control.bat --persistence clean caches cache1,cache2,cache3
+----
+--
+
+where `cache1,cache2,cache3` are comma-separated cache names.
+
+=== Backing Up Damaged Files
+
+Use the `--persistence backup corrupted` option to back up corrupted data
files:
+
+[tabs]
+--
+tab:Unix[]
+[source,shell]
+----
+control.sh --persistence backup corrupted
+----
+tab:Window[]
+[source,shell]
+----
+control.bat --persistence backup corrupted
+----
+--
+
+=== Backing Up All Cache Files
+
+Use the `--persistence backup all` option to back up all cache data files:
+
+[tabs]
+--
+tab:Unix[]
+[source,shell]
+----
+control.sh --persistence backup all
+----
+tab:Window[]
+[source,shell]
+----
+control.bat --persistence backup all
+----
+--
+
+=== Backing Up Specific Cache Files
+
+Use the `--persistence backup caches` option to back up specified cache data
files:
+
+[tabs]
+--
+tab:Unix[]
+[source,shell]
+----
+control.sh --persistence backup caches cache1,cache2,cache3
+----
+tab:Window[]
+[source,shell]
+----
+control.bat --persistence backup caches cache1,cache2,cache3
+----
+--
+
+where `cache1,cache2,cache3` are comma-separated cache names.
+
+These backup files are stored in
`{IGNITE_WORK_DIR}/db/{nodeId}/backup_cache-{CACHE_NAME}`.
+
+[WARNING]
+====
+Backup files created via the `./control.sh --persistence backup ...` commands
should not be regarded as a snapshot recovery mechanism.
+They are exact copies of the cache folders, intended for subsequent analysis
of the causes of corruption or for data recovery attempts.
+These operations require specialized knowledge, and there are no universal
methods for analyzing and recovering corrupted caches.
+====
+
+Once work with the backed-up copies is finished, or if they are no longer
needed, they can be deleted manually from the directory specified above.
+
+== Defragmentation
+
+=== Scheduling Defragmentation
+
+Use the `--defragmentation schedule` option to schedule Persistent Data Store
(PDS) defragmentation:
+
+[tabs]
+--
+tab:Unix[]
+[source,shell]
+----
+control.sh --defragmentation schedule --nodes consistentId0,consistentId1 [--caches cache1,cache2,cache3]
+----
+tab:Window[]
+[source,shell]
+----
+control.bat --defragmentation schedule --nodes consistentId0,consistentId1 [--caches cache1,cache2,cache3]
+----
+--
+
+As a result, the next node start-up will occur in Maintenance Mode, during
which the defragmentation will be performed automatically.
+To exit Maintenance Mode afterward, simply restart the node.
+
+=== Checking Defragmentation Status
+
+[WARNING]
+====
+Available exclusively in link:maintenance-mode[Maintenance Mode].
+====
+
+Use the `--defragmentation status` option to retrieve the status of ongoing
defragmentation processes:
+
+[tabs]
+--
+tab:Unix[]
+[source,shell]
+----
+control.sh --defragmentation status
+----
+tab:Window[]
+[source,shell]
+----
+control.bat --defragmentation status
+----
+--
+
+=== Canceling Defragmentation
+
+[WARNING]
+====
+Available exclusively in link:maintenance-mode[Maintenance Mode].
+====
+
+Use the `--defragmentation cancel` option to cancel either a scheduled or
active Persistent Data Store (PDS) defragmentation:
+
+[tabs]
+--
+tab:Unix[]
+[source,shell]
+----
+control.sh --defragmentation cancel
+----
+tab:Window[]
+[source,shell]
+----
+control.bat --defragmentation cancel
+----
+--
+
+
== Performance Statistics
Ignite provides a built-in tool for cluster profiling. Read
link:monitoring-metrics/performance-statistics[Performance Statistics] for more
information.
@@ -1405,7 +1622,7 @@ control.sh|bat --cache metrics enable --caches
cache-2,cache-1
== Rebuild index
-The `schedule_indexes_rebuild` commands Apache Ignite to rebuild indexes for
specified caches or cache groups. Target caches or cache groups must be in
Maintenance Mode.
+The `schedule_indexes_rebuild` commands Apache Ignite to rebuild indexes for
specified caches or cache groups. Target caches or cache groups must be in
link:maintenance-mode[Maintenance Mode].
[source, shell]
----