(ignite) branch master updated: IGNITE-26157 Enhance the documentation with information about Maintenance Mode (#12727)

alexpl Tue, 03 Mar 2026 06:40:52 -0800

This is an automated email from the ASF dual-hosted git repository.

alexpl pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/ignite.git



The following commit(s) were added to refs/heads/master by this push:
     new 6d6e261d2b4 IGNITE-26157 Enhance the documentation with information 
about Maintenance Mode (#12727)
6d6e261d2b4 is described below

commit 6d6e261d2b4b353ee9ace3f488f6e9155edefe20
Author: Denis <[email protected]>
AuthorDate: Wed Mar 4 00:40:34 2026 +1000

    IGNITE-26157 Enhance the documentation with information about Maintenance 
Mode (#12727)
---
 docs/_docs/maintenance-mode.adoc                   | 102 ++++++++++
 .../native-persistence-defragmentation.adoc        |   2 +-
 docs/_docs/tools/control-script.adoc               | 219 ++++++++++++++++++++-
 3 files changed, 321 insertions(+), 2 deletions(-)

diff --git a/docs/_docs/maintenance-mode.adoc b/docs/_docs/maintenance-mode.adoc
new file mode 100644
index 00000000000..af9b7e40c7a
--- /dev/null
+++ b/docs/_docs/maintenance-mode.adoc
@@ -0,0 +1,102 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+= Maintenance Mode
+
+== Overview
+
+Maintenance mode is a special state of the node where its functionality is 
limited.
+Nodes in this mode do not join the cluster and remain isolated until the maintenance task has been completed.
+
+A node can enter maintenance mode on restart in two kinds of situations: when data corruption is possible, or when a required action could affect the functioning of the cluster if the node remained part of it.
+Entering maintenance mode always requires a restart. More details are provided below in the section "<<Reasons for Transitioning into Maintenance Mode>>".
+
+When a node enters maintenance mode, it becomes isolated from the cluster and 
does not receive data updates.
+Depending on the task at hand, manual intervention by an administrator might 
be necessary, or the node will resolve issues automatically (for example, 
repairing problems with data and indexes).
+
+After all tasks associated with maintenance mode have been completed, the administrator must manually restart the node, after which it exits maintenance mode.
+The node rejoins the cluster on that restart.
+
+During operation in maintenance mode, the node is considered offline within 
the cluster topology.
+Before making changes to the baseline topology, ensure that the node is no longer in maintenance mode.
+
+== How Tasks Are Processed in Maintenance Mode
+
+When a node receives a command to enter maintenance mode, it creates a 
`maintenance_tasks.mntc` file in the working directory.
+If this file exists after a restart, the node automatically enters maintenance mode and attempts to perform the necessary tasks.
+
+Task list:
+
+[cols="1,4,1",opts="header"]
+|===
+| Task | Description | Performed automatically at startup
+| `defragmentationMaintenanceTask` | Node defragmentation is scheduled | Yes
+
+| `indexRebuildMaintenanceTask` | Data indexes are scheduled to be rebuilt | Yes
+|===
+
+Once the tasks are completed, the `maintenance_tasks.mntc` file is removed.
+The node continues operating in maintenance mode until a manual restart occurs.
+
+Maintenance mode can also be scheduled manually.
+For more information, see the section "<<Scheduled Maintenance Mode>>" below.
+
+== Reasons for Transitioning into Maintenance Mode
+
+=== Possible data corruption
+
+If a node with persistence enabled and write-ahead logging disabled terminates 
abnormally during checkpointing, it cannot reliably determine whether any data 
corruption occurred.
+In this case the node detects possible data damage on subsequent startup and 
shuts down.
+Upon the next restart, the node enters maintenance mode and waits for 
administrative action.
+
+To solve the problem:
+
+- Restart the node and it will enter maintenance mode.
+- Use the management script to execute the following command to remove 
potentially corrupted data:
++
+`control.sh --persistence clean corrupted`.
++
+You can also create backups using the following command:
++
+`control.sh --persistence backup corrupted`
++
+Command examples:
++
+[source, shell]
+----
+ control.sh|bat --host {host} --port {port} --persistence backup corrupted
+ control.sh|bat --host {host} --port {port} --persistence clean corrupted
+----
+A node's IP address and port can be found in its logs.
+- After completing the task, restart the node — it will resume the 
checkpointing process.
+
+The node remains in maintenance mode until the potentially corrupted data is cleared.
+This deletion can be done manually, followed by a node restart.
+Afterward, the node recovers the lost data from backups stored on other cluster nodes through the rebalancing process.
+For more details about this procedure, see link:data-rebalancing[Data Rebalancing].
+
+== Scheduled Maintenance Mode
+
+Some tasks require isolating the node so their execution doesn't impact the 
cluster.
+Once one of the scheduling commands listed below is executed, the node will enter maintenance mode on the next restart and complete the required tasks.
+Another restart will then be needed to bring the node back into the cluster.
+
+Commands that trigger maintenance mode on the next restart:
+
+- `control.sh --defragmentation schedule` - schedules node defragmentation;
+- `control.sh --cache schedule_indexes_rebuild` - schedules a rebuild of cache data indexes in Maintenance Mode.
+
+More details about these commands can be found in the "Control Script" section 
under subsections "link:tools/control-script#defragmentation[Defragmentation]" 
and "link:tools/control-script#rebuild_index[Rebuild index]".
+
+To exit maintenance mode and return the node to the cluster, restart the node.
\ No newline at end of file
diff --git a/docs/_docs/persistence/native-persistence-defragmentation.adoc 
b/docs/_docs/persistence/native-persistence-defragmentation.adoc
index 24650d2631e..71787fc704a 100644
--- a/docs/_docs/persistence/native-persistence-defragmentation.adoc
+++ b/docs/_docs/persistence/native-persistence-defragmentation.adoc
@@ -39,7 +39,7 @@ To request defragmentation, use the following command:
 control.(sh|bat) --defragmentation schedule --nodes <consistentIds> [--caches 
<cacheNames>]
 ----
 
-After the manual restart, the node with the requested defragmentation enters a 
special mode called maintenance mode. The node in maintenance mode does not 
join the rest of the cluster but remains isolated until defragmentation is 
completed (or canceled by explicit user request). After that, the user has to 
restart the node one more time: it exits maintenance mode and returns to normal 
operations (joining the cluster and starting to serve regular workload).
+After the manual restart, the node with the requested defragmentation enters a 
special mode called link:maintenance-mode[Maintenance Mode]. The node in 
maintenance mode does not join the rest of the cluster but remains isolated 
until defragmentation is completed (or canceled by explicit user request). 
After that, the user has to restart the node one more time: it exits 
maintenance mode and returns to normal operations (joining the cluster and 
starting to serve regular workload).
 
 [NOTE]
 ====
diff --git a/docs/_docs/tools/control-script.adoc 
b/docs/_docs/tools/control-script.adoc
index 18e7368fca3..bda22745541 100644
--- a/docs/_docs/tools/control-script.adoc
+++ b/docs/_docs/tools/control-script.adoc
@@ -1255,6 +1255,223 @@ TASKS                          SYS       Running 
compute tasks
 Command [SYSTEM-VIEW] finished with code: 0
 
 
+== Working With Persistence Data
+
+[WARNING]
+====
+All `--persistence` commands below work only in link:maintenance-mode[Maintenance Mode].
+====
+
+=== Displaying Information About Damaged Caches
+
+Use the `--persistence info` option to display information about potentially damaged caches on the local node:
+
+[tabs]
+--
+tab:Unix[]
+[source,shell]
+----
+control.sh --persistence info
+----
+tab:Window[]
+[source,shell]
+----
+control.bat --persistence info
+----
+--
+
+=== Cleaning Up Damaged Caches
+
+Use the `--persistence clean corrupted` option to clear directories containing 
caches with corrupted data files:
+
+[tabs]
+--
+tab:Unix[]
+[source,shell]
+----
+control.sh --persistence clean corrupted
+----
+tab:Window[]
+[source,shell]
+----
+control.bat --persistence clean corrupted
+----
+--
+
+=== Clearing All Caches
+
+Use the `--persistence clean all` option to delete all cache directories:
+
+[tabs]
+--
+tab:Unix[]
+[source,shell]
+----
+control.sh --persistence clean all
+----
+tab:Window[]
+[source,shell]
+----
+control.bat --persistence clean all
+----
+--
+
+=== Clearing Specific Caches
+
+Use the `--persistence clean caches` option to delete specific listed caches:
+
+[tabs]
+--
+tab:Unix[]
+[source,shell]
+----
+control.sh --persistence clean caches cache1,cache2,cache3
+----
+tab:Window[]
+[source,shell]
+----
+control.bat --persistence clean caches cache1,cache2,cache3
+----
+--
+
+where `cache1,cache2,cache3` are comma-separated cache names.
+
+=== Backing Up Damaged Files
+
+Use the `--persistence backup corrupted` option to back up corrupted data 
files:
+
+[tabs]
+--
+tab:Unix[]
+[source,shell]
+----
+control.sh --persistence backup corrupted
+----
+tab:Window[]
+[source,shell]
+----
+control.bat --persistence backup corrupted
+----
+--
+
+=== Backing Up All Cache Files
+
+Use the `--persistence backup all` option to back up all cache data files:
+
+[tabs]
+--
+tab:Unix[]
+[source,shell]
+----
+control.sh --persistence backup all
+----
+tab:Window[]
+[source,shell]
+----
+control.bat --persistence backup all
+----
+--
+
+=== Backing Up Specific Cache Files
+
+Use the `--persistence backup caches` option to back up specified cache data 
files:
+
+[tabs]
+--
+tab:Unix[]
+[source,shell]
+----
+control.sh --persistence backup caches cache1,cache2,cache3
+----
+tab:Window[]
+[source,shell]
+----
+control.bat --persistence backup caches cache1,cache2,cache3
+----
+--
+
+where `cache1,cache2,cache3` are comma-separated cache names.
+
+Backup files are stored under `{IGNITE_WORK_DIR}/db/{nodeId}/backup_cache-{CACHE_NAME}`.
+
+[WARNING]
+====
+Backup files created via the `./control.sh --persistence backup ...` commands should not be regarded as a snapshot recovery mechanism.
+They are exact copies of the cache folders, intended for subsequent analysis of the causes of corruption or for data recovery attempts. These operations require specialized knowledge, and there is no universal method for analyzing and recovering corrupted caches.
+====
+
+Once you have finished working with the backup copies, or if they are no longer needed, you can delete them manually from the directory specified above.
+
+== Defragmentation
+
+=== Scheduling Defragmentation
+
+Use the `--defragmentation schedule` option to schedule Persistent Data Store 
(PDS) defragmentation:
+
+[tabs]
+--
+tab:Unix[]
+[source,shell]
+----
+control.sh --defragmentation schedule --nodes consistentId0,consistentId1 
[--caches cache1,cache2,cache3]
+----
+tab:Window[]
+[source,shell]
+----
+control.bat --defragmentation schedule --nodes consistentId0,consistentId1 
[--caches cache1,cache2,cache3]
+----
+--
+
+As a result, the next node start-up will occur in Maintenance Mode, during 
which the defragmentation will be performed automatically.
+To exit Maintenance Mode afterward, simply restart the node.
+
+=== Checking Defragmentation Status
+
+[WARNING]
+====
+Available only in link:maintenance-mode[Maintenance Mode].
+====
+
+Use the `--defragmentation status` option to retrieve the status of ongoing 
defragmentation processes:
+
+[tabs]
+--
+tab:Unix[]
+[source,shell]
+----
+control.sh --defragmentation status
+----
+tab:Window[]
+[source,shell]
+----
+control.bat --defragmentation status
+----
+--
+
+=== Canceling Defragmentation
+
+[WARNING]
+====
+Available only in link:maintenance-mode[Maintenance Mode].
+====
+
+Use the `--defragmentation cancel` option to cancel either a scheduled or 
active Persistent Data Store (PDS) defragmentation:
+
+[tabs]
+--
+tab:Unix[]
+[source,shell]
+----
+control.sh --defragmentation cancel
+----
+tab:Window[]
+[source,shell]
+----
+control.bat --defragmentation cancel
+----
+--
+
+
 == Performance Statistics
 
 Ignite provides a built-in tool for cluster profiling. Read 
link:monitoring-metrics/performance-statistics[Performance Statistics] for more 
information.
@@ -1405,7 +1622,7 @@ control.sh|bat --cache metrics enable --caches 
cache-2,cache-1
 
 == Rebuild index
 
-The `schedule_indexes_rebuild` commands Apache Ignite to rebuild indexes for 
specified caches or cache groups. Target caches or cache groups must be in 
Maintenance Mode.
+The `schedule_indexes_rebuild` command instructs Apache Ignite to rebuild indexes for specified caches or cache groups. Target caches or cache groups must be in link:maintenance-mode[Maintenance Mode].
 
 [source, shell]
 ----

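Reviewer sketch (not part of the commit): the recovery steps described in the maintenance-mode.adoc hunk, backing up the potentially corrupted files and then cleaning them on a node that has already been restarted into maintenance mode, could be wrapped roughly as follows. Only the `--host`, `--port`, `--persistence backup corrupted` and `--persistence clean corrupted` arguments come from the documentation in this diff. The `CONTROL_SCRIPT` and `DRY_RUN` variables and the `run_step` and `recover_corrupted` helpers are hypothetical names introduced for illustration. By default the sketch only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: back up, then clean, potentially corrupted cache data on a node
# that is already running in maintenance mode.
# Assumptions (not from the docs): the CONTROL_SCRIPT and DRY_RUN variables
# and the helper names run_step / recover_corrupted.
set -eu

CONTROL_SCRIPT="${CONTROL_SCRIPT:-control.sh}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run_step() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = node host, $2 = node port (both can be found in the node's logs).
recover_corrupted() {
    # Back up first so the files stay available for later analysis.
    run_step "$CONTROL_SCRIPT" --host "$1" --port "$2" --persistence backup corrupted
    # Then remove the potentially corrupted data.
    run_step "$CONTROL_SCRIPT" --host "$1" --port "$2" --persistence clean corrupted
}
```

Calling `recover_corrupted <host> <port>` with `DRY_RUN` left at its default prints the two commands in order. Setting `DRY_RUN=0` executes them, after which the node still has to be restarted manually to leave maintenance mode, as the documentation above states.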