[
https://issues.apache.org/jira/browse/IGNITE-11252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17189757#comment-17189757
]
Denis A. Magda commented on IGNITE-11252:
-----------------------------------------
Related problem:
http://apache-ignite-users.70518.x6.nabble.com/unable-to-start-node-td33864.html
> Docs: Index corruption recovery procedure
> -----------------------------------------
>
> Key: IGNITE-11252
> URL: https://issues.apache.org/jira/browse/IGNITE-11252
> Project: Ignite
> Issue Type: Task
> Components: documentation
> Affects Versions: 2.7
> Reporter: Denis A. Magda
> Assignee: Prachi Garg
> Priority: Critical
> Fix For: 2.9
>
>
> We need to document a recovery procedure for index corruption.
> Refer to this thread for details and examples of the exception dumped to the
> logs when the issue occurs:
> http://apache-ignite-developers.2346864.n4.nabble.com/Ignite-index-corruption-issue-gt-unrecoverable-cluster-td39869.html
> # Recovering from an index corruption
> ## Applicable if
> It is known that an index of a cache is corrupted, but the main data
> (partition files and WAL) is fine. Show code snippets of possible examples;
> they can be found via the references shared in the dev list discussion.
> ## Steps to recover
> 1. Stop the node
> 2. Delete index.bin of the affected caches (path is
> db/<consistent_id>/cache-<cache_name>/index.bin)
> 3. Start the node
> - Note: At this point the node is active in the cluster but has no
> indexes yet.
> It still serves SQL queries, but they can be slow until the indexes are rebuilt.
> Avoid running SQL queries on large tables at this point.
> 4. Wait for message “Finished indexes rebuilding for cache <cache_name>” in
> the Ignite log
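
A rough sketch of steps 2 and 4 above, in Python, for anyone who wants to script them. It is not part of the ticket. The function names are hypothetical, `storage_dir` is assumed to be the directory that contains `db`, and `consistent_id` is assumed to be the folder name exactly as it appears on disk. The log check only looks for the message text quoted in step 4 plus the cache name on the same line; the real log line layout may differ between versions. Run the deletion only while the node is stopped.

```python
import re
from pathlib import Path


def index_file(storage_dir, consistent_id, cache_name):
    """Build db/<consistent_id>/cache-<cache_name>/index.bin under storage_dir."""
    return Path(storage_dir) / "db" / consistent_id / f"cache-{cache_name}" / "index.bin"


def delete_index_files(storage_dir, consistent_id, cache_names):
    """Delete index.bin for each affected cache; return the paths actually removed."""
    removed = []
    for name in cache_names:
        path = index_file(storage_dir, consistent_id, name)
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed


def index_rebuild_finished(log_text, cache_name):
    """True if a log line has the 'finished' message and names cache_name as a whole word."""
    marker = "Finished indexes rebuilding for cache"
    name = re.compile(r"(?<!\w)" + re.escape(cache_name) + r"(?!\w)")
    return any(marker in line and name.search(line) for line in log_text.splitlines())
```

Caches whose `index.bin` is already missing are skipped silently, so the returned list shows which files were really removed.
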
> # Recovering from a persistent storage corruption
> ## Applicable if
> A part of the persistent storage (partition files, checkpoint markers or WAL)
> was corrupted
> and there is no other way to recover it, but there are healthy copies of all
> data on other nodes.
> ## Steps to recover
> 1. Stop the node
> 2. Delete all persistence files of the node (best to clear Ignite working
> directory, storage directory, WAL and WAL archive directories)
> 3. Make sure consistentId is explicitly set in the configuration of the node
> - If it isn’t, look up the generated consistentId using control.sh and set it
> explicitly in the config or via IGNITE_CONSISTENT_ID (2.8+ only)
> 4. Start the node
> 5. Wait for the “Finished rebalancing cache” message for each cache in the
> Ignite log
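
Steps 2 and 5 of the storage recovery procedure could be scripted along these lines. This is a Python sketch with hypothetical function names, not an Ignite tool. The caller supplies the directories to clear (working directory, storage, WAL, WAL archive), because their locations depend on the node configuration. The log check assumes each "Finished rebalancing cache" line also names the cache, which may not hold for every version. Clear directories only while the node is stopped, and only when healthy copies of the data exist on other nodes.

```python
import re
import shutil
from pathlib import Path


def clear_directories(directories):
    """Remove everything inside each existing directory, keeping the directory itself."""
    for directory in map(Path, directories):
        if not directory.is_dir():
            continue
        for entry in directory.iterdir():
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()


def caches_not_rebalanced(log_text, cache_names):
    """Return the caches with no 'Finished rebalancing cache' line naming them."""
    marker = "Finished rebalancing cache"
    pending = set(cache_names)
    for line in log_text.splitlines():
        if marker not in line:
            continue
        for name in list(pending):
            # Match the cache name as a whole word to avoid prefix collisions.
            if re.search(r"(?<!\w)" + re.escape(name) + r"(?!\w)", line):
                pending.discard(name)
    return pending
```

An empty set from `caches_not_rebalanced` means every listed cache has a matching message in the log text that was passed in.
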