[jira] [Commented] (IGNITE-11252) Docs: Index corruption recovery procedure

Denis A. Magda (Jira) Wed, 02 Sep 2020 16:19:23 -0700


    [ https://issues.apache.org/jira/browse/IGNITE-11252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17189757#comment-17189757 ]


Denis A. Magda commented on IGNITE-11252:
-----------------------------------------

Related problem: 
http://apache-ignite-users.70518.x6.nabble.com/unable-to-start-node-td33864.html

> Docs: Index corruption recovery procedure
> -----------------------------------------
>
>                 Key: IGNITE-11252
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11252
>             Project: Ignite
>          Issue Type: Task
>          Components: documentation
>    Affects Versions: 2.7
>            Reporter: Denis A. Magda
>            Assignee: Prachi Garg
>            Priority: Critical
>             Fix For: 2.9
>
>
> We need to document a recovery procedure for index corruption.
> Refer to this thread for details and for examples of the exceptions written
> to the logs when the issue occurs:
> http://apache-ignite-developers.2346864.n4.nabble.com/Ignite-index-corruption-issue-gt-unrecoverable-cluster-td39869.html
> # Recovering from an index corruption
> ## Applicable if
> A cache's index is known to be corrupted, but the main data
> (partition files and WAL) is intact. Show snippets of the exceptions that may
> appear in the log; find them via the references shared in the dev list discussion.
> ## Steps to recover
> 1. Stop the node.
> 2. Delete index.bin of the affected caches (the path is
> db/<consistent_id>/cache-<cache_name>/index.bin).
> 3. Start the node.
> - Note: at this point the node is active in the cluster but doesn’t have
> indexes until they are rebuilt.
> It still serves SQL queries, but their performance may be poor.
> Avoid running SQL queries on large tables at this point.
> 4. Wait for the message “Finished indexes rebuilding for cache <cache_name>” in
> the Ignite log.
> # Recovering from a persistent storage corruption
> ## Applicable if
> Part of the persistent storage (partition files, checkpoint markers, or WAL)
> is corrupted, there is no other way to recover it, and healthy copies of all
> data exist on other nodes.
> ## Steps to recover
> 1. Stop the node.
> 2. Delete all persistence files of the node (preferably clear the Ignite working
> directory, storage directory, WAL, and WAL archive directories).
> 3. Make sure consistentId is explicitly set in the configuration of the node.
> - If it isn’t, look up the generated consistentId using control.sh and set it
> explicitly in the configuration or via IGNITE_CONSISTENT_ID (2.8+ only).
> 4. Start the node.
> 5. Wait for the “Finished rebalancing cache” messages for all caches.
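
Steps 2 and 4 of the index corruption procedure quoted above can be sketched in Java as follows. This is a minimal, hypothetical helper (the class and method names are illustrative, not part of Ignite): it deletes index.bin for the affected caches using the path layout given in the ticket, and scans log lines for the “Finished indexes rebuilding for cache” message. It assumes the node is already stopped, that `workDir` is the directory containing db/, that `consistentIdFolder` is the folder name exactly as it appears under db/ on disk, and that the rebuild message contains the cache name. Check the actual directory names and log format of your Ignite version before relying on it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

/** Hypothetical helper for the index corruption procedure (not an Ignite API). */
public class IndexBinCleanup {

    /**
     * Step 2: deletes db/<consistent_id>/cache-<cache_name>/index.bin for each
     * affected cache. Must only be run while the node is stopped.
     * Returns the index files that were actually deleted.
     */
    static List<Path> deleteIndexFiles(Path workDir, String consistentIdFolder,
                                       Collection<String> cacheNames) throws IOException {
        List<Path> deleted = new ArrayList<>();
        for (String cache : cacheNames) {
            Path indexFile = workDir.resolve("db")
                    .resolve(consistentIdFolder)
                    .resolve("cache-" + cache)
                    .resolve("index.bin");
            // Only index.bin is removed; partition files in the same directory stay untouched.
            if (Files.deleteIfExists(indexFile)) {
                deleted.add(indexFile);
            }
        }
        return deleted;
    }

    /**
     * Step 4: returns true once every cache has a log line containing
     * "Finished indexes rebuilding for cache" and the cache name
     * (crude substring match; the exact log format is an assumption).
     */
    static boolean allIndexesRebuilt(List<String> logLines, Collection<String> cacheNames) {
        for (String cache : cacheNames) {
            boolean found = logLines.stream().anyMatch(line ->
                    line.contains("Finished indexes rebuilding for cache") && line.contains(cache));
            if (!found) {
                return false;
            }
        }
        return true;
    }

    /** Usage: java IndexBinCleanup <work_dir> <consistent_id_folder> <cache_name>... */
    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.err.println("usage: IndexBinCleanup <work_dir> <consistent_id_folder> <cache_name>...");
            System.exit(1);
        }
        List<String> caches = Arrays.asList(args).subList(2, args.length);
        for (Path p : deleteIndexFiles(Paths.get(args[0]), args[1], caches)) {
            System.out.println("Deleted " + p);
        }
    }
}
```

The substring match in `allIndexesRebuilt` is deliberately crude: a cache named `Person` would also match a line about `PersonAddress`, so a real check should parse the log format your version actually emits.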

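Step 2 of the persistent storage procedure (deleting all persistence files of the node) can be sketched in Java as below. This is a hypothetical, stdlib-only helper, not an Ignite tool: it recursively deletes whichever directories you pass in, so the caller must supply the node's actual working, storage, WAL, and WAL archive paths and must run it only while the node is stopped. It does not cover step 3; for that, the consistentId can be pinned with `IgniteConfiguration.setConsistentId(...)` in Java configuration, or via IGNITE_CONSISTENT_ID as the ticket notes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for step 2 of the persistent storage procedure (not an Ignite API). */
public class PersistenceWipe {

    /** Recursively deletes dir and everything below it; returns the number of entries removed. */
    static long deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            // Nothing to do, e.g. a WAL archive that lived inside an already-wiped directory.
            return 0;
        }
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(dir)) {
            // A parent path always sorts before its children, so reverse order
            // deletes files and subdirectories before the directories holding them.
            entries = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : entries) {
            Files.delete(p);
        }
        return entries.size();
    }

    /** Wipes every given directory (working, storage, WAL, WAL archive); run only while the node is stopped. */
    static long wipe(List<Path> dirs) throws IOException {
        long total = 0;
        for (Path dir : dirs) {
            total += deleteRecursively(dir);
        }
        return total;
    }

    /** Usage: java PersistenceWipe <dir>... */
    public static void main(String[] args) throws IOException {
        List<Path> dirs = new ArrayList<>();
        for (String a : args) {
            dirs.add(Paths.get(a));
        }
        System.out.println("Deleted " + wipe(dirs) + " entries");
    }
}
```

Deleting children before parents is what lets `Files.delete` succeed, since it refuses to remove a non-empty directory.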


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
