[
https://issues.apache.org/jira/browse/ATLAS-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Radhika Kundam reassigned ATLAS-4408:
-------------------------------------
Assignee: Ashutosh Mestry (was: Radhika Kundam)
> Dynamic handling of failure in updating index
> ---------------------------------------------
>
> Key: ATLAS-4408
> URL: https://issues.apache.org/jira/browse/ATLAS-4408
> Project: Atlas
> Issue Type: New Feature
> Components: atlas-core
> Affects Versions: 3.0.0, 2.2.0
> Reporter: Radhika Kundam
> Assignee: Ashutosh Mestry
> Priority: Major
> Labels: indexing
> Fix For: 3.0.0, 2.3.0
>
> Attachments: IndexRecovery.png, IndexRecovery_FunctionalFlow.png
>
>
> *Index failure resilience:* dynamic handling of failure in updating index
> (i.e. HBase commit succeeds but index commit fails).
> When the secondary persistence (the index commit) fails, the indexes become
> inconsistent for every transaction that failed at Solr. The only existing
> way to repair this is to re-index all the data, which is time consuming
> because it involves indexing the entire database.
> To recover from such inconsistencies we can use the *transaction write-ahead log
> option*. With the write-ahead log enabled (tx.log-tx), JanusGraph keeps log
> data for every transaction, which can be used to recover indices after a
> failure. Maintaining log data for all transactions adds overhead, but it
> makes the system more resilient and proactive, so the advantages of this
> approach can outweigh that overhead.
> The design details are as follows.
> # Start a new service, IndexRecoveryService, at Atlas startup.
> ## Continuously monitor Solr (index client) health every retryTime
> milliseconds.
> ### If Solr is healthy and a recovery start time is available:
> #### Start transaction recovery from the available recovery start time
> (which was noted when Solr became unhealthy).
> #### Persist the current recovery start time as the previous one, so it can
> later be passed as a custom recovery time to start index recovery if required.
> #### Reset the current recovery start time.
> #### Continue with the Solr health check.
> ### If Solr is unhealthy and no recovery start time is available:
> #### Shut down the existing transaction recovery process.
> #### Note the time, which becomes the next recovery start time, and
> persist it in the graph.
> #### Continue with the Solr health check.
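
The health-check steps above can be read as a small state machine. The following is a minimal sketch of one loop iteration, not the Atlas implementation: the names `IndexRecoveryMonitor`, `RecoveryState`, `is_index_healthy` and `now_ms` are hypothetical, the persisted values are kept in memory rather than in the graph, the noted time is assumed to be the current clock value, and the recovery start/shutdown actions are only recorded as strings.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RecoveryState:
    """In-memory stand-in for the values the description persists in the graph."""
    start_time: Optional[int] = None           # noted when the index became unhealthy
    previous_start_time: Optional[int] = None  # last start time used for a recovery


@dataclass
class IndexRecoveryMonitor:
    is_index_healthy: Callable[[], bool]  # hypothetical health probe for Solr
    now_ms: Callable[[], int]             # injected clock (assumption: current time is noted)
    state: RecoveryState = field(default_factory=RecoveryState)
    events: List[str] = field(default_factory=list)

    def tick(self) -> None:
        """Run one health check; a real service would sleep retryTime ms between calls."""
        healthy = self.is_index_healthy()
        if healthy and self.state.start_time is not None:
            # Index is back: recover from the noted time, keep it as "previous", reset.
            self.events.append(f"start recovery from {self.state.start_time}")
            self.state.previous_start_time = self.state.start_time
            self.state.start_time = None
        elif not healthy and self.state.start_time is None:
            # Index just went down: stop any running recovery and note the time.
            self.events.append("shut down recovery")
            self.state.start_time = self.now_ms()
        # In every other case there is nothing to do except check again later.


# Simulated run: healthy, unhealthy, still unhealthy, healthy again.
health = iter([True, False, False, True])
monitor = IndexRecoveryMonitor(lambda: next(health), lambda: 1000)
for _ in range(4):
    monitor.tick()
print(monitor.events)
```

Keeping one iteration in `tick()` separates the decision logic from the timing loop, which makes the two branches easy to exercise with a scripted sequence of health results.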
> Configuration properties used by this feature:
> 1. To enable or disable index recovery (by default, index recovery is
> enabled on Atlas startup):
> *atlas.graph.enable.index.recovery=true*
> 2. To configure how frequently the Solr health check is done:
> *atlas.graph.index.search.solr.status.retry.interval=<time in ms>*
> 3. To start index recovery from a user-provided custom recovery time:
> *atlas.graph.index.search.solr.recovery.start.time=1630086622*
>
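
As an illustration of how the three properties listed in the description might be read together, here is a small sketch. It is not Atlas code: `parse_properties`, `load_settings` and `IndexRecoverySettings` are hypothetical helpers, and the retry interval of 5000 is an arbitrary example value, since the description only says `<time in ms>`. The only default taken from the description is that index recovery is enabled when the property is absent.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Example property text; the interval value is illustrative, not a documented default.
SAMPLE = """
atlas.graph.enable.index.recovery=true
atlas.graph.index.search.solr.status.retry.interval=5000
atlas.graph.index.search.solr.recovery.start.time=1630086622
"""


@dataclass
class IndexRecoverySettings:
    enabled: bool
    retry_interval_ms: Optional[int]   # None when not configured
    custom_start_time: Optional[int]   # None when no custom recovery time is given


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def load_settings(props: Dict[str, str]) -> IndexRecoverySettings:
    def opt_int(key: str) -> Optional[int]:
        raw = props.get(key)
        return int(raw) if raw else None

    return IndexRecoverySettings(
        # Recovery is enabled by default, as stated in the description.
        enabled=props.get("atlas.graph.enable.index.recovery", "true").lower() == "true",
        retry_interval_ms=opt_int("atlas.graph.index.search.solr.status.retry.interval"),
        custom_start_time=opt_int("atlas.graph.index.search.solr.recovery.start.time"),
    )


settings = load_settings(parse_properties(SAMPLE))
print(settings)
```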
--
This message was sent by Atlassian Jira
(v8.3.4#803005)