[
https://issues.apache.org/jira/browse/ATLAS-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Radhika Kundam updated ATLAS-4408:
----------------------------------
Description:
*Index failure resilience:* dynamic handling of failures in updating the index
(i.e. the HBase commit succeeds but the index commit fails).
Design details are below.
# Start a new service, IndexRecoveryService, at Atlas startup.
## Continuously monitor Solr (index client) health, checking every retryTime
milliseconds.
### If Solr is healthy and a recovery start time is available:
#### Start transaction recovery from the available recovery start time (which
was noted when Solr became unhealthy).
#### Persist the current recovery start time as the previous one, so that it
can later be passed as a custom recovery time to start index recovery if
required.
#### Reset the current recovery start time.
#### Continue with the Solr health check.
### If Solr is unhealthy and no recovery start time is available:
#### Shut down the existing transaction recovery process.
#### Note down the time, which will be the next recovery start time, and
persist it in the graph.
#### Continue with the Solr health check.
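The decision step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual IndexRecoveryService code: the class name, field names, and the onHealthCheck method are hypothetical, and persistence to the graph is stood in for by plain fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of one monitor tick; NOT the real Atlas IndexRecoveryService.
class IndexRecoveryMonitorSketch {
    static final long NONE = -1L; // marker for "no recovery start time available"

    long recoveryStartTime = NONE;         // noted when the index became unhealthy
    long previousRecoveryStartTime = NONE; // kept so it can be reused as a custom start time
    boolean recoveryRunning = false;
    final List<String> actions = new ArrayList<>(); // stands in for real side effects

    // One health check; the real service would repeat this every retry interval.
    void onHealthCheck(boolean indexHealthy, long nowMillis) {
        if (indexHealthy && recoveryStartTime != NONE) {
            // Index is back: recover from the noted time, remember it, then reset it.
            recoveryRunning = true;
            actions.add("start-recovery@" + recoveryStartTime);
            previousRecoveryStartTime = recoveryStartTime;
            recoveryStartTime = NONE;
        } else if (!indexHealthy && recoveryStartTime == NONE) {
            // Index just went down: stop any running recovery and note the time.
            recoveryRunning = false;
            actions.add("stop-recovery");
            recoveryStartTime = nowMillis; // the real design persists this in the graph
        }
        // Otherwise nothing changes; the caller simply continues the health checks.
    }

    public static void main(String[] args) {
        IndexRecoveryMonitorSketch monitor = new IndexRecoveryMonitorSketch();
        monitor.onHealthCheck(false, 2000L); // outage begins
        monitor.onHealthCheck(false, 3000L); // still down, start time is kept
        monitor.onHealthCheck(true, 4000L);  // index is back, recovery starts
        System.out.println(monitor.actions);
    }
}
```

Keeping the first noted time while the outage continues means recovery covers everything written since the index first became unhealthy.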
Configuration properties used by this feature:
1. To enable or disable index recovery (enabled by default on Atlas startup):
*atlas.graph.enable.index.recovery=true*
2. To configure how frequently the Solr health check is performed:
*atlas.graph.index.search.solr.status.retry.interval=<time in ms>*
3. To start index recovery from a custom, user-provided recovery time:
*atlas.graph.index.search.solr.recovery.start.time=1630086622*
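As a rough illustration of how these three properties might be read, here is a minimal sketch using plain java.util.Properties. The class name and the fallback retry interval of 30000 ms are assumptions made for the example; only the property names and the enabled-by-default behaviour come from the description above, and Atlas itself may load its configuration differently.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.OptionalLong;
import java.util.Properties;

// Hypothetical holder for the three settings; not an actual Atlas class.
class IndexRecoveryConfigSketch {
    // Assumption for this sketch only; the description does not state a default.
    static final long ASSUMED_DEFAULT_RETRY_MS = 30_000L;

    final boolean enabled;
    final long retryIntervalMs;
    final OptionalLong customStartTime; // empty when no custom time is configured

    IndexRecoveryConfigSketch(Properties p) {
        // Index recovery is enabled unless explicitly switched off.
        enabled = Boolean.parseBoolean(
                p.getProperty("atlas.graph.enable.index.recovery", "true").trim());
        retryIntervalMs = Long.parseLong(
                p.getProperty("atlas.graph.index.search.solr.status.retry.interval",
                        String.valueOf(ASSUMED_DEFAULT_RETRY_MS)).trim());
        String start = p.getProperty("atlas.graph.index.search.solr.recovery.start.time");
        customStartTime = (start == null || start.trim().isEmpty())
                ? OptionalLong.empty()
                : OptionalLong.of(Long.parseLong(start.trim()));
    }

    // Parses properties-file text, e.g. the relevant lines of a configuration file.
    static IndexRecoveryConfigSketch parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new IndexRecoveryConfigSketch(p);
    }
}
```

The custom start time is modelled as optional because, per the description, it is only supplied when a user wants recovery to begin from a specific time.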
was:
*Index failure resilience:* dynamic handling of failure in updating index (i.e.
HBase commit succeeds but index commit fails)
* monitor thread to check state of index
* save index state in graph node
* basic-search to use graph-queries instead of index-queries
* partial reindex of vertices i.e. vertices that were updated since last
successful index update
> Dynamic handling of failure in updating index
> ---------------------------------------------
>
> Key: ATLAS-4408
> URL: https://issues.apache.org/jira/browse/ATLAS-4408
> Project: Atlas
> Issue Type: New Feature
> Components: atlas-core
> Reporter: Radhika Kundam
> Assignee: Radhika Kundam
> Priority: Major