[ 
https://issues.apache.org/jira/browse/ATLAS-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Radhika Kundam updated ATLAS-4408:
----------------------------------
    Description: 
*Index failure resilience:* dynamic handling of index update failures (i.e. the 
HBase commit succeeds but the index commit fails).

Design details are as follows.
 # Start a new service, IndexRecoveryService, at Atlas startup.
 ## Continuously monitor Solr (index client) health, once every retryTime 
milliseconds.
 ### If Solr is healthy and a recovery start time is available:
 #### Start transaction recovery from the available recovery start time (which 
was noted when Solr became unhealthy).
 #### Persist the current recovery start time as the previous one, so that it 
can be passed later as a custom recovery time to restart index recovery if 
required.
 #### Reset the current recovery start time.
 #### Continue with the Solr health check.
 ### If Solr is unhealthy and no recovery start time is available:
 #### Shut down the existing transaction recovery process.
 #### Note the time, which becomes the next recovery start time, and persist it 
in the graph.
 #### Continue with the Solr health check.
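
The two branches above can be read as a small state machine. The following is a minimal sketch of that logic only, not the actual IndexRecoveryService: the class name, field names, and method name are hypothetical, the recorded action strings stand in for starting and stopping transaction recovery, and the in-memory fields stand in for values that the design persists in the graph.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the health-check state machine; not Atlas code. */
class IndexRecoverySketch {
    /** Time noted when Solr became unhealthy; null when no recovery is pending. */
    Long recoveryStartTime;
    /** Last recovery start time that was used, kept for a possible manual re-run. */
    Long previousRecoveryStartTime;
    /** Actions taken so far, recorded so the behaviour can be inspected. */
    final List<String> actions = new ArrayList<>();

    /** One health-check iteration; a real service would run this every retryTime ms. */
    void onHealthCheck(boolean solrHealthy, long nowMillis) {
        if (solrHealthy && recoveryStartTime != null) {
            // Solr is back: recover from the time noted when it went down.
            actions.add("startRecovery(" + recoveryStartTime + ")");
            previousRecoveryStartTime = recoveryStartTime; // persist as previous
            recoveryStartTime = null;                      // reset current start time
        } else if (!solrHealthy && recoveryStartTime == null) {
            // Outage first noticed: stop any running recovery and note the time.
            actions.add("stopRecovery");
            recoveryStartTime = nowMillis; // the design persists this in the graph
        }
        // In every other case there is nothing to do; keep monitoring.
    }
}
```

Feeding this sketch the sequence healthy, unhealthy, unhealthy, healthy records exactly one stopRecovery followed by one startRecovery with the time of the first unhealthy check, because the start time is noted only when none is already available.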

Configuration properties used by this feature:

1. To enable or disable index recovery (enabled by default on Atlas startup):
   *atlas.graph.enable.index.recovery=true*
2. To configure how frequently the Solr health check runs:
   *atlas.graph.index.search.solr.status.retry.interval=<time in ms>*
3. To start index recovery from a custom, user-provided recovery time:
   *atlas.graph.index.search.solr.recovery.start.time=1630086622*
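
As an illustration of how these three properties might be read, here is a minimal sketch that uses plain java.util.Properties as a stand-in for Atlas's own configuration handling. The class name and field names are hypothetical, and the 30000 ms fallback for the retry interval is an assumption made for the example; this issue does not state a default for that property.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical holder for the three properties; not Atlas code. */
class IndexRecoveryConfigSketch {
    final boolean recoveryEnabled;
    final long retryIntervalMs;
    final Long customStartTime; // null when the user did not provide one

    IndexRecoveryConfigSketch(Properties p) {
        // Index recovery is enabled by default, as described in this issue.
        recoveryEnabled = Boolean.parseBoolean(
                p.getProperty("atlas.graph.enable.index.recovery", "true").trim());
        // ASSUMPTION: 30000 ms is an illustrative fallback, not a documented default.
        retryIntervalMs = Long.parseLong(
                p.getProperty("atlas.graph.index.search.solr.status.retry.interval", "30000").trim());
        String start = p.getProperty("atlas.graph.index.search.solr.recovery.start.time");
        customStartTime = (start == null) ? null : Long.valueOf(start.trim());
    }

    /** Parses properties-file text such as the lines listed above. */
    static IndexRecoveryConfigSketch parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new IndexRecoveryConfigSketch(p);
    }
}
```

With an empty input, this sketch leaves recovery enabled and the custom start time unset, which matches the described default of index recovery being enabled on startup.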

  was:
*Index failure resilience:* dynamic handling of failure in updating index (i.e. 
HBase commit succeeds but index commit fails
 * monitor thread to check state of index

 * save index state in graph node

 * basic-search to use graph-queries instead of index-queries

 * partial reindex of vertices i.e. vertices that were updated since last 
successful index update


> Dynamic handling of failure in updating index
> ---------------------------------------------
>
>                 Key: ATLAS-4408
>                 URL: https://issues.apache.org/jira/browse/ATLAS-4408
>             Project: Atlas
>          Issue Type: New Feature
>          Components:  atlas-core
>            Reporter: Radhika Kundam
>            Assignee: Radhika Kundam
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
