[jira] [Updated] (ATLAS-4408) Dynamic handling of failure in updating index

Radhika Kundam (Jira) Mon, 30 Aug 2021 14:29:08 -0700


     [ https://issues.apache.org/jira/browse/ATLAS-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Radhika Kundam updated ATLAS-4408:
----------------------------------
    Description: 
*Index failure resilience:* dynamic handling of failures when updating the index (i.e. the HBase commit succeeds but the index commit fails).

To support this feature, the *tx.log-tx* property must be enabled, which makes JanusGraph start storing write-ahead transaction logs. *With this approach, more data must be maintained for the write-ahead transaction logs.* However, proactively recovering the index is far preferable to reindexing the entire data set after persistent secondary-index failures, so the feature is worth the overhead of maintaining the additional data.
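Atlas reads its JanusGraph settings from atlas-application.properties keys that carry the atlas.graph. prefix, so the flag would presumably be set as atlas.graph.tx.log-tx=true. The sketch below illustrates that prefix-stripping with java.util.Properties only; GraphConfigSubset and graphSubset are hypothetical names, not Atlas's actual API, and the exact Atlas key is an assumption.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class GraphConfigSubset {
    // Assumed prefix under which Atlas keeps graph (JanusGraph) settings.
    static final String GRAPH_PREFIX = "atlas.graph.";

    /** Returns the keys under GRAPH_PREFIX with the prefix removed, sorted by key. */
    static Map<String, String> graphSubset(Properties atlasProps) {
        Map<String, String> subset = new TreeMap<>();
        for (String key : atlasProps.stringPropertyNames()) {
            if (key.startsWith(GRAPH_PREFIX)) {
                subset.put(key.substring(GRAPH_PREFIX.length()), atlasProps.getProperty(key));
            }
        }
        return subset;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Assumed Atlas key that would enable JanusGraph's write-ahead transaction log.
        props.setProperty("atlas.graph.tx.log-tx", "true");
        props.setProperty("atlas.graph.storage.backend", "hbase");
        props.setProperty("atlas.notification.embedded", "false"); // not a graph key, dropped
        System.out.println(graphSubset(props)); // prints: {storage.backend=hbase, tx.log-tx=true}
    }
}
```

Only keys under the graph prefix reach JanusGraph, which is why the JanusGraph name tx.log-tx would need the atlas.graph. prefix in the Atlas configuration file.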

Design details are as follows.
 # Start a new service, IndexRecoveryService, at Atlas startup.
 ## Continuously monitor the health of Solr (the index client) every retryTime milliseconds.
 ### If Solr is healthy and a recovery start time is available:
 #### Start transaction recovery from the available recovery start time (which was noted when Solr became unhealthy).
 #### Persist the current recovery start time as the previous recovery time; if required, it can later be passed as a custom recovery time to restart index recovery.
 #### Reset the current recovery start time.
 #### Continue with the Solr health check.
 ### If Solr is unhealthy and no recovery start time is available:
 #### Shut down the existing transaction recovery process.
 #### Record the current time as the next recovery start time and persist it in the graph.
 #### Continue with the Solr health check.
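The health-check loop above can be sketched as one Java class. This is a minimal, non-authoritative sketch: RecoveryHandle, RecoveryStarter and RecoveryTimeStore are hypothetical stand-ins (a real service would presumably wrap JanusGraph's transaction recovery, e.g. JanusGraphFactory.startTransactionRecovery, and persist timestamps in the graph), and scheduling checkOnce() every retryTime milliseconds is left to the caller.

```java
import java.time.Instant;
import java.util.Optional;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

public class IndexRecoverySketch {
    // Hypothetical handle to a running transaction-recovery process.
    interface RecoveryHandle { void shutdown(); }

    // Hypothetical factory that starts replaying the write-ahead log from a given time.
    interface RecoveryStarter { RecoveryHandle start(Instant from); }

    // Hypothetical persistence for recovery timestamps (the design keeps them in the graph).
    interface RecoveryTimeStore {
        Optional<Instant> currentStartTime();
        void setCurrentStartTime(Instant t);
        void clearCurrentStartTime();
        void setPreviousStartTime(Instant t);
    }

    // In-memory store used only for this demonstration.
    static class InMemoryStore implements RecoveryTimeStore {
        Instant current;
        Instant previous;
        public Optional<Instant> currentStartTime() { return Optional.ofNullable(current); }
        public void setCurrentStartTime(Instant t) { current = t; }
        public void clearCurrentStartTime() { current = null; }
        public void setPreviousStartTime(Instant t) { previous = t; }
    }

    private final BooleanSupplier solrHealthy;
    private final RecoveryStarter starter;
    private final RecoveryTimeStore store;
    private final Supplier<Instant> clock;
    private RecoveryHandle running;

    IndexRecoverySketch(BooleanSupplier solrHealthy, RecoveryStarter starter,
                        RecoveryTimeStore store, Supplier<Instant> clock) {
        this.solrHealthy = solrHealthy;
        this.starter = starter;
        this.store = store;
        this.clock = clock;
    }

    /** One health-check pass; the service would run this every retryTime milliseconds. */
    void checkOnce() {
        boolean healthy = solrHealthy.getAsBoolean();
        Optional<Instant> start = store.currentStartTime();
        if (healthy && start.isPresent()) {
            running = starter.start(start.get());      // recover from the noted start time
            store.setPreviousStartTime(start.get());   // kept for a later custom recovery time
            store.clearCurrentStartTime();
        } else if (!healthy && !start.isPresent()) {
            if (running != null) {                     // stop recovery while Solr is down
                running.shutdown();
                running = null;
            }
            store.setCurrentStartTime(clock.get());    // next recovery starts from here
        }
        // In all other cases nothing changes; the next pass repeats the health check.
    }

    /** Runs checkOnce() over a scripted health sequence and returns the recovery events. */
    static String simulate(boolean[] health) {
        int[] step = {0};
        StringBuilder log = new StringBuilder();
        IndexRecoverySketch svc = new IndexRecoverySketch(
                () -> health[step[0]],
                from -> {
                    log.append("start@").append(from.getEpochSecond()).append(' ');
                    return () -> log.append("shutdown ");
                },
                new InMemoryStore(),
                () -> Instant.ofEpochSecond(1000L + step[0]));   // fake clock: 1000 + pass index
        for (step[0] = 0; step[0] < health.length; step[0]++) {
            svc.checkOnce();
        }
        return log.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(simulate(new boolean[] {true, false, true, false, true}));
    }
}
```

For the scripted sequence healthy, unhealthy, healthy, unhealthy, healthy, main prints start@1001 shutdown start@1003: recovery starts from the time each outage was first noticed and is shut down when Solr fails again.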

Configuration properties for this feature:

1. To enable or disable index recovery (index recovery is enabled by default on Atlas startup):
    *atlas.graph.enable.index.recovery=true*
 2. To configure how frequently the Solr health check is done:
    *atlas.graph.index.search.solr.status.retry.interval=<time in ms>*
 3. To start index recovery from a custom, user-provided recovery time:
    *atlas.graph.index.search.solr.recovery.start.time=1630086622*
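The three properties could be read along these lines. A minimal sketch, not Atlas's implementation: IndexRecoveryConfig is a hypothetical class, the 30-second fallback for the retry interval is an assumed value (the issue states no default), and the recovery start time is assumed to be in epoch seconds, since the example value 1630086622 corresponds to 2021-08-27T17:50:22Z.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;
import java.util.Properties;

/** Hypothetical holder for the three index-recovery settings. */
public class IndexRecoveryConfig {
    static final String ENABLE_KEY = "atlas.graph.enable.index.recovery";
    static final String RETRY_INTERVAL_KEY = "atlas.graph.index.search.solr.status.retry.interval";
    static final String START_TIME_KEY = "atlas.graph.index.search.solr.recovery.start.time";

    // Assumed fallback; the issue does not specify a default interval.
    static final long DEFAULT_RETRY_INTERVAL_MS = 30_000L;

    final boolean enabled;
    final Duration retryInterval;
    final Optional<Instant> customStartTime;

    IndexRecoveryConfig(Properties props) {
        // Recovery is on by default; only the literal "true" (any case) enables it when set.
        this.enabled = Boolean.parseBoolean(props.getProperty(ENABLE_KEY, "true"));
        this.retryInterval = Duration.ofMillis(Long.parseLong(
                props.getProperty(RETRY_INTERVAL_KEY, String.valueOf(DEFAULT_RETRY_INTERVAL_MS))));
        // Assumption: the custom start time is given in epoch seconds.
        String start = props.getProperty(START_TIME_KEY);
        this.customStartTime = (start == null || start.trim().isEmpty())
                ? Optional.empty()
                : Optional.of(Instant.ofEpochSecond(Long.parseLong(start.trim())));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(RETRY_INTERVAL_KEY, "60000");
        props.setProperty(START_TIME_KEY, "1630086622");
        IndexRecoveryConfig cfg = new IndexRecoveryConfig(props);
        // prints: true PT1M 2021-08-27T17:50:22Z
        System.out.println(cfg.enabled + " " + cfg.retryInterval + " " + cfg.customStartTime.get());
    }
}
```

An absent start time yields Optional.empty(), which a service could read as "resume from the time persisted in the graph" rather than a user-supplied one.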

 

  was:
*Index failure resilience:* dynamic handling of failures when updating the index (i.e. the HBase commit succeeds but the index commit fails).

Design details are as follows.
 # Start a new service, IndexRecoveryService, at Atlas startup.
 ## Continuously monitor the health of Solr (the index client) every retryTime milliseconds.
 ### If Solr is healthy and a recovery start time is available:
 #### Start transaction recovery from the available recovery start time (which was noted when Solr became unhealthy).
 #### Persist the current recovery start time as the previous recovery time; if required, it can later be passed as a custom recovery time to restart index recovery.
 #### Reset the current recovery start time.
 #### Continue with the Solr health check.
 ### If Solr is unhealthy and no recovery start time is available:
 #### Shut down the existing transaction recovery process.
 #### Record the current time as the next recovery start time and persist it in the graph.
 #### Continue with the Solr health check.

Configuration properties for this feature:

1. To enable or disable index recovery (index recovery is enabled by default on Atlas startup):
   *atlas.graph.enable.index.recovery=true*
2. To configure how frequently the Solr health check is done:
   *atlas.graph.index.search.solr.status.retry.interval=<time in ms>*
3. To start index recovery from a custom, user-provided recovery time:
   *atlas.graph.index.search.solr.recovery.start.time=1630086622*


> Dynamic handling of failure in updating index
> ---------------------------------------------
>
>                 Key: ATLAS-4408
>                 URL: https://issues.apache.org/jira/browse/ATLAS-4408
>             Project: Atlas
>          Issue Type: New Feature
>          Components:  atlas-core
>            Reporter: Radhika Kundam
>            Assignee: Radhika Kundam
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
