[jira] [Assigned] (ATLAS-4408) Dynamic handling of failure in updating index

Radhika Kundam (Jira) Mon, 11 Oct 2021 22:25:05 -0700


     [ 
https://issues.apache.org/jira/browse/ATLAS-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Radhika Kundam reassigned ATLAS-4408:
-------------------------------------

    Assignee: Ashutosh Mestry  (was: Radhika Kundam)

> Dynamic handling of failure in updating index
> ---------------------------------------------
>
>                 Key: ATLAS-4408
>                 URL: https://issues.apache.org/jira/browse/ATLAS-4408
>             Project: Atlas
>          Issue Type: New Feature
>          Components:  atlas-core
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Radhika Kundam
>            Assignee: Ashutosh Mestry
>            Priority: Major
>              Labels: indexing
>             Fix For: 3.0.0, 2.3.0
>
>         Attachments: IndexRecovery.png, IndexRecovery_FunctionalFlow.png
>
>
> *Index failure resilience:* dynamic handling of failure in updating index 
> (i.e. HBase commit succeeds but index commit fails).
> When the secondary persistence (the Solr index) fails, the indexes become 
> inconsistent for every transaction that failed at Solr. The only existing 
> way to repair this is to re-index all the data, which is time-consuming 
> because it involves indexing the entire database.
> To recover from such inconsistencies we can use the *transaction write-ahead 
> log option*. With the write-ahead log enabled (tx.log-tx), JanusGraph 
> maintains the transaction log data, which can be used to recover indices 
> after failures. Maintaining log data for all transactions adds overhead, 
> but it makes the system more resilient and proactive, and these advantages 
> can outweigh the overhead.
> Design details are below.
>  # Start a new service, IndexRecoveryService, at Atlas startup.
>  ## Continuously monitor Solr (index client) health every retryTime 
> milliseconds.
>  ### If Solr is healthy and a recovery start time is available:
>  #### Start transaction recovery from the available recovery start time 
> (which was noted when Solr became unhealthy).
>  #### Persist the current recovery start time as the previous one, so it can 
> be passed later as a custom recovery time to start index recovery if required.
>  #### Reset the current recovery start time.
>  #### Continue with the Solr health check.
>  ### If Solr is unhealthy and no recovery start time is available:
>  #### Shut down the existing transaction recovery process.
>  #### Note the current time, which will be the next recovery start time, and 
> persist it in the graph.
>  #### Continue with the Solr health check.
> Configuration properties used by this feature:
>  1. To enable or disable index recovery (enabled by default on Atlas 
> startup):
>     *atlas.graph.enable.index.recovery=true*
>  2. To configure how frequently the Solr health check runs:
>     *atlas.graph.index.search.solr.status.retry.interval=<time in ms>*
>  3. To start index recovery from a user-provided custom recovery time:
>     *atlas.graph.index.search.solr.recovery.start.time=1630086622*
>  
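
The health-check loop in the design details can be read as a small state
machine. The following is a minimal sketch of that reading, not the Atlas
implementation: the class name IndexRecoverySketch, the RecoveryActions
interface, and the method names are all hypothetical, the caller is assumed
to invoke onHealthCheck once per retry interval, and the "persist" steps are
kept in memory only rather than written to the graph.

```java
import java.util.OptionalLong;

/** Hypothetical sketch of the health-check state machine described above. */
public class IndexRecoverySketch {

    /** Hypothetical hooks standing in for the real transaction recovery process. */
    public interface RecoveryActions {
        void startRecovery(long fromTime);
        void stopRecovery();
    }

    private final RecoveryActions actions;
    // Set when the index is first seen unhealthy; cleared once recovery starts.
    private OptionalLong recoveryStartTime = OptionalLong.empty();
    // Last start time used, kept so it could be supplied later as a custom start time.
    private OptionalLong previousStartTime = OptionalLong.empty();

    public IndexRecoverySketch(RecoveryActions actions) {
        this.actions = actions;
    }

    /** One monitoring tick; assumed to be called every retry interval. */
    public void onHealthCheck(boolean indexHealthy, long now) {
        if (indexHealthy && recoveryStartTime.isPresent()) {
            // Index is back: replay transactions from the noted start time.
            actions.startRecovery(recoveryStartTime.getAsLong());
            previousStartTime = recoveryStartTime;
            recoveryStartTime = OptionalLong.empty();
        } else if (!indexHealthy && !recoveryStartTime.isPresent()) {
            // Index just went down: stop any running recovery and note the time.
            actions.stopRecovery();
            recoveryStartTime = OptionalLong.of(now);
        }
        // In every other case nothing changes and monitoring simply continues.
    }

    public OptionalLong getRecoveryStartTime() {
        return recoveryStartTime;
    }

    public OptionalLong getPreviousStartTime() {
        return previousStartTime;
    }
}
```

Because the start time is recorded only on the first unhealthy check, repeated
unhealthy checks do not move it forward, so recovery would begin from the
moment the outage was first observed.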



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
