Thank you, Ashutosh, for the clear explanation!

On Wed, Jun 23, 2021 at 1:05 AM Ashutosh Mestry
<[email protected]> wrote:

> Hi Adam
>
> About the issues where indexes go out of sync with data: Yes, we see this
> issue often enough in customer environments and even in our own
> environments. The root cause is that JanusGraph does not roll back a
> transaction even if the index commit fails.
>
> When this happens, users see that their data is retrieved using Advanced
> Search but does not show up when searched via Basic Search.
>
> A few months back we added the ability to repair the index as part of our
> Java Patch framework. This approach has a much higher throughput than
> the index-repair utility.
>
> JIRA: https://issues.apache.org/jira/browse/ATLAS-4015
> Commit ids:
>
>   *   c810e47a4
>   *   1e06e372e
>
> Set these in atlas-application.properties:
> atlas.patch.numWorkers=<number of cores - 1> * 2
> atlas.patch.batchSize=1000
> atlas.rebuild.index=true
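
A minimal sketch of how the three settings quoted above might be generated
for a given host. It assumes that "<number of cores - 1> * 2" means
(cores - 1) * 2, which the message does not confirm. The helper names
patch_settings and render are hypothetical and are not part of Atlas.

```python
import os


def patch_settings(cores=None, batch_size=1000):
    """Return the three properties from the message as a dict of strings."""
    if cores is None:
        # Fall back to 1 if the core count cannot be determined.
        cores = os.cpu_count() or 1
    # Assumption: the worker formula is (cores - 1) * 2.
    # Clamp to at least one worker so a single-core host still gets a value.
    workers = max(1, (cores - 1) * 2)
    return {
        "atlas.patch.numWorkers": str(workers),
        "atlas.patch.batchSize": str(batch_size),
        "atlas.rebuild.index": "true",
    }


def render(props):
    """Format the settings as key=value lines for a properties file."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    # Example: an 8-core host.
    print(render(patch_settings(cores=8)))
```

The printed lines would be pasted into atlas-application.properties by hand;
the sketch only does the arithmetic and formatting and does not touch Atlas.
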
>
> We are brainstorming on a more proactive solution to this problem. So far,
> we don’t have a design.
>
> Best regards,
>
> ~ ashutosh
> Ashutosh Mestry<mailto:[email protected]> . Cloudera, Inc.
>
> From: Adam Bellemare <[email protected]>
> Date: Tuesday, June 22, 2021 at 7:43 AM
> To: [email protected] <[email protected]>
> Cc: Olessia D'Souza <[email protected]>, Karl Taylor <[email protected]>,
> Nargiza Sarkulova <[email protected]>
> Subject: Entities missing from ES/Solr index - why?
>
> Hi Folks
>
> After bulk loading large numbers of entities (Atlas 2.1.0, Solr), we often
> find some of them missing from the Solr index. We are still able to
> find these entities when we do Advanced Search on HBase directly, but they
> are not in the Solr index.
>
> From https://atlas.apache.org/2.0.0/AtlasRepairIndex.html
>
> "In rare, cases it is possible that during entity creation, the entity is
> stored in the data store, but the *corresponding indexes are not created in
> Solr*. Since Atlas relies heavily on Solr in the operation of its Basic
> Search, this will result in entity not being returned by a search."
>
> Why does this occur? Is this a race condition somewhere?
> Is there an open issue for this? I cannot find one in Atlas JIRA.
>
> How "rare" is this? We are seeing this frequently in our local dev
> environment (as in, missing 100s of records when bulk uploading 100k+
> entities), but we have also identified that resource constraints appear to
> be a factor (e.g. CPU starvation), as well as a series of poor default
> configurations. Do these share a root cause with the missing Solr index
> entries?
>
> Thanks
> Adam
>
