Thank you, Ashutosh, for the clear explanation!

On Wed, Jun 23, 2021 at 1:05 AM Ashutosh Mestry <[email protected]> wrote:
> Hi Adam
>
> About the issues where indexes go out of sync with data: yes, we see this
> issue often enough in customer environments and even in our own environments.
> The root cause is that JanusGraph does not roll back a transaction even if
> the index commit fails.
>
> When this happens, users see that their data is retrieved using Advanced
> Search but does not show up when searched via Basic Search.
>
> A few months back we added the ability to repair the index as part of our
> Java Patch framework. This approach has much higher throughput than the
> index-repair utility.
>
> JIRA: https://issues.apache.org/jira/browse/ATLAS-4015
> Commit ids:
>
> * c810e47a4
> * 1e06e372e
>
> Set these in atlas-application.properties:
>
> atlas.patch.numWorkers=<number of cores - 1> * 2
> atlas.patch.batchSize=1000
> atlas.rebuild.index=true
>
> We are brainstorming a more proactive solution to this problem. So far,
> we don’t have a design.
>
> Best regards,
>
> ~ ashutosh
> Ashutosh Mestry<mailto:[email protected]> . Cloudera, Inc.
>
> From: Adam Bellemare <[email protected]>
> Date: Tuesday, June 22, 2021 at 7:43 AM
> To: [email protected] <[email protected]>
> Cc: Olessia D'Souza <[email protected]>, Karl Taylor <[email protected]>,
> Nargiza Sarkulova <[email protected]>
> Subject: Entities missing from ES/Solr index - why?
>
> Hi Folks
>
> After bulk loading large numbers of entities (Atlas 2.1.0, Solr), we often
> have some of them missing from the Solr index. We are still able to
> find these entities when we do an Advanced Search on HBase directly, but they
> are not in the Solr index.
>
> From https://atlas.apache.org/2.0.0/AtlasRepairIndex.html
>
> "In rare cases it is possible that during entity creation, the entity is
> stored in the data store, but the *corresponding indexes are not created in
> Solr*. Since Atlas relies heavily on Solr in the operation of its Basic
> Search, this will result in entity not being returned by a search."
>
> Why does this occur? Is this a race condition somewhere?
> Is there an open issue for this? I cannot find one in Atlas JIRA.
>
> How "rare" is this? We are seeing this frequently in our local dev
> environment (as in, missing hundreds of records when bulk uploading 100k+
> entities), but we have also found that resource constraints appear to
> be a factor (e.g., CPU starvation), as does a series of poor default
> configurations. Could these be the same root cause behind the missing Solr
> index entries?
>
> Thanks
> Adam
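The worker-count setting in Ashutosh's list reads as (number of cores - 1) * 2. Below is a minimal sketch, assuming that reading and using Python's `os.cpu_count()` as the core count, that renders the three properties for pasting into atlas-application.properties. The function name `patch_properties` and the floor of one worker on a single-core host are illustrative assumptions, not part of Atlas.

```python
import os


def patch_properties(cores: int, batch_size: int = 1000) -> str:
    """Render the index-repair patch settings as properties lines.

    Assumes numWorkers = (cores - 1) * 2, per the suggested settings.
    The max(1, ...) floor is a hypothetical safeguard so a single-core
    host still gets a non-zero worker count.
    """
    workers = max(1, (cores - 1) * 2)
    lines = [
        f"atlas.patch.numWorkers={workers}",
        f"atlas.patch.batchSize={batch_size}",
        "atlas.rebuild.index=true",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # os.cpu_count() can return None; fall back to 1 core in that case.
    print(patch_properties(os.cpu_count() or 1))
```

On an 8-core host this yields atlas.patch.numWorkers=14.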
