Hi folks,

After bulk loading large numbers of entities (Atlas 2.1.0 with Solr), we often find that some of them are missing from the Solr index. We can still find these entities with Advanced Search against HBase directly, but they are not in the Solr index.
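To put a number on the gap after a load, one option is to compare the GUIDs read back from HBase with the GUIDs returned from Solr. Below is a minimal sketch in Python. It assumes both GUID lists have already been exported, and it does not cover how they are fetched. `find_unindexed` is a hypothetical helper, not part of Atlas:

```python
def find_unindexed(store_guids, index_guids):
    """Return (missing_guids, miss_rate) for GUIDs in the data store but not in the index.

    Hypothetical helper: both arguments are plain iterables of GUID strings
    that were exported beforehand (e.g. from HBase and from Solr).
    """
    store = set(store_guids)          # de-duplicate GUIDs seen in the data store
    indexed = set(index_guids)        # GUIDs the search index knows about
    missing = sorted(store - indexed) # present in the store, absent from the index
    rate = len(missing) / len(store) if store else 0.0
    return missing, rate


missing, rate = find_unindexed(["a", "b", "c", "d"], ["a", "c"])
print(missing, rate)  # ['b', 'd'] 0.5
```

GUIDs that appear only in the index are ignored here, because the question concerns entities that were stored but never indexed.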
From https://atlas.apache.org/2.0.0/AtlasRepairIndex.html:

"In rare, cases it is possible that during entity creation, the entity is stored in the data store, but the *corresponding indexes are not created in Solr*. Since Atlas relies heavily on Solr in the operation of its Basic Search, this will result in entity not being returned by a search."

Why does this occur? Is there a race condition somewhere? Is there an open issue for this? I cannot find one in the Atlas JIRA.

And how "rare" is this? We see it frequently in our local dev environment: hundreds of records go missing when we bulk upload 100k+ entities. We have also found that resource constraints (e.g. CPU starvation) and a number of poor default configurations appear to be factors. Could these be the root cause of the missing Solr index entries?

Thanks,
Adam
