On 7/17/2020 1:32 AM, yo tomi wrote:
When I do an atomic update on SolrCloud with the following setting, it does
not work properly.

As Jörn Franke already mentioned, you haven't said exactly what "does not work properly" actually means in your situation. Without that information, it will be very difficult to provide any real help.

Atomic update functionality is currently implemented in DistributedUpdateProcessorFactory.

---
<updateRequestProcessorChain name="skip-empty">
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="TrimFieldUpdateProcessorFactory" />
  <processor class="RemoveBlankFieldUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
---
When I changed it as follows, it worked as expected.
---
<updateRequestProcessorChain name="skip-empty">
  <processor class="TrimFieldUpdateProcessorFactory" />
  <processor class="RemoveBlankFieldUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
---

The effective result difference between these configurations is that atomic updates will happen first with the first config, and in the second, atomic updates will happen second to last -- just before RunUpdateProcessorFactory.
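
To make that concrete, here is a rough sketch of what the second config effectively turns into once Solr adds the distributed processor on its own. It is your second config with that one line written out, not something copied from a running system, and any other processors that Solr may add implicitly are left out:

---
<updateRequestProcessorChain name="skip-empty">
  <processor class="TrimFieldUpdateProcessorFactory" />
  <processor class="RemoveBlankFieldUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
  <!-- Not in your config: added implicitly, just before RunUpdateProcessorFactory -->
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
---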

Also, with the first config, most of the update processors are going to be executed on the machine with the shard leader (after the update is distributed) and if there is more than one NRT replica, they will be executed multiple times. With the second config, most of the processors will be executed on the machine that actually receives the update request. For the purposes of that discussion, remember that when a TLOG replica is elected leader, it is effectively an NRT replica. PULL replicas are never elected leader.

Does that information help you determine why it doesn't do what you expect?

I thought the latter setting and using a post-processor would give the
same result,
but the bug in SOLR-8030 makes me not feel like using a post-processor.
Even with the latter setting, is there any possibility of SOLR-8030
occurring?

See this part of the reference guide for a bunch of gory details about DistributedUpdateProcessorFactory:

https://cwiki.apache.org/confluence/display/SOLR/UpdateRequestProcessor#UpdateRequestProcessor-DistributedUpdates

In SOLR-8030, the general consensus among committers is that you should configure almost all update processors as "pre" processors -- placed before DistributedUpdateProcessorFactory in the config. When done this way, updates are usually faster and less likely to yield inconsistent results.

There may be situations where having them as "post" processors is correct, but that won't happen very often. The second config above does implicitly use "pre" for most of the processors.

Thanks,
Shawn
