[jira] [Commented] (SOLR-12816) Don't allow RunUpdateProcessorFactory to be set before DistributedUpdateProcessorFactory

Alexandre Rafalovitch (JIRA) Mon, 08 Oct 2018 16:30:24 -0700


    [ https://issues.apache.org/jira/browse/SOLR-12816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642597#comment-16642597 ]


Alexandre Rafalovitch commented on SOLR-12816:
----------------------------------------------

I did find the passage in the documentation about [URPs that may want to run 
after DistributedUpdateProcessor|https://lucene.apache.org/solr/guide/7_5/update-request-processors.html#atomic-update-processor-factory].
Basically, if they need to operate on the full document even when only an 
atomic update was sent, they need to be in the chain after the 
DistributedUpdateProcessor (which reconstructs the document).

I think this must be what I was trying to remember.
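
As a rough illustration of that ordering (a sketch only, not copied from the 
linked guide): in the chain below, the chain name and 
{{com.example.FullDocumentProcessorFactory}} are hypothetical. The assumption is 
that anything placed between DistributedUpdateProcessorFactory and 
RunUpdateProcessorFactory sees the document after the atomic update has been 
merged, and that RunUpdateProcessorFactory stays last.
{code:xml}
<updateRequestProcessorChain name="full-doc-after-dup">
  <processor class="solr.LogUpdateProcessorFactory"/>
  <!-- Forwards the update to the other nodes and reconstructs the full
       document from an atomic update -->
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <!-- Hypothetical custom processor that needs the full document -->
  <processor class="com.example.FullDocumentProcessorFactory"/>
  <!-- Writes to the local index; kept as the last processor -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>{code}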

> Don't allow RunUpdateProcessorFactory to be set before 
> DistributedUpdateProcessorFactory
> ----------------------------------------------------------------------------------------
>
>                 Key: SOLR-12816
>                 URL: https://issues.apache.org/jira/browse/SOLR-12816
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Varun Thacker
>            Priority: Major
>
> Here's the problem that came up on a customer call this morning: "My 
> documents are not getting replicated to the replicas and the doc counts don't 
> match up."
> It was a 3-node cluster. The collection was 1 shard x 3 replicas.
> This is a scary situation to be in. We started down the path of debugging 
> replica types, auto-commits, checking whether the {{_version_}} and 
> {{id}} fields were defined correctly, etc.
>  
> The problem was that the user had defined a custom update processor chain with 
> RunUpdateProcessorFactory placed before DistributedUpdateProcessorFactory:
> {code:xml}
> <updateRequestProcessorChain ...
>   ....
>   <processor class="solr.LogUpdateProcessorFactory"/>
>   <processor class="solr.RunUpdateProcessorFactory"/>
>   <processor class="solr.DistributedUpdateProcessorFactory"/>
> </updateRequestProcessorChain>{code}
>  
> With this update chain, whichever node receives the document is the only one 
> that indexes it. The update is never forwarded to the other nodes. 
> So you can index against a node hosting a replica and the leader will never 
> get this document.
> Is there any use-case where having RunUpdateProcessor before 
> DistributedUpdateProcessor is needed?
>  
> Perhaps we could borrow the idea from Time Routed Aliases (TRA), or make these 
> two update processors default and remove them from the default configs?
> {code:java}
> When processing an update for a TRA, Solr initializes its 
> UpdateRequestProcessor chain as usual, but when DistributedUpdateProcessor 
> (DUP) initializes, it detects that the update targets a TRA and injects 
> TimeRoutedUpdateProcessor (TRUP) in front of itself.{code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
