Chris M. Hostetter created SOLR-15293:
-----------------------------------------
Summary: Deprecate/remove overwriteDupes option from
SignatureUpdateProcessorFactory
Key: SOLR-15293
URL: https://issues.apache.org/jira/browse/SOLR-15293
Project: Solr
Issue Type: Sub-task
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Chris M. Hostetter
The design principle of the {{overwriteDupes}} option of
SignatureUpdateProcessorFactory is something that is only viable in single
shard use cases, and even then it currently doesn't work because UpdateCommand
"options" are not included when Shard Leaders write updates to the tlog, or
forward them to other replicas (SOLR-8030). With multiple shards it can never
be viable w/o broadcasting a "Delete By Query" to every replica on every
document add/update (SOLR-3473) which is vastly less efficient than the current
low level {{updateDocument(Term,...)}} support provided by IndexWriter for
replacing documents by uniqueKey.
I think in general we should remove the {{overwriteDupes}} option completely.
If SignatureUpdateProcessorFactory is used to generate a synthetic uniqueKey
field then the existing Solr/Lucene behavior of routing the document to the
correct shard, and replacing any prior instances of that doc will work fine.
The functionality of SignatureUpdateProcessorFactory should be constrained
*solely* to generating a signature – if that signature is put in the unique key
field, then de-duplication will happen automatically.
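
As a rough illustration of why no extra option is needed, here is a minimal, self-contained Java sketch of the idea. It does not use Solr or Lucene APIs: {{signature}}, {{ToyCollection}}, the MD5 choice and the hash-modulo routing are all hypothetical stand-ins, assumed only for the sake of the example. It shows that once the signature is the document's key, routing and replace-by-key de-duplicate on their own.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SignatureDedupSketch {

    /** Hex MD5 over the chosen field values (hypothetical helper, not Solr's Signature API). */
    static String signature(Map<String, String> doc, List<String> fields) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (String field : fields) {
                // Separate names and values with a 0 byte so adjacent fields cannot run together.
                md5.update(field.getBytes(StandardCharsets.UTF_8));
                md5.update((byte) 0);
                md5.update(doc.getOrDefault(field, "").getBytes(StandardCharsets.UTF_8));
                md5.update((byte) 0);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Toy stand-in for a sharded collection: each shard maps uniqueKey to a document. */
    static final class ToyCollection {
        private final List<Map<String, Map<String, String>>> shards = new ArrayList<>();

        ToyCollection(int numShards) {
            for (int i = 0; i < numShards; i++) {
                shards.add(new HashMap<>());
            }
        }

        /** Assumed routing rule for this sketch: hash of the key modulo the shard count. */
        int shardFor(String id) {
            return Math.floorMod(id.hashCode(), shards.size());
        }

        /** Stores the signature as the key, routes on it, and replaces any prior doc with that key. */
        String add(Map<String, String> doc, List<String> signatureFields) {
            String id = signature(doc, signatureFields);
            Map<String, String> stored = new HashMap<>(doc);
            stored.put("id", id);
            shards.get(shardFor(id)).put(id, stored);
            return id;
        }

        /** Total number of documents across all shards. */
        int size() {
            int total = 0;
            for (Map<String, Map<String, String>> shard : shards) {
                total += shard.size();
            }
            return total;
        }
    }

    public static void main(String[] args) {
        List<String> fields = List.of("name", "features");
        ToyCollection collection = new ToyCollection(3);
        collection.add(Map.of("name", "widget", "features", "blue", "price", "10"), fields);
        collection.add(Map.of("name", "widget", "features", "blue", "price", "12"), fields);
        System.out.println(collection.size());
    }
}
```

Both adds in {{main}} produce the same signature, so they land on the same toy shard and the second replaces the first. No "Delete By Query" has to be broadcast to any other shard, because a duplicate can only ever live where its key routes.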