[ https://issues.apache.org/jira/browse/SOLR-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280264#comment-13280264 ]
Markus Jelsma commented on SOLR-3473:
-------------------------------------
That makes sense indeed.
To work around the problem of having the digest field as the ID, could the
processor not simply issue a deleteByQuery on the digest before adding the
document? Would that cause significant overhead for very large systems with
many updates?
From Nutch's point of view, we would certainly want to avoid changing the ID
from the URL to the digest.
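A minimal sketch of the delete-then-add workaround proposed above, assuming a toy in-memory index rather than SolrJ or a real Solr core: documents stay keyed by their URL, and each add first removes any existing document that carries the same digest (the role deleteByQuery on the digest field would play). The class and method names (DigestDedupSketch, deleteByDigest, addWithDedup) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for an index: documents keyed by their
// unique ID (the page URL, as Nutch uses), each carrying a content digest.
// This is NOT SolrJ; it only models the delete-by-digest-then-add idea.
public class DigestDedupSketch {

    // Maps document ID (URL) -> digest (signature) value.
    private final Map<String, String> index = new LinkedHashMap<>();

    // Models deleteByQuery on the digest field: removes every document whose
    // digest matches, regardless of its ID. Returns how many were removed.
    int deleteByDigest(String digest) {
        List<String> victims = new ArrayList<>();
        for (Map.Entry<String, String> e : index.entrySet()) {
            if (e.getValue().equals(digest)) {
                victims.add(e.getKey());
            }
        }
        // Remove after iterating to avoid a ConcurrentModificationException.
        for (String id : victims) {
            index.remove(id);
        }
        return victims.size();
    }

    // Proposed workaround: delete older copies with the same digest, then add
    // the new document under its URL ID (an add overwrites a same-ID document).
    void addWithDedup(String url, String digest) {
        deleteByDigest(digest);
        index.put(url, digest);
    }

    Map<String, String> snapshot() {
        return new LinkedHashMap<>(index);
    }

    public static void main(String[] args) {
        DigestDedupSketch idx = new DigestDedupSketch();
        idx.addWithDedup("http://example.com/a", "d1");
        idx.addWithDedup("http://example.com/b", "d2");
        // Same content as /a under another URL: /a is dropped, IDs stay URLs.
        idx.addWithDedup("http://example.com/a-copy", "d1");
        System.out.println(idx.snapshot());
        // prints {http://example.com/b=d2, http://example.com/a-copy=d1}
    }
}
```

The cost the question asks about appears here as one extra delete per add. In a sharded index that delete cannot be routed by the URL ID, because a duplicate may live on any shard, so it presumably has to reach every shard; the sketch also does not make the delete and the add atomic.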
> Distributed deduplication broken
> --------------------------------
>
> Key: SOLR-3473
> URL: https://issues.apache.org/jira/browse/SOLR-3473
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud, update
> Affects Versions: 4.0
> Reporter: Markus Jelsma
> Fix For: 4.0
>
>
> Solr's deduplication via the SignatureUpdateProcessor is broken for
> distributed updates on SolrCloud.
> Mark Miller:
> {quote}
> Looking again at the SignatureUpdateProcessor code, I think that indeed this
> won't currently work with distrib updates. Could you file a JIRA issue for
> that? The problem is that we convert update commands into Solr documents -
> and that can cause a loss of info if an update proc modifies the update
> command.
> I think the reason that you see a multiple values error when you try the
> other order is because of the lack of a document clone (the other issue I
> mentioned a few emails back). Addressing that won't solve your issue though -
> we have to come up with a way to propagate the currently lost info on the
> update command.
> {quote}
> Please see the ML thread for the full discussion:
> http://lucene.472066.n3.nabble.com/SolrCloud-deduplication-td3984657.html
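The information loss Mark Miller describes can be illustrated with a heavily simplified, hypothetical model in plain Java (not Solr's classes): a signature step writes the digest into the document and also records a dedup key on the command itself, but forwarding ships only the document, so the command rebuilt on the receiving node no longer carries that key. The command-level field is called updateTerm here as an assumption, loosely modeled on Solr 4.x's AddUpdateCommand; the digest is a toy hash, not one of Solr's Signature implementations.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of update-command forwarding.
// These are NOT Solr classes; names are illustrative assumptions.
public class ForwardingLossSketch {

    static final class UpdateCmd {
        final Map<String, Object> doc; // the document's fields
        String updateTerm;             // command-level state, not a document field

        UpdateCmd(Map<String, Object> doc) {
            this.doc = doc;
        }
    }

    // Stand-in for a signature processor: computes a toy digest, stores it as
    // a document field AND records it on the command as the dedup key.
    static void signatureStep(UpdateCmd cmd) {
        String content = String.valueOf(cmd.doc.get("content"));
        String sig = Integer.toHexString(content.hashCode());
        cmd.doc.put("signature", sig);
        cmd.updateTerm = "signature:" + sig;
    }

    // Stand-in for forwarding to another node: only a copy of the document
    // travels, and the receiver builds a fresh command around it.
    static UpdateCmd forward(UpdateCmd local) {
        return new UpdateCmd(new HashMap<>(local.doc));
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "http://example.com/a");
        doc.put("content", "hello");

        UpdateCmd local = new UpdateCmd(doc);
        signatureStep(local);
        UpdateCmd remote = forward(local);

        System.out.println("local updateTerm:  " + local.updateTerm);
        // prints "remote updateTerm: null": the command-level key was lost
        System.out.println("remote updateTerm: " + remote.updateTerm);
        // prints "true": the document field itself survived forwarding
        System.out.println(remote.doc.containsKey("signature"));
    }
}
```

In this model the fix direction the quote points at would be to carry such command-level state along with the document, rather than rebuilding the command from the document alone.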
--
This message is automatically generated by JIRA.