[ https://issues.apache.org/jira/browse/SOLR-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552950#comment-13552950 ]
Yonik Seeley commented on SOLR-4016:
------------------------------------

bq. I see why you suggested that. The signature is like a unique key and modifying it seems like a rare use-case. But, if we do go that way, we should throw an exception and explicitly disallow partial update of signature generating fields.

There are different use-cases here. If the signature being generated was the unique key, then atomic updates should be able to proceed fine as long as the id field is specified (as should always be the case with atomic updates).

> Deduplication is broken by partial update
> -----------------------------------------
>
>                 Key: SOLR-4016
>                 URL: https://issues.apache.org/jira/browse/SOLR-4016
>             Project: Solr
>          Issue Type: Bug
>          Components: update
>    Affects Versions: 4.0
>         Environment: Tomcat6 / Catalina on Ubuntu 12.04 LTS
>            Reporter: Joel Nothman
>            Assignee: Shalin Shekhar Mangar
>              Labels: 4.0.1_Candidate
>             Fix For: 4.1, 5.0
>
>         Attachments: SOLR-4016-disallow-partial-update.patch, SOLR-4016-disallow-partial-update.patch, SOLR-4016.patch
>
>
> The SignatureUpdateProcessorFactory used (primarily?) for deduplication does not consider partial update semantics.
> The below uses the following solrconfig.xml excerpt:
> {noformat}
> <updateRequestProcessorChain name="text_hash">
>   <processor class="solr.processor.SignatureUpdateProcessorFactory">
>     <bool name="enabled">true</bool>
>     <str name="signatureField">text_hash</str>
>     <bool name="overwriteDupes">false</bool>
>     <str name="fields">text</str>
>     <str name="signatureClass">solr.processor.TextProfileSignature</str>
>   </processor>
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
> {noformat}
> Firstly, the processor treats {noformat}{"set": "value"}{noformat} as a string and hashes it, instead of the value alone:
> {noformat}
> $ curl '$URL/update?commit=true' -H 'Content-type:application/json' -d '{"add":{"doc":{"id": "abcde", "text": {"set": "hello world"}}}}' && curl '$URL/select?q=id:abcde'
> {"responseHeader":{"status":0,"QTime":30}}
> <?xml version="1.0" encoding="UTF-8"?><response><lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int><lst name="params"><str name="q">id:abcde</str></lst></lst><result name="response" numFound="1" start="0"><doc><str name="id">abcde</str><str name="text">hello world</str><str name="text_hash">ad48c7ad60ac22cc</str><long name="_version_">1417247434224959488</long></doc></result>
> </response>
> $
> $ curl '$URL/update?commit=true' -H 'Content-type:application/json' -d '{"add":{"doc":{"id": "abcde", "text": "hello world"}}}' && curl '$URL/select?q=id:abcde'
> {"responseHeader":{"status":0,"QTime":27}}
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int><lst name="params"><str name="q">id:abcde</str></lst></lst><result name="response" numFound="1" start="0"><doc><str name="id">abcde</str><str name="text">hello world</str><str name="text_hash">b169c743d220da8d</str><long name="_version_">1417248022215000064</long></doc></result>
> </response>
> {noformat}
> Note the different text_hash value.
> Secondly, when updating a field other than those used to create the signature (which I imagine is a more common use-case), the signature is recalculated from no values:
> {noformat}
> $ curl '$URL/update?commit=true' -H 'Content-type:application/json' -d '{"add":{"doc":{"id": "abcde", "title": {"set": "new title"}}}}' && curl '$URL/select?q=id:abcde'
> {"responseHeader":{"status":0,"QTime":39}}
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int><lst name="params"><str name="q">id:abcde</str></lst></lst><result name="response" numFound="1" start="0"><doc><str name="id">abcde</str><str name="text">hello world</str><str name="text_hash">0000000000000000</str><str name="title">new title</str><long name="_version_">1417248120480202752</long></doc></result>
> </response>
> {noformat}
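
For readers who want to see the two failure modes from the quoted report in isolation, here is a minimal Python sketch. It is not Solr code and is not taken from the attached patches. The function names (`signature`, `unwrap_set`, `signature_partial_aware`) are hypothetical, a truncated MD5 stands in for TextProfileSignature, and the all-zero result when no signature field is present is only an assumption modelled on the `0000000000000000` value shown in the report.

```python
import hashlib


def signature(doc, fields):
    """Toy stand-in for a signature processor: hash the listed fields as given.

    Assumption: a truncated MD5 replaces TextProfileSignature, and an
    all-zero string models the hash reported when no field is present.
    """
    digest = hashlib.md5()
    found = False
    for name in fields:
        if name in doc:
            # Naive behaviour: whatever the value is, hash its string form.
            digest.update(str(doc[name]).encode("utf-8"))
            found = True
    return digest.hexdigest()[:16] if found else "0" * 16


def unwrap_set(value):
    """Return the payload of an atomic-update {"set": ...} map, else the value."""
    if isinstance(value, dict) and set(value) == {"set"}:
        return value["set"]
    return value


def signature_partial_aware(doc, fields, stored=None):
    """One conceivable alternative: unwrap "set" and fall back to stored values."""
    stored = stored or {}
    merged = {}
    for name in fields:
        if name in doc:
            merged[name] = unwrap_set(doc[name])
        elif name in stored:
            merged[name] = stored[name]
    return signature(merged, fields)


if __name__ == "__main__":
    full = {"id": "abcde", "text": "hello world"}
    wrapped = {"id": "abcde", "text": {"set": "hello world"}}
    other = {"id": "abcde", "title": {"set": "new title"}}

    # First symptom: the wrapped value hashes differently from the plain one.
    print(signature(full, ["text"]), signature(wrapped, ["text"]))
    # Second symptom: updating an unrelated field yields the "empty" signature.
    print(signature(other, ["text"]))
    # Partial-update-aware variant, given the previously stored document.
    print(signature_partial_aware(other, ["text"], stored=full))
```

Whether a processor should recompute the signature like this or simply reject partial updates of signature-generating fields is the design question discussed in the comment above; the sketch only makes the two symptoms easy to reproduce outside Solr.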