[ 
https://issues.apache.org/jira/browse/SOLR-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552967#comment-13552967
 ] 

Shalin Shekhar Mangar commented on SOLR-4016:
---------------------------------------------

bq. If the signature being generated was the unique key, then atomic updates 
should be able to proceed fine as long as the id field is specified (as should 
always be the case with atomic updates).

The patch that I committed throws an exception if an atomic update request 
contains fields that are used to compute the signature. An atomic update 
request that does not modify the signature proceeds as normal. This way we 
make sure that a document never contains a wrong signature.
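
For illustration only, the check described above can be sketched roughly as 
follows. This is not the committed patch: the names check_update and 
AtomicUpdateError are hypothetical, and the sketch assumes a request counts as 
an atomic update when any field value is a map such as {"set": ...}.

```python
class AtomicUpdateError(ValueError):
    """Hypothetical error type for a rejected atomic update."""


def check_update(doc, signature_fields):
    """Reject an atomic update that contains any signature source field.

    doc is the incoming document as a dict; signature_fields lists the
    fields configured in the "fields" parameter of the processor.
    """
    # Assumption: an atomic update carries at least one map-valued field.
    is_atomic = any(isinstance(value, dict) for value in doc.values())
    if not is_atomic:
        return "full"  # ordinary add: the signature is computed as usual

    clash = sorted(set(doc) & set(signature_fields))
    if clash:
        raise AtomicUpdateError(
            "atomic update contains signature field(s): " + ", ".join(clash)
        )
    return "atomic"  # signature fields are absent: proceed as normal
```

With fields set to text, an update such as {"id": "abcde", "title": {"set": 
"new title"}} would pass, while {"id": "abcde", "text": {"set": "hello 
world"}} would be rejected.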

Do you agree that this is an acceptable compromise until a proper fix is in 
place?
                
> Deduplication is broken by partial update
> -----------------------------------------
>
>                 Key: SOLR-4016
>                 URL: https://issues.apache.org/jira/browse/SOLR-4016
>             Project: Solr
>          Issue Type: Bug
>          Components: update
>    Affects Versions: 4.0
>         Environment: Tomcat6 / Catalina on Ubuntu 12.04 LTS
>            Reporter: Joel Nothman
>            Assignee: Shalin Shekhar Mangar
>              Labels: 4.0.1_Candidate
>             Fix For: 4.1, 5.0
>
>         Attachments: SOLR-4016-disallow-partial-update.patch, 
> SOLR-4016-disallow-partial-update.patch, SOLR-4016.patch
>
>
> The SignatureUpdateProcessorFactory used (primarily?) for deduplication does 
> not consider partial update semantics.
> The examples below use the following solrconfig.xml excerpt:
> {noformat}
>      <updateRequestProcessorChain name="text_hash">
>        <processor class="solr.processor.SignatureUpdateProcessorFactory">
>          <bool name="enabled">true</bool>
>          <str name="signatureField">text_hash</str>
>          <bool name="overwriteDupes">false</bool>
>          <str name="fields">text</str>
>          <str name="signatureClass">solr.processor.TextProfileSignature</str>
>        </processor>
>        <processor class="solr.LogUpdateProcessorFactory" />
>        <processor class="solr.RunUpdateProcessorFactory" />
>      </updateRequestProcessorChain>
> {noformat}
> Firstly, the processor treats {noformat}{"set": "value"}{noformat} as a 
> string and hashes it, instead of the value alone:
> {noformat}
> $ curl '$URL/update?commit=true' -H 'Content-type:application/json' -d 
> '{"add":{"doc":{"id": "abcde", "text": {"set": "hello world"}}}}' && curl 
> '$URL/select?q=id:abcde'
> {"responseHeader":{"status":0,"QTime":30}}
> <?xml version="1.0" encoding="UTF-8"?><response><lst 
> name="responseHeader"><int name="status">0</int><int name="QTime">1</int><lst 
> name="params"><str name="q">id:abcde</str></lst></lst><result name="response" 
> numFound="1" start="0"><doc><str name="id">abcde</str><str name="text">hello 
> world</str><str name="text_hash">ad48c7ad60ac22cc</str><long 
> name="_version_">1417247434224959488</long></doc></result>
> </response>
> $
> $ curl '$URL/update?commit=true' -H 'Content-type:application/json' -d 
> '{"add":{"doc":{"id": "abcde", "text": "hello world"}}}' && curl 
> '$URL/select?q=id:abcde'
> {"responseHeader":{"status":0,"QTime":27}}
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">0</int><int 
> name="QTime">1</int><lst name="params"><str 
> name="q">id:abcde</str></lst></lst><result name="response" numFound="1" 
> start="0"><doc><str name="id">abcde</str><str name="text">hello 
> world</str><str name="text_hash">b169c743d220da8d</str><long 
> name="_version_">1417248022215000064</long></doc></result>
> </response>
> {noformat}
> Note the different text_hash value.
> Secondly, when updating a field other than those used to create the signature 
> (which I imagine is a more common use-case), the signature is recalculated 
> from no values:
> {noformat}
> $ curl '$URL/update?commit=true' -H 'Content-type:application/json' -d 
> '{"add":{"doc":{"id": "abcde", "title": {"set": "new title"}}}}' && curl 
> '$URL/select?q=id:abcde'
> {"responseHeader":{"status":0,"QTime":39}}
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">0</int><int 
> name="QTime">1</int><lst name="params"><str 
> name="q">id:abcde</str></lst></lst><result name="response" numFound="1" 
> start="0"><doc><str name="id">abcde</str><str name="text">hello 
> world</str><str name="text_hash">0000000000000000</str><str name="title">new 
> title</str><long name="_version_">1417248120480202752</long></doc></result>
> </response>
> {noformat}
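
The first failure mode in the report (hashing the update operation instead of 
the value) can be reproduced in miniature. The sketch below is only an 
analogy: toy_signature uses MD5 as a stand-in for the configured signature 
class, so it will not reproduce the text_hash values shown above, and unwrap 
is a hypothetical helper rather than anything in Solr.

```python
import hashlib


def toy_signature(values):
    """Stand-in signature: MD5 over the string form of each value."""
    digest = hashlib.md5()
    for value in values:
        digest.update(str(value).encode("utf-8"))
    return digest.hexdigest()[:16]


def unwrap(value):
    """Return the payload of a {"set": x} operation, else the value itself."""
    if isinstance(value, dict) and "set" in value:
        return value["set"]
    return value


full = toy_signature(["hello world"])                      # plain add
naive = toy_signature([{"set": "hello world"}])            # hashes the map
fixed = toy_signature([unwrap({"set": "hello world"})])    # hashes the value

print(full == naive)  # False: the operation map changes the signature
print(full == fixed)  # True: unwrapping restores the expected signature
```

The second failure mode (a signature computed from no values) is not modelled 
here, since it depends on how the processor handles fields that are missing 
from the request.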

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
