Nick Hadder created SOLR-16160:
----------------------------------
Summary: UpdateXmlMessages duplicate data when data is removed and
then added in the same message
Key: SOLR-16160
URL: https://issues.apache.org/jira/browse/SOLR-16160
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: search, update
Affects Versions: 8.11.1
Reporter: Nick Hadder
Attachments: image-2022-04-20-10-34-08-573.png,
image-2022-04-20-10-35-05-247.png
*Reproduction Steps*
1. Define two multi-valued fields with the following schema:
{code:xml}
<field name="docTags" type="plongs" multiValued="true" indexed="true" stored="true"/>
<field name="tg0001" type="ipro_strings" multiValued="true" indexed="true" stored="true"/>
<fieldType name="plong" class="solr.LongPointField" docValues="true"/>
<fieldType name="ipro_strings" class="solr.TextField" sortMissingLast="true" multiValued="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
{code}
2. Execute the following UpdateXmlMessages:
{code:xml}
<add commitWithin="1000">
<doc>
<field name="_id">1</field>
<field name="docTags" update="remove"><![CDATA[1]]></field>
<field name="tg0001" update="remove"><![CDATA[Convert to Image]]></field>
<field name="docTags" update="remove"><![CDATA[4]]></field>
<field name="tg0001" update="remove"><![CDATA[Large Files]]></field>
<field name="docTags" update="remove"><![CDATA[6]]></field>
<field name="tg0001" update="remove"><![CDATA[To Bulk-Print]]></field>
</doc>
</add>
<add commitWithin="1000">
<doc>
<field name="_id">1</field>
<field name="docTags" update="remove"><![CDATA[6]]></field>
<field name="tg0001" update="remove"><![CDATA[To Bulk-Print]]></field>
<field name="docTags" update="add-distinct"><![CDATA[1]]></field>
<field name="tg0001" update="add-distinct"><![CDATA[Convert to Image]]></field>
<field name="docTags" update="add-distinct"><![CDATA[4]]></field>
<field name="tg0001" update="add-distinct"><![CDATA[Large Files]]></field>
</doc>
</add>
<add commitWithin="1000">
<doc>
<field name="_id">1</field>
<field name="docTags" update="remove"><![CDATA[1]]></field>
<field name="tg0001" update="remove"><![CDATA[Convert to Image]]></field>
<field name="docTags" update="remove"><![CDATA[4]]></field>
<field name="tg0001" update="remove"><![CDATA[Large Files]]></field>
<field name="docTags" update="add-distinct"><![CDATA[6]]></field>
<field name="tg0001" update="add-distinct"><![CDATA[To Bulk-Print]]></field>
</doc>
</add>
{code}
3. Observe that both fields on that document now contain duplicate values:
!image-2022-04-20-10-35-05-247.png!
*Note:* If you put the update="add-distinct" tags first in the XML message and
the update="remove" tags at the bottom, it works as expected and adds only one
instance of the data from the above update="add-distinct" message. The issue
occurs only if the remove tags come before the add-distinct tags.
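For comparison, here is a sketch of the second message from step 2 with the
add-distinct tags moved ahead of the remove tags, which is the ordering
described above as working. It is only a reordering of the fields already
shown in step 2, given for illustration; it is not a separately captured
request.
{code:xml}
<add commitWithin="1000">
<doc>
<field name="_id">1</field>
<!-- add-distinct tags first -->
<field name="docTags" update="add-distinct"><![CDATA[1]]></field>
<field name="tg0001" update="add-distinct"><![CDATA[Convert to Image]]></field>
<field name="docTags" update="add-distinct"><![CDATA[4]]></field>
<field name="tg0001" update="add-distinct"><![CDATA[Large Files]]></field>
<!-- remove tags at the bottom -->
<field name="docTags" update="remove"><![CDATA[6]]></field>
<field name="tg0001" update="remove"><![CDATA[To Bulk-Print]]></field>
</doc>
</add>
{code}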
Is this because the updates need to be in some undocumented order, or is it a
true defect?