[ https://issues.apache.org/jira/browse/SOLR-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289918#comment-14289918 ]
Robert de Lorimier commented on SOLR-6973:
------------------------------------------
Hi, sorry, this is my first issue, so I did not know the protocol for
raising one. To your point, this is the update processor configuration
for our core:
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">id</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">CreateDate,DataCenter,Origin,Environment,Host,Level,Path,Message,ExceptionType</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
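A minimal Python sketch of the idea behind this chain, for illustration only: the id is derived from the listed fields, so a field outside the list (such as Processed) should not affect it. MD5 is swapped in for Lookup3Signature here, and the name sketch_signature is hypothetical, so the values will not match the ids Solr actually generates.

```python
import hashlib

# Field list copied from the "fields" setting in the chain above.
SIGNATURE_FIELDS = [
    "CreateDate", "DataCenter", "Origin", "Environment", "Host",
    "Level", "Path", "Message", "ExceptionType",
]


def sketch_signature(doc, fields=SIGNATURE_FIELDS):
    """Hash the listed field values in order; other fields are ignored.

    Assumption: MD5 stands in for Lookup3Signature purely for
    illustration, so the result is not a real Solr signature.
    """
    digest = hashlib.md5()
    for name in fields:
        value = doc.get(name, "")
        # Separate names and values so adjacent fields cannot run together.
        digest.update(name.encode("utf-8"))
        digest.update(b"\x00")
        digest.update(str(value).encode("utf-8"))
        digest.update(b"\x00")
    return digest.hexdigest()[:16]


doc = {"Host": "web.1_m2qa11ziio_dev", "Level": "I", "Processed": "false"}
updated = dict(doc, Processed="true")
# Processed is not in the field list, so both hash to the same value.
print(sketch_signature(doc) == sketch_signature(updated))  # prints True
```

Under this sketch, any difference in one of the nine listed fields gives a different value, while a change to Processed alone does not.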
As for routing, I am using the built-in routing in Solr. We have three nodes,
and the core has a replication factor of one. I am not sure what you mean by
the document being routed to a different shard. In this particular case, the
schema is as follows:
<field name="_version_" type="long" indexed="true" required="true"/>
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="CreateDate" type="date" indexed="true" stored="true" required="true"/>
<field name="DataCenter" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="Origin" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="Environment" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="Host" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="Level" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="Path" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="Message" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="ExceptionType" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="Processed" type="boolean" indexed="true" stored="true" multiValued="false" required="true"/>
Looking at the schema, all but three fields (id, _version_, and Processed)
are included in the processor signature. If the signature fields hold the
same values, I would have thought that the document would always be routed
the same way. Please let me know if that is not the case. To check whether
the Processed field in the document has been updated, I am simply looking at
the document via the Solr URL:
http://server001:8983/solr/log2_search/select?q=id:6a692418a84a849a
Showing:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params"><str name="q">id:6a692418a84a849a</str></lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="Host">web.1_m2qa11ziio_dev</str>
      <str name="DataCenter">heroku</str>
      <str name="Origin">m2qa11ziio</str>
      <str name="Environment">dev</str>
      <date name="CreateDate">2015-01-10T00:07:48.811Z</date>
      <str name="Level">I</str>
      <str name="ExceptionType"/>
      <str name="Message">2015-01-09T00:07:48.811737+00:00 host heroku web.1 - Process running mem=525M(102.6%)</str>
      <str name="Path">http://rasmdev.ziio.net/app/m2qa11ziio/env/Dev/dc/Heroku</str>
      <bool name="Processed">false</bool>
      <str name="id">6a692418a84a849a</str>
      <long name="_version_">1490119171532390400</long>
    </doc>
  </result>
</response>
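The same check can be scripted instead of read by eye. The following is a rough Python sketch, assuming the XML response format shown above; the helper name read_doc_fields and the trimmed SAMPLE string are made up for this example and are not part of Solr.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response above, kept inline so the sketch runs offline.
SAMPLE = """<response>
  <lst name="responseHeader"><int name="status">0</int></lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <bool name="Processed">false</bool>
      <str name="id">6a692418a84a849a</str>
      <long name="_version_">1490119171532390400</long>
    </doc>
  </result>
</response>"""


def read_doc_fields(response_xml):
    """Return {field name: text} for the first <doc> in a Solr XML response."""
    root = ET.fromstring(response_xml)
    doc = root.find("./result/doc")
    if doc is None:
        # numFound="0": nothing matched the query.
        return {}
    return {child.get("name"): (child.text or "") for child in doc}


fields = read_doc_fields(SAMPLE)
print(fields["id"], fields["Processed"], fields["_version_"])
```

Comparing Processed and _version_ before and after an update attempt would show whether the stored document changed at all.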
When I do an update via curl:
curl http://server001:8983/solr/log2_search/update/json \
  -H 'Content-type:application/json' -d '[
{
"Host":"web.1_m2qa11ziio_dev",
"DataCenter":"heroku",
"Origin":"m2qa11ziio",
"Environment":"dev",
"CreateDate":"2015-01-10T00:07:48.811Z",
"Level":"I",
"ExceptionType":"",
"Message":"2015-01-09T00:07:48.811737+00:00 host heroku web.1 - Process running mem=525M(102.6%)",
"Path":"http://rillodev.ziio.net/app/m2qa11ziio/env/Dev/dc/Heroku",
"Processed":"true"
}
]'
Nothing changes. Hopefully that is enough detail to go on.
As I noted earlier, only about 4.7% of the documents have the issue, and
whether a given document can be updated does not change over time: if a
document cannot be updated, it continues not to update.
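For anyone who would rather reproduce this from a script than from curl, here is a rough Python sketch that builds the same kind of request. It assumes the host, core, and handler from the curl command above; the function name build_update_request is hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumption: same host, core, and handler as in the curl command above.
UPDATE_URL = "http://server001:8983/solr/log2_search/update/json"


def build_update_request(doc, url=UPDATE_URL):
    """Build (but do not send) a POST carrying a one-document JSON array."""
    # json.dumps escapes the payload properly, so a long Message value
    # cannot end up with a raw line break inside the JSON string.
    body = json.dumps([doc]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


doc = {
    "Host": "web.1_m2qa11ziio_dev",
    "Level": "I",
    "Processed": "true",
}
req = build_update_request(doc)
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it; whether the change then becomes visible to a search depends on the commit settings, which are not shown in this message.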
> Some documents will not update on a cloud server using
> SignatureUpdateProcessorFactory
> --------------------------------------------------------------------------------------
>
> Key: SOLR-6973
> URL: https://issues.apache.org/jira/browse/SOLR-6973
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 4.7
> Environment: Red Hat 6 servers, using three SolrCloud nodes
> Reporter: Robert de Lorimier
>
> We are using SolrCloud to hold recent log data for our internal auditing and
> research. When first indexing the data, we flag the record with
> Processed=false, and use this to search Solr for new records to put into our
> archive repository. Once the record is committed to the archive repository,
> we update the record by setting the flag to true. As part of eliminating
> duplicate log records we use the SignatureUpdateProcessorFactory with
> overwriteDupes set to true to deduplicate any logs that have been sent more
> than once. This works great for 95% of the data. We are able to add the
> records to Solr, look up any records that have not been added to the archive,
> add them, and then set the flag to true. However, for 5% of the records we
> are not able to update the flag in the cloud configuration. When sending the
> records that do not update using curl as a test, I do not see any error
> associated with the non-update.
> I also set up the same cores locally without a cloud configuration and the
> same record data does update without issue, so this seems to be a bug related
> to cloud.