[ 
https://issues.apache.org/jira/browse/SOLR-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289918#comment-14289918
 ] 

Robert de Lorimier commented on SOLR-6973:
------------------------------------------

Hi, sorry, this is my first issue, so I did not know the protocol for raising 
one. To your point, this is the update processor configuration for our core:

        <updateRequestProcessorChain name="dedupe">
                <processor class="solr.processor.SignatureUpdateProcessorFactory">
                        <bool name="enabled">true</bool>
                        <str name="signatureField">id</str>
                        <bool name="overwriteDupes">true</bool>
                        <str name="fields">CreateDate,DataCenter,Origin,Environment,Host,Level,Path,Message,ExceptionType</str>
                        <str name="signatureClass">solr.processor.Lookup3Signature</str>
                </processor>
                <processor class="solr.LogUpdateProcessorFactory" />
                <processor class="solr.RunUpdateProcessorFactory" />
        </updateRequestProcessorChain>
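
To illustrate what this chain is expected to do, here is a minimal sketch of a signature computed only from the configured fields. It is an assumption-laden stand-in, not Solr's code: `stand_in_signature` is a hypothetical helper, and a truncated MD5 replaces Lookup3Signature, so the values will not match real Solr ids. The point is only that fields outside the list, such as Processed, should not affect the resulting id.

```python
import hashlib

# Fields listed in the "fields" parameter of the dedupe chain above.
SIGNATURE_FIELDS = [
    "CreateDate", "DataCenter", "Origin", "Environment", "Host",
    "Level", "Path", "Message", "ExceptionType",
]


def stand_in_signature(doc):
    """Hash only the signature fields; every other field is ignored.

    Assumption: MD5 truncated to 16 hex characters stands in for
    Lookup3Signature, so the output does NOT match real Solr ids.
    """
    h = hashlib.md5()
    for name in sorted(SIGNATURE_FIELDS):
        if name in doc:
            h.update(name.encode("utf-8"))
            h.update(str(doc[name]).encode("utf-8"))
    return h.hexdigest()[:16]


# Hypothetical example document; only the field names come from the config.
base = {"Host": "host-a", "Level": "I", "Path": "/some/path", "Message": "hello"}
unprocessed = dict(base, Processed=False)
processed = dict(base, Processed=True)
other_path = dict(base, Path="/another/path")

# Processed is not a signature field, so both variants share one signature.
print(stand_in_signature(unprocessed) == stand_in_signature(processed))
# Path is a signature field, so changing it changes the signature.
print(stand_in_signature(unprocessed) == stand_in_signature(other_path))
```

Under this model, resending a document with only Processed changed should land on the same id and overwrite the earlier copy.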

As for routing, I am using the built-in routing in Solr. We have three nodes, 
and the core has a replication factor of one. I am not sure what you mean by 
the document being routed to a different shard. In this particular case, the 
schema is as follows:

        <field name="_version_" type="long" indexed="true" required="true"/>
        <field name="id" type="string" indexed="true" stored="true" required="true"/>
        <field name="CreateDate" type="date" indexed="true" stored="true" required="true"/>
        <field name="DataCenter" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
        <field name="Origin" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
        <field name="Environment" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
        <field name="Host" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
        <field name="Level" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
        <field name="Path" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
        <field name="Message" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
        <field name="ExceptionType" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
        <field name="Processed" type="boolean" indexed="true" stored="true" multiValued="false" required="true"/>

From looking at the schema, all but three fields (id, _version_, and 
Processed) are included in the processor signature. If the signature fields 
hold the same values, I would have thought that the document would always be 
routed the same way. Please let me know if this is different. To check whether 
the Processed field in the document has been updated, I am simply looking at 
the document via the Solr URL:

http://server001:8983/solr/log2_search/select?q=id:6a692418a84a849a

Showing:

<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1</int>
  <lst name="params"><str name="q">id:6a692418a84a849a</str></lst>
</lst>
<result name="response" numFound="1" start="0">
<doc>
  <str name="Host">web.1_m2qa11ziio_dev</str>
  <str name="DataCenter">heroku</str>
  <str name="Origin">m2qa11ziio</str>
  <str name="Environment">dev</str>
  <date name="CreateDate">2015-01-10T00:07:48.811Z</date>
  <str name="Level">I</str>
  <str name="ExceptionType"/>
  <str name="Message">2015-01-09T00:07:48.811737+00:00 host heroku web.1 - Process running mem=525M(102.6%)</str>
  <str name="Path">http://rasmdev.ziio.net/app/m2qa11ziio/env/Dev/dc/Heroku</str>
  <bool name="Processed">false</bool>
  <str name="id">6a692418a84a849a</str>
  <long name="_version_">1490119171532390400</long>
</doc>
</result>
</response>
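
The same check can be scripted instead of read by eye. This is a minimal sketch that parses a response shaped like the one above and returns the Processed value; `processed_flag` and `SAMPLE` are hypothetical names, and the sample is a trimmed copy of the response rather than live output.

```python
import xml.etree.ElementTree as ET


def processed_flag(response_xml):
    """Return the Processed value of the first <doc>, or None if absent."""
    root = ET.fromstring(response_xml)
    node = root.find("./result/doc/bool[@name='Processed']")
    if node is None or node.text is None:
        return None
    return node.text.strip().lower() == "true"


# Trimmed copy of the select response shown above.
SAMPLE = """<response>
<result name="response" numFound="1" start="0">
<doc>
  <bool name="Processed">false</bool>
  <str name="id">6a692418a84a849a</str>
</doc>
</result>
</response>"""

print(processed_flag(SAMPLE))
```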

When I do an update via curl:

curl http://server001:8983/solr/log2_search/update/json -H 'Content-type:application/json' -d '[
        {
        "Host":"web.1_m2qa11ziio_dev",
        "DataCenter":"heroku",
        "Origin":"m2qa11ziio",
        "Environment":"dev",
        "CreateDate":"2015-01-10T00:07:48.811Z",
        "Level":"I",
        "ExceptionType":"",
        "Message":"2015-01-09T00:07:48.811737+00:00 host heroku web.1 - Process running mem=525M(102.6%)",
        "Path":"http://rillodev.ziio.net/app/m2qa11ziio/env/Dev/dc/Heroku",
        "Processed":"true"
        }
]'

Nothing changes. Hopefully that is enough detail to go on. 

As I noted earlier, only about 4.7% of the documents have the issue, and 
whether a given document can be updated does not change. If a document cannot 
be updated, it continues to not update.
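
Since the signature is written to id, a resubmitted document only overwrites the stored one if every signature field matches. Here is a minimal sketch for comparing the two payloads field by field. `differing_signature_fields` is a hypothetical helper, and the values are copied from the response and the curl body exactly as they appear in this message.

```python
# Fields listed in the "fields" parameter of the dedupe chain.
SIGNATURE_FIELDS = [
    "CreateDate", "DataCenter", "Origin", "Environment", "Host",
    "Level", "Path", "Message", "ExceptionType",
]

MESSAGE = ("2015-01-09T00:07:48.811737+00:00 host heroku web.1 - "
           "Process running mem=525M(102.6%)")

# Values as shown in the select response above.
stored = {
    "Host": "web.1_m2qa11ziio_dev",
    "DataCenter": "heroku",
    "Origin": "m2qa11ziio",
    "Environment": "dev",
    "CreateDate": "2015-01-10T00:07:48.811Z",
    "Level": "I",
    "ExceptionType": "",
    "Message": MESSAGE,
    "Path": "http://rasmdev.ziio.net/app/m2qa11ziio/env/Dev/dc/Heroku",
    "Processed": "false",
}

# Values as shown in the curl body above.
posted = dict(
    stored,
    Path="http://rillodev.ziio.net/app/m2qa11ziio/env/Dev/dc/Heroku",
    Processed="true",
)


def differing_signature_fields(a, b, fields=SIGNATURE_FIELDS):
    """Return the signature fields whose values differ between a and b."""
    return [name for name in fields if a.get(name) != b.get(name)]


print(differing_signature_fields(stored, posted))
```

Any field this reports would be expected to give the posted document a different signature, and therefore a different id, than the stored one. With the values as written here it reports Path, though that may only reflect how the example was transcribed for this message.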

> Some documents will not update on a cloud server using 
> SignatureUpdateProcessorFactory
> --------------------------------------------------------------------------------------
>
>                 Key: SOLR-6973
>                 URL: https://issues.apache.org/jira/browse/SOLR-6973
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.7
>         Environment: On a redhat 6 servers, using three solr cloud nodes
>            Reporter: Robert de Lorimier
>
> We are using solr cloud to hold recent log data for our internal auditing and 
> research. When first indexing the data, we flag the record with 
> Processed=false, and use this to search solr for new records to put into our 
> archive repository. Once the record is committed to the archive repository, 
> we update the record by setting the flag to true. As part of eliminating 
> duplicate log records we use the SignatureUpdateProcessorFactory with 
> overwriteDupes set to true to deduplicate any logs that have been sent more 
> than once. This works great for 95% of the data. We are able to add the records 
> to solr, lookup any records that have not been added to the archive, add 
> them, and then set the flag to true. However, for 5% of the records we are 
> not able to update the flag in the cloud configuration. When sending the 
> records that do not update using curl as a test, I do not see any error 
> associated with the non-update.
> I also set up the same cores locally without a cloud configuration and the 
> same record data does update without issue, so this seems to be a bug related 
> to cloud. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
