[ 
https://issues.apache.org/jira/browse/SOLR-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-4455:
---------------------------

    Attachment: SOLR-4455.patch

Attaching a patch that adds the logic i was thinking of to 
DistributedUpdateProcessor.

At first i was confused why none of the existing distributed query tests 
were already failing, since the test config includes a "timestamp" field -- 
and then i realized it's because the "handler" for comparing multiple responses 
for identical queries is configured to "SKIPVAL" the timestamp field in most 
tests.

I updated a lot of the test scaffolding to explicitly set a consistent NOW when 
talking to both the controlClient and a distributedClient.
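
As a rough illustration of what setting a consistent NOW for both clients 
means, the sketch below copies one epoch-millisecond value into the parameters 
of two otherwise independent requests. It uses plain Java maps rather than 
SolrJ, and the class and method names are hypothetical; none of this is taken 
from the patch. The only Solr-specific assumption is that the "NOW" request 
parameter carries milliseconds since the epoch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of SOLR-4455.patch.
class PinnedNowSketch {

    // Returns a copy of the params with "NOW" set to the given epoch millis.
    // Assumption: the "NOW" request param is read as milliseconds since epoch.
    static Map<String, String> withNow(Map<String, String> params, long nowMillis) {
        Map<String, String> copy = new HashMap<>(params);
        copy.put("NOW", Long.toString(nowMillis));
        return copy;
    }

    public static void main(String[] args) {
        // Evaluate the clock exactly once, then reuse the value everywhere.
        long now = System.currentTimeMillis();

        Map<String, String> base = new HashMap<>();
        base.put("commit", "true");

        // Stand-ins for the requests sent to the control and distributed clients.
        Map<String, String> controlParams = withNow(base, now);
        Map<String, String> distribParams = withNow(base, now);

        // Prints "true": both requests carry the same NOW.
        System.out.println(controlParams.get("NOW").equals(distribParams.get("NOW")));
    }
}
```

With the value fixed on the client side, a comparison of the two responses no 
longer depends on when each server happened to read its own clock.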

In the attached patch, TestDistributedSearch and BasicDistributedZkTest have 
both been updated to no longer SKIPVAL the timestamp, and they pass, 
demonstrating that the basics of these test scaffolding changes and the changes 
to DistributedUpdateProcessor seem to work.

BasicDistributedZk2Test on the other hand fails very early and consistently with 
these changes and the timestamp SKIPVAL disabled ... with the "nocommit" in 
place to always force a NOW value in the year 2038, you can see from the 
logs that somehow the cloud copy of doc id=1 is still getting a timestamp of 
the current time, even though the control solr instance gets the expected 
value...  i'm not really sure why/how this is happening, because you can see 
the NOW value specified in the logs for all the /update requests related to 
id=1 (even when forwarded from the leader).

----

One thing that should be noted is that while typing up these notes, it occurred 
to me that these changes still might not guarantee consistency in the case of a 
recovery situation that results in replaying the transaction log -- in which 
case the _documents_ are recorded, but not all of the update request params 
like NOW.
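
To make the replay concern concrete, here is a deliberately simplified model. 
It is not Solr's transaction log code, and every name in it is hypothetical. 
It assumes that a log entry holds only the document's explicit fields, and 
that the timestamp default is filled in whenever the document is indexed, 
using whatever NOW is in effect at that moment.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of the replay concern; not Solr's actual code.
class ReplaySketch {

    // Indexes a logged document, filling in the "timestamp" default from
    // nowMillis only when the document does not already carry a value.
    static Map<String, Object> index(Map<String, Object> loggedDoc, long nowMillis) {
        Map<String, Object> indexed = new HashMap<>(loggedDoc);
        indexed.putIfAbsent("timestamp", nowMillis);
        return indexed;
    }

    public static void main(String[] args) {
        // The log entry records the document, but no request params.
        Map<String, Object> loggedDoc = new HashMap<>();
        loggedDoc.put("id", "1");

        long requestNow = 1000L; // NOW sent with the original update request
        long replayNow = 5000L;  // clock at replay time; the NOW param is gone

        Map<String, Object> original = index(loggedDoc, requestNow);
        Map<String, Object> replayed = index(loggedDoc, replayNow);

        // Prints "false": the two copies of doc id=1 disagree on timestamp.
        System.out.println(original.get("timestamp").equals(replayed.get("timestamp")));
    }
}
```

If that model is roughly right, replay could only reproduce the original 
timestamp if the value were already part of the logged document, or if NOW 
were recorded alongside it.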

I'm not certain if this is causing the BasicDistributedZk2Test failures 
mentioned above -- but it's certainly possible (i do see mentions in the logs 
of "Log replay finished. recoveryInfo=RecoveryInfo{adds=1 ..."), but it's not 
clear to me why any recovery would be happening ... nothing jumps out at me in 
this test to suggest that anything is aborting nodes to force recovery.


                
> Stored value of "NOW" differs between replicas
> ----------------------------------------------
>
>                 Key: SOLR-4455
>                 URL: https://issues.apache.org/jira/browse/SOLR-4455
>             Project: Solr
>          Issue Type: Bug
>          Components: update
>    Affects Versions: 4.1
>            Reporter: Colin Bartolome
>            Assignee: Hoss Man
>            Priority: Minor
>         Attachments: SOLR-4455.patch
>
>
> I have a field in {{schema.xml}} defined like this:
> {code:xml}
> <field name="timestamp" type="date" indexed="true" stored="true" 
> default="NOW" />
> {code}
> When I perform a query that's load-balanced across the servers in my cloud, 
> the value stored in that field differs slightly between each replica for the 
> same returned document.
> I haven't seen this field differ by more than a tenth of a second and I'm not 
> running queries against it, but I can picture a situation where somebody has 
> one replica returning 23:59:59.990 and another returning 00:00:00.010 and a 
> query starts behaving oddly.
> It seems like the leader should evaluate what "NOW" means and the replicas 
> should copy that value.

