[ https://issues.apache.org/jira/browse/SOLR-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hoss Man updated SOLR-4455:
---------------------------
Attachment: SOLR-4455.patch
Attaching a patch that adds the logic I was thinking of to
DistributedUpdateProcessor.
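A minimal sketch of the general idea: the leader pins a single NOW and forwards it, so every replica resolves default="NOW" to the same value. It assumes NOW travels in the request params as epoch milliseconds, which is how Solr's NOW request parameter is expressed. The class and method names are hypothetical. This is not the code in SOLR-4455.patch and not Solr's DistributedUpdateProcessor API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a distributed update (NOT Solr's real API):
// the leader fixes NOW once, and replicas reuse it instead of their own clocks.
final class NowPinningSketch {

    /** Leader side: copy the incoming params and pin NOW if the client did not send one. */
    static Map<String, String> pinNowOnLeader(Map<String, String> incoming, long leaderClockMillis) {
        Map<String, String> forwarded = new HashMap<>(incoming);
        // A client-supplied NOW is kept as-is; otherwise the leader's clock wins.
        forwarded.putIfAbsent("NOW", Long.toString(leaderClockMillis));
        return forwarded;
    }

    /** Replica side: the value a default="NOW" field would receive. */
    static long resolveDefaultNow(Map<String, String> params, long localClockMillis) {
        String now = params.get("NOW");
        // Without a pinned NOW each replica falls back to its own clock,
        // which is how the stored values drift apart.
        return now != null ? Long.parseLong(now) : localClockMillis;
    }

    public static void main(String[] args) {
        Map<String, String> forwarded = pinNowOnLeader(new HashMap<>(), 1_000L);
        // Two replicas whose local clocks have moved on by different amounts.
        long replicaA = resolveDefaultNow(forwarded, 1_042L);
        long replicaB = resolveDefaultNow(forwarded, 1_097L);
        System.out.println(replicaA + " " + replicaB); // prints "1000 1000"
    }
}
```

Pinning on the leader rather than on each replica means the only clock that matters is the leader's, so replica clock skew and forwarding latency no longer affect the stored value.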
At first I was confused about why none of the existing distributed query tests
were already failing, since the test config includes a "timestamp" field --
then I realized it's because the "handler" for comparing the responses to
identical queries is configured to "SKIPVAL" the timestamp field in most
tests.
I updated a lot of the test scaffolding to explicitly set a consistent NOW when
talking to both the controlClient and a distributedClient.
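Why SKIPVAL hid the drift, and why a shared NOW makes it unnecessary, can be sketched as a field-by-field comparison with a skip set. This is a hypothetical simplification, not the real test harness (which configures per-field flags such as SKIPVAL); the class name and the 50 ms drift are made up for illustration.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical simplification of the distributed-test response comparison:
// fields in skipVal are ignored, every other field must match exactly.
final class SkipValCompareSketch {

    static boolean docsMatch(Map<String, Object> control, Map<String, Object> cloud, Set<String> skipVal) {
        if (!control.keySet().equals(cloud.keySet())) {
            return false;
        }
        for (Map.Entry<String, Object> e : control.entrySet()) {
            if (skipVal.contains(e.getKey())) {
                continue; // e.g. "timestamp" in most of the existing distributed tests
            }
            if (!Objects.equals(e.getValue(), cloud.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Same doc indexed without a shared NOW: timestamps drift by 50 ms.
        Map<String, Object> control = Map.of("id", "1", "timestamp", 1_000L);
        Map<String, Object> cloud = Map.of("id", "1", "timestamp", 1_050L);
        System.out.println(docsMatch(control, cloud, Set.of("timestamp"))); // true: drift is hidden
        System.out.println(docsMatch(control, cloud, Set.of()));            // false: drift is caught

        // Same doc indexed with an explicit, shared NOW on both clients.
        Map<String, Object> pinned = Map.of("id", "1", "timestamp", 2_000L);
        System.out.println(docsMatch(pinned, Map.of("id", "1", "timestamp", 2_000L), Set.of())); // true
    }
}
```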
In the attached patch, TestDistributedSearch and BasicDistributedZkTest have
both been updated to no longer SKIPVAL the timestamp, and they pass,
demonstrating that the basic test scaffolding changes and the changes
to DistributedUpdateProcessor seem to work.
BasicDistributedZk2Test, on the other hand, fails very early and consistently
with these changes and the timestamp SKIPVAL disabled. With the "nocommit" in
place to always force a NOW value in the year 2038, you can see from the
logs that somehow the cloud copy of doc id=1 still gets a timestamp of
the current time, even though the control Solr instance gets the expected
value. I'm not really sure why or how this is happening, because you can see
the NOW value specified in the logs for all the /update requests related to
id=1 (even when forwarded from the leader).
----
One thing to note: while typing up these notes, it occurred to me that these
changes still might not guarantee consistency in a recovery situation that
results in replaying the transaction log -- in that case the _documents_ are
recorded, but not all of the update request params (like NOW) are.
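The replay concern can be sketched in isolation: if the log stores only the raw document, the default="NOW" field is resolved again at replay time from the replaying node's clock. The second case shows one possible mitigation, writing the resolved value into the document before it is logged; that mitigation is an assumption for illustration, not something the patch does. The class and method names are hypothetical and are not Solr's UpdateLog API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model (NOT Solr's UpdateLog API): the transaction log keeps
// documents but not request params such as NOW.
final class TlogReplaySketch {

    /** Fills a default="NOW" "timestamp" field only if the document lacks one. */
    static Map<String, Object> applyNowDefault(Map<String, Object> doc, long nowMillis) {
        Map<String, Object> out = new HashMap<>(doc);
        out.putIfAbsent("timestamp", nowMillis);
        return out;
    }

    /** Re-indexes logged docs; NOW at replay time is the replaying node's clock. */
    static List<Map<String, Object>> replay(List<Map<String, Object>> tlog, long replayClockMillis) {
        List<Map<String, Object>> indexed = new ArrayList<>();
        for (Map<String, Object> doc : tlog) {
            indexed.add(applyNowDefault(doc, replayClockMillis));
        }
        return indexed;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "1");
        long requestNow = 1_000L; // NOW param on the original /update request

        // Case 1: only the raw document is logged, so replay uses a fresh NOW.
        List<Map<String, Object>> rawLog = List.of(doc);
        System.out.println(replay(rawLog, 9_000L).get(0).get("timestamp")); // prints 9000

        // Case 2 (assumed mitigation): resolve the default before logging.
        List<Map<String, Object>> resolvedLog = List.of(applyNowDefault(doc, requestNow));
        System.out.println(replay(resolvedLog, 9_000L).get(0).get("timestamp")); // prints 1000
    }
}
```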
I'm not certain whether this is causing the BasicDistributedZk2Test failures
mentioned above, but it's certainly possible. I do see mentions in the logs
of "Log replay finished. recoveryInfo=RecoveryInfo{adds=1 ...", but it's not
clear to me why any recovery would be happening; nothing jumps out at me in
this test to suggest that anything is aborting nodes to force recovery.
> Stored value of "NOW" differs between replicas
> ----------------------------------------------
>
> Key: SOLR-4455
> URL: https://issues.apache.org/jira/browse/SOLR-4455
> Project: Solr
> Issue Type: Bug
> Components: update
> Affects Versions: 4.1
> Reporter: Colin Bartolome
> Assignee: Hoss Man
> Priority: Minor
> Attachments: SOLR-4455.patch
>
>
> I have a field in {{schema.xml}} defined like this:
> {code:xml}
> <field name="timestamp" type="date" indexed="true" stored="true"
> default="NOW" />
> {code}
> When I perform a query that's load-balanced across the servers in my cloud,
> the value stored in that field differs slightly between each replica for the
> same returned document.
> I haven't seen this field differ by more than a tenth of a second and I'm not
> running queries against it, but I can picture a situation where somebody has
> one replica returning 23:59:59.990 and another returning 00:00:00.010 and a
> query starts behaving oddly.
> It seems like the leader should evaluate what "NOW" means and the replicas
> should copy that value.
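The reporter's 23:59:59.990 vs 00:00:00.010 scenario can be made concrete: a 20 ms drift across midnight UTC puts the same document on different calendar days once the value is rounded to the day, analogous to Solr date-math rounding such as NOW/DAY. The dates and the class name below are hypothetical, chosen only for illustration.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration: two replicas stored default="NOW" 20 ms apart,
// straddling midnight UTC. Rounding to the day splits them across two days.
final class MidnightDriftSketch {

    /** Rounds down to the start of the UTC day, like a "/DAY" date-math rounding. */
    static Instant roundToDay(Instant t) {
        return t.truncatedTo(ChronoUnit.DAYS);
    }

    public static void main(String[] args) {
        Instant replicaA = Instant.parse("2013-02-14T23:59:59.990Z");
        Instant replicaB = Instant.parse("2013-02-15T00:00:00.010Z");
        System.out.println(ChronoUnit.MILLIS.between(replicaA, replicaB)); // prints 20
        System.out.println(roundToDay(replicaA)); // prints 2013-02-14T00:00:00Z
        System.out.println(roundToDay(replicaB)); // prints 2013-02-15T00:00:00Z
    }
}
```

A day-bucketed query or facet would therefore count the document under one day on one replica and the next day on the other, depending on which replica served the request.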
--
This message is automatically generated by JIRA.