RE: Ghost Documents or Shards out of Sync

2021-02-09 Thread Chris Hostetter


: Let me add some background. A user triggers an operation which under the 
: hood needs to update a single field. Atomic update fails with a message 
: that one of the mandatory fields is missing (which is strange by 
: itself). When I query Solr for the exact document (fq with the document 
: id) I sometimes get the expected single result and sometimes zero. Those 
: queries are sometimes run a couple of days later, so auto commits must 
: have been performed by then.

I suspect what you are seeing is that the update succeeds on a leader, but 
for some reason (i'm not really understanding your description of the 
atomic update failure) it fails on a replica -- leaving them in an 
inconsistent state.  Restarting all the nodes forces the out-of-sync 
replica to recover.

If i'm correct, then when you see these inconsistent results, you should 
be able to query each individual *core* that is a replica of the shard 
this document belongs in, using a "distrib=false" request param, and see 
that it exists on the "leader" replica, but not on one/some of the other 
replicas.
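
A rough sketch of that per-core check, in case it helps. The core URLs and the uniqueKey field name "id" are assumptions for illustration (substitute whatever your cluster uses); only the "distrib=false" param comes from the suggestion above. The fetch function is injectable so the comparison logic can be tried without a live cluster.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def core_query_url(core_url, doc_id):
    """Build a select URL that is answered by this one core only."""
    params = {
        "q": "*:*",
        "fq": f'id:"{doc_id}"',   # assumes the uniqueKey field is named "id"
        "distrib": "false",       # do not fan out to other replicas/shards
        "rows": "0",
        "wt": "json",
    }
    return f"{core_url.rstrip('/')}/select?{urlencode(params)}"


def http_fetch(url):
    """Fetch a URL and decode the JSON response body."""
    with urlopen(url) as resp:
        return json.load(resp)


def count_per_core(core_urls, doc_id, fetch=http_fetch):
    """Return {core_url: numFound} for the given document id."""
    return {
        core: fetch(core_query_url(core, doc_id))["response"]["numFound"]
        for core in core_urls
    }


def out_of_sync(counts):
    """True when the cores disagree about whether the document exists."""
    return len(set(counts.values())) > 1
```

If out_of_sync() returns True for the replicas of one shard, that would match the "succeeded on the leader, failed on a replica" theory.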

Understanding why/how you got into this situation though would require 
understanding exactly what you mean by "a message that one of the mandatory 
fields is missing"

can you show us some details?  solrconfig/schema, example documents, 
example updates, log messages from the various nodes when these updates 
"fail", etc... ?

https://cwiki.apache.org/confluence/display/SOLR/UsingMailingLists

: One more thing that might be important - we're using nested schema, and 
: we recently encountered several issues that make me think that this 
: combination - nested and atomic updates (of parent documents) - is the 
: root cause.

it's very possible that there are some bugs related to atomic updates and 
nested documents -- the code for dealing with that combination is 
relatively new, and making it work correctly requires special fields in 
the schema -- on top of the normal atomic update rules.  The documentation 
on this was heavily updated in the 8.7 ref-guide...

https://lucene.apache.org/solr/guide/8_7/indexing-nested-documents.html
https://lucene.apache.org/solr/guide/8_7/updating-parts-of-documents.html#updating-child-documents
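
For reference, a minimal sketch of the kind of atomic update being discussed, assuming a hypothetical parent document id "parent-1" and a hypothetical field "status_s". It only builds the JSON body that would be POSTed to the collection's /update handler; the special schema fields the ref-guide pages above describe for nested documents (_root_ and _nest_path_, as I read them) still have to be in place for this to behave.

```python
import json


def atomic_set(doc_id, field, value):
    """Build one atomic-update document that replaces a single field.

    The {"set": value} modifier asks Solr to overwrite only that field
    and keep the rest of the stored document as it is.
    """
    return {"id": doc_id, field: {"set": value}}


# Hypothetical id and field name, for illustration only.
payload = json.dumps([atomic_set("parent-1", "status_s", "closed")])
```

Comparing a body like this with the one your application really sends (and with the log message about the missing mandatory field) is probably the quickest way to show what is going wrong.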



-Hoss
http://www.lucidworks.com/


RE: Ghost Documents or Shards out of Sync

2021-02-07 Thread Nussbaum, Ronen
Hi,

Thank you for your replies - much appreciated!
I'm afraid it's neither one...
Let me add some background. A user triggers an operation which under the hood 
needs to update a single field. The atomic update fails with a message that one 
of the mandatory fields is missing (which is strange in itself). When I query 
Solr for the exact document (fq with the document id) I sometimes get the 
expected single result and sometimes zero. Those queries are sometimes run a 
couple of days later, so auto commits must have been performed by then.
Only after a restart does the query work, and then the atomic update succeeds.
One more thing that might be important - we're using nested schema, and we 
recently encountered several issues that make me think that this combination - 
nested and atomic updates (of parent documents) - is the root cause.

Ronen.

-Original Message-
From: Mike Drob 
Sent: Monday, 01 February 2021 22:58
To: solr-user@lucene.apache.org
Subject: Re: Ghost Documents or Shards out of Sync

To expand on what Jason suggested, if the issue is the non-deterministic 
ordering due to staggered commits per replica, you may have more consistency 
with TLOG replicas rather than the NRT replicas. In this case, the underlying 
segment files should be identical and lead to more predictable results.

On Mon, Feb 1, 2021 at 2:50 PM Jason Gerlowski 
wrote:

> Hi Ronen,
>
> The first thing I'd figure out in your situation is whether the
> results are actually different each time, or whether the ordering is
> what differs (which might push a particular result off the page you're
> looking at, giving the appearance that it didn't match).
>
> In the case of the former, this can happen briefly if queries come in
> when some but not all replicas have seen a commit.  But usually this
> is a transient concern - either waiting for the next autocommit or
> triggering an explicit commit resolves the discrepancy in this case.
> Since you only see identical results after a restart, this _doesn't_
> sound like what you're seeing.
>
> In the case of the latter (same results, differently ordered) this is
> expected sometimes.  Solr sorts on relevance by default with the
> internal Lucene document ID being a tiebreaker.  Both the relevance
> statistics and Lucene's document IDs can differ across SolrCloud
> replicas (due to non-deterministic conditions such as the segment
> merging and deleted-doc removal that Lucene does under the hood), and
> this can produce differently-ordered result sets for users that issue
> the same query repeatedly.
>
> Good luck narrowing things down!
>
> Jason
>
> On Mon, Jan 25, 2021 at 3:32 AM Ronen Nussbaum  wrote:
> >
> > Hi All,
> >
> > I'm using Solr Cloud (version 8.3.0) with shards and replicas (replication
> > factor of 2).
> > Recently, I've encountered several times that running the same query
> > repeatedly yields different results. Restarting the nodes fixes the problem
> > (until next time).
> > I assume that some shards are not synchronized and I have several questions:
> > 1. What can cause this - many atomic updates? issues with commits?
> > 2. Can I trigger the "fixing" mechanism that Solr runs at restart by an API
> > call or some other method?
> >
> > Thanks in advance,
> > Ronen.
>




Re: Ghost Documents or Shards out of Sync

2021-02-01 Thread Mike Drob
To expand on what Jason suggested, if the issue is the non-deterministic
ordering due to staggered commits per replica, you may have more
consistency with TLOG replicas rather than the NRT replicas. In this case,
the underlying segment files should be identical and lead to more
predictable results.
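
A sketch of what asking for TLOG replicas could look like when creating a collection through the Collections API. The base URL, collection name and counts are placeholders, and the function only builds the request URL; nrtReplicas and tlogReplicas are the CREATE parameters that select the replica type.

```python
from urllib.parse import urlencode


def create_tlog_collection_url(solr_base, name, num_shards, tlog_replicas):
    """Build a Collections API CREATE URL that uses only TLOG replicas."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "nrtReplicas": 0,               # no NRT replicas
        "tlogReplicas": tlog_replicas,  # TLOG replicas per shard
    }
    return f"{solr_base.rstrip('/')}/admin/collections?{urlencode(params)}"


# Placeholder values, for illustration only.
url = create_tlog_collection_url("http://localhost:8983/solr", "docs", 2, 2)
```

This is for a new collection; changing the replica types of an existing one means adding replicas of the new type and removing the old ones.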

On Mon, Feb 1, 2021 at 2:50 PM Jason Gerlowski 
wrote:

> Hi Ronen,
>
> The first thing I'd figure out in your situation is whether the
> results are actually different each time, or whether the ordering is
> what differs (which might push a particular result off the page you're
> looking at, giving the appearance that it didn't match).
>
> In the case of the former, this can happen briefly if queries come in
> when some but not all replicas have seen a commit.  But usually this
> is a transient concern - either waiting for the next autocommit or
> triggering an explicit commit resolves the discrepancy in this case.
> Since you only see identical results after a restart, this _doesn't_
> sound like what you're seeing.
>
> In the case of the latter (same results, differently ordered) this is
> expected sometimes.  Solr sorts on relevance by default with the
> internal Lucene document ID being a tiebreaker.  Both the relevance
> statistics and Lucene's document IDs can differ across SolrCloud
> replicas (due to non-deterministic conditions such as the segment
> merging and deleted-doc removal that Lucene does under the hood), and
> this can produce differently-ordered result sets for users that issue
> the same query repeatedly.
>
> Good luck narrowing things down!
>
> Jason
>
> On Mon, Jan 25, 2021 at 3:32 AM Ronen Nussbaum  wrote:
> >
> > Hi All,
> >
> > I'm using Solr Cloud (version 8.3.0) with shards and replicas (replication
> > factor of 2).
> > Recently, I've encountered several times that running the same query
> > repeatedly yields different results. Restarting the nodes fixes the problem
> > (until next time).
> > I assume that some shards are not synchronized and I have several questions:
> > 1. What can cause this - many atomic updates? issues with commits?
> > 2. Can I trigger the "fixing" mechanism that Solr runs at restart by an API
> > call or some other method?
> >
> > Thanks in advance,
> > Ronen.
>


Re: Ghost Documents or Shards out of Sync

2021-02-01 Thread Jason Gerlowski
Forgot to answer your second question:

> Can I trigger the "fixing" mechanism that Solr runs at restart by an API call 
> or some other method?

It depends on what the cause is.  But for at least some possible
causes there is an API call that can resolve this.  Though that API
itself (Solr's misnamed "optimize" feature) comes with a lot of
warnings and has been discouraged by the community in the past.  (I
won't get into those specifics though until you figure out the cause.)

Before you consider calling "optimize" or taking any other means to
fix this though, it might be worth revisiting whether this is really
an issue?  While this quirk of Solr's can bedevil automated tests or
other things that rely on repeatability, it's unusual in many
applications for end-users to submit identical queries multiple times.
Every case is different of course, but something to consider.

Best,

Jason

On Mon, Feb 1, 2021 at 3:49 PM Jason Gerlowski  wrote:
>
> Hi Ronen,
>
> The first thing I'd figure out in your situation is whether the
> results are actually different each time, or whether the ordering is
> what differs (which might push a particular result off the page you're
> looking at, giving the appearance that it didn't match).
>
> In the case of the former, this can happen briefly if queries come in
> when some but not all replicas have seen a commit.  But usually this
> is a transient concern - either waiting for the next autocommit or
> triggering an explicit commit resolves the discrepancy in this case.
> Since you only see identical results after a restart, this _doesn't_
> sound like what you're seeing.
>
> In the case of the latter (same results, differently ordered) this is
> expected sometimes.  Solr sorts on relevance by default with the
> internal Lucene document ID being a tiebreaker.  Both the relevance
> statistics and Lucene's document IDs can differ across SolrCloud
> replicas (due to non-deterministic conditions such as the segment
> merging and deleted-doc removal that Lucene does under the hood), and
> this can produce differently-ordered result sets for users that issue
> the same query repeatedly.
>
> Good luck narrowing things down!
>
> Jason
>
> On Mon, Jan 25, 2021 at 3:32 AM Ronen Nussbaum  wrote:
> >
> > Hi All,
> >
> > I'm using Solr Cloud (version 8.3.0) with shards and replicas (replication
> > factor of 2).
> > Recently, I've encountered several times that running the same query
> > repeatedly yields different results. Restarting the nodes fixes the problem
> > (until next time).
> > I assume that some shards are not synchronized and I have several questions:
> > 1. What can cause this - many atomic updates? issues with commits?
> > 2. Can I trigger the "fixing" mechanism that Solr runs at restart by an API
> > call or some other method?
> >
> > Thanks in advance,
> > Ronen.


Re: Ghost Documents or Shards out of Sync

2021-02-01 Thread Jason Gerlowski
Hi Ronen,

The first thing I'd figure out in your situation is whether the
results are actually different each time, or whether the ordering is
what differs (which might push a particular result off the page you're
looking at, giving the appearance that it didn't match).

In the case of the former, this can happen briefly if queries come in
when some but not all replicas have seen a commit.  But usually this
is a transient concern - either waiting for the next autocommit or
triggering an explicit commit resolves the discrepancy in this case.
Since you only see identical results after a restart, this _doesn't_
sound like what you're seeing.
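
For completeness, a small sketch of triggering that explicit commit. The base URL and collection name are placeholders; the function only builds the update-handler URL carrying commit=true, which you would then request with whatever HTTP client you prefer.

```python
from urllib.parse import urlencode


def commit_url(solr_base, collection):
    """Build an update-handler URL that asks Solr for an explicit commit."""
    query = urlencode({"commit": "true"})
    return f"{solr_base.rstrip('/')}/{collection}/update?{query}"


# Placeholder values, for illustration only.
url = commit_url("http://localhost:8983/solr", "docs")
```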

In the case of the latter (same results, differently ordered) this is
expected sometimes.  Solr sorts on relevance by default with the
internal Lucene document ID being a tiebreaker.  Both the relevance
statistics and Lucene's document IDs can differ across SolrCloud
replicas (due to non-deterministic conditions such as the segment
merging and deleted-doc removal that Lucene does under the hood), and
this can produce differently-ordered result sets for users that issue
the same query repeatedly.
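
A toy illustration of the tiebreaker effect, with made-up scores and internal document IDs (nothing here talks to Solr). Two replicas hold the same documents with the same scores but different Lucene docids, so the default ordering differs between them, while breaking ties on the unique key does not. On a real request the equivalent would be a sort parameter such as "score desc, id asc", assuming the uniqueKey field is named "id".

```python
# Same three documents on two replicas; only the internal docids differ.
replica_a = [
    {"id": "doc-1", "score": 1.0, "lucene_docid": 7},
    {"id": "doc-2", "score": 1.0, "lucene_docid": 3},
    {"id": "doc-3", "score": 2.5, "lucene_docid": 9},
]
replica_b = [
    {"id": "doc-1", "score": 1.0, "lucene_docid": 2},
    {"id": "doc-2", "score": 1.0, "lucene_docid": 5},
    {"id": "doc-3", "score": 2.5, "lucene_docid": 0},
]


def default_order(hits):
    """Score descending, ties broken by the replica-local Lucene docid."""
    ranked = sorted(hits, key=lambda h: (-h["score"], h["lucene_docid"]))
    return [h["id"] for h in ranked]


def stable_order(hits):
    """Score descending, ties broken by the unique key instead."""
    ranked = sorted(hits, key=lambda h: (-h["score"], h["id"]))
    return [h["id"] for h in ranked]
```

Note this only removes the docid tiebreak as a source of variation; if the relevance statistics themselves differ between replicas, the scores can still differ.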

Good luck narrowing things down!

Jason

On Mon, Jan 25, 2021 at 3:32 AM Ronen Nussbaum  wrote:
>
> Hi All,
>
> I'm using Solr Cloud (version 8.3.0) with shards and replicas (replication
> factor of 2).
> Recently, I've encountered several times that running the same query
> repeatedly yields different results. Restarting the nodes fixes the problem
> (until next time).
> I assume that some shards are not synchronized and I have several questions:
> 1. What can cause this - many atomic updates? issues with commits?
> 2. Can I trigger the "fixing" mechanism that Solr runs at restart by an API
> call or some other method?
>
> Thanks in advance,
> Ronen.