Hi Phil,

Please start new threads (emails) for new problems instead of replying to
an existing one. The problem discussed in the existing thread does not
produce an error; yours does, so I think the two are entirely dissimilar.
Also, you'll need to dig deeper to learn what the particular error was and
report that. Go to Solr's logs.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley

On Fri, Mar 6, 2020 at 2:01 PM Staley, Phil R - DCF <phil.sta...@wisconsin.gov> wrote:

> We recently upgraded our Drupal 8 sites to SOLR 8.3.1. We are now
> getting reports of certain patterns of search terms resulting in an error
> that reads, “The website encountered an unexpected error. Please try again
> later.”
>
> Below is a list of example terms that always result in this error and a
> similar list that works fine. The problem pattern seems to be a search
> term that contains 2 or 3 characters followed by a space, followed by
> additional text.
>
> To confirm that the problem is version 8 of SOLR, I have updated our local
> and UAT sites with the latest Drupal updates, which included an update to
> the Search API Solr module, and tested the terms below under SOLR 7.7.2,
> 8.3.1, and 8.4.1. Under version 7.7.2 everything works fine. Under either
> version 8 release, the problem returns.
>
> Thoughts?
>
> Search terms that result in error
> • w-2 agency directory
> • agency w-2 directory
> • w-2 agency
> • w-2 directory
> • w2 agency directory
> • w2 agency
> • w2 directory
>
> Search terms that do not result in error
> • w-22 agency directory
> • agency directory w-2
> • agency w-2directory
> • agencyw-2 directory
> • w-2
> • w2
> • agency directory
> • agency
> • directory
> • -2 agency directory
> • 2 agency directory
> • w-2agency directory
> • w2agency directory
>
> *From:* Michele Palmia <micpal...@gmail.com>
> *Sent:* Friday, March 6, 2020 9:50 AM
> *To:* dev@lucene.apache.org
> *Subject:* Re: Inconsistent query results in Lucene 8.1.0
>
> Hi all,
>
> I looked into this today. I can reproduce it and I believe it's a bug.
>
> This is caused by the following working together:
> - LUCENE-7386 <https://issues.apache.org/jira/browse/LUCENE-7386>
>   Flatten nested disjunctions
> - LUCENE-7925 <https://issues.apache.org/jira/browse/LUCENE-7925>
>   Deduplicate SHOULD and MUST clauses in BooleanQuery
>
> Blended term queries modify the df/ttf of their terms to make sure all
> terms produce identical scores. In this case, two blended term queries
> contain a few terms each, only some of which overlap. The two queries
> calculate different df/ttf for their terms respectively, since the two sets
> are different. During the rewrite process,
>
> 1. the two Blended queries get rewritten as Boolean queries
>    themselves, with each (modified) TermQuery as a SHOULD clause
> 2. the nested Boolean queries get flattened, since they are nested
>    disjunctions
> 3. the Term queries (some of which are actually Boost queries) are
>    deduplicated, with one of the two TermQuery and its modified TermStates
>    being picked at random (the randomness is due to the HashSet underlying
>    Lucene's MultiSet).
>
> I haven't managed to create a failing test yet; I'll share it when I have
> one ready.
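
Step 3 above can be illustrated without Lucene. The following is only a rough sketch under stated assumptions, not Lucene's actual code: `TermClause` and `DedupSketch` are hypothetical names, and the sketch assumes (as the analysis above implies) that two clauses on the same term compare equal even though they carry different modified statistics. The docFreq values 3 and 2 are the ones reported in the original message quoted further down. Deduplication then keeps whichever copy is encountered first, so the surviving docFreq depends purely on encounter order.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a TermQuery carrying rewritten statistics; not a Lucene class. */
final class TermClause {
    final String term;  // e.g. "bt_rni_name_encoded_1:ALTR"
    final int docFreq;  // statistic assigned by the enclosing blended query

    TermClause(String term, int docFreq) {
        this.term = term;
        this.docFreq = docFreq;
    }

    // Assumption: equality looks at the term only and ignores the attached statistics.
    @Override
    public boolean equals(Object o) {
        return o instanceof TermClause && ((TermClause) o).term.equals(term);
    }

    @Override
    public int hashCode() {
        return term.hashCode();
    }
}

final class DedupSketch {
    /** Keeps the first clause seen for each term; later equal clauses are dropped. */
    static Map<String, TermClause> dedup(List<TermClause> clauses) {
        Map<String, TermClause> kept = new HashMap<>();
        for (TermClause c : clauses) {
            kept.putIfAbsent(c.term, c);
        }
        return kept;
    }

    public static void main(String[] args) {
        TermClause fromFirstBlend = new TermClause("bt_rni_name_encoded_1:ALTR", 3);
        TermClause fromSecondBlend = new TermClause("bt_rni_name_encoded_1:ALTR", 2);

        // Same two clauses, visited in the two possible orders.
        Map<String, TermClause> a = dedup(List.of(fromFirstBlend, fromSecondBlend));
        Map<String, TermClause> b = dedup(List.of(fromSecondBlend, fromFirstBlend));

        System.out.println(a.get(fromFirstBlend.term).docFreq); // prints 3
        System.out.println(b.get(fromFirstBlend.term).docFreq); // prints 2
    }
}
```

If the order in which clauses are visited is not stable between runs, neither is the surviving docFreq, which would match the varying max_score in the original report.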
>
> If anybody has suggestions or pointers on how this should be fixed, I'm
> also happy to provide a patch. I'm just a bit unsure what the right
> thing to do would be here: I have a feeling (2.) should not happen for
> (rewritten) Blended Queries?
>
> Cheers,
>
> Michele
>
> On Tue, Mar 3, 2020 at 7:55 PM Fiona Hasanaj <fi...@basistech.com> wrote:
>
> Hello,
>
> I’m Fiona with Basis Technology. We’re investigating what we believe to be
> a bug involving inconsistent query results. We bisected this issue and
> found that it first appears with the flattening of nested disjunctions,
> introduced with the merge of LUCENE-7386
> <https://issues.apache.org/jira/browse/LUCENE-7386>.
> In order to reproduce the issue, I have attached a Lucene index built in
> Lucene 8.1.0 as names_index.tar.gz. If you run the attached Java class
> (LuceneSearchIndex.java) multiple times against Lucene 8.0.0, you'll see
> that max_score is the same between runs, whereas if you run it against
> Lucene 8.1.0 you'll see inconsistent max_score between runs (try a max of
> 10 runs and you should be able to see that sometimes it returns a
> max_score of 1.8651859 and sometimes 2.1415303).
>
> From debugging in Lucene 8.1.0, the query against the name index before
> flattening its nested disjunctions looks like this:
>
> (((bt_rni_name_encoded_1:ALFR)^0.75 bt_rni_name_encoded_1:ALTR
> (bt_rni_name_encoded_1:ANTR)^0.75 (bt_rni_name_encoded_1:LTR)^0.6666666)
> ((bt_rni_name_encoded_1:ALTR)^0.75 (bt_rni_name_encoded_1:FLTMR)^0.75
> (bt_rni_name_encoded_1:FLTRN)^0.75 (bt_rni_name_encoded_1:FLTS)^0.75
> (bt_rni_name_encoded_1:FTR)^0.6666666 (bt_rni_name_encoded_1:LTR)^0.6666666))
> | (((bt_rni_name_encoded_2:FLTR)^0.75) (bt_rni_name_encoded_2:FLTR
> (bt_rni_name_encoded_2:FLTRN)^0.75))
>
> The term that's causing the difference in the final score is
> bt_rni_name_encoded_1:ALTR. As we can see in the above query, it appears
> twice, nested under different clauses: in the first clause its docFreq
> is 3, and in the second clause its docFreq is 2. This happens in
> Lucene 8.0.0 as well; *is a term being read with different docFreq values
> expected behaviour?*
>
> After flattening the nested disjunctions (part of the query rewrite
> process), the query looks like this:
>
> ((bt_rni_name_encoded_1:FTR)^0.6666666 (bt_rni_name_encoded_1:FLTRN)^0.75
> (bt_rni_name_encoded_1:FLTMR)^0.75 (bt_rni_name_encoded_1:ALFR)^0.75
> (bt_rni_name_encoded_1:FLTS)^0.75 (bt_rni_name_encoded_1:ANTR)^0.75
> (bt_rni_name_encoded_1:LTR)^1.3333333 (bt_rni_name_encoded_1:ALTR)^1.75) |
> ((bt_rni_name_encoded_2:FLTRN)^0.75 (bt_rni_name_encoded_2:FLTR)^1.75)
>
> As you can see, bt_rni_name_encoded_1:ALTR appears only once, but its
> weight has been summed up from the original query. This is the version of
> the query that actually gets used, and the docFreq used here for the
> bt_rni_name_encoded_1:ALTR term is sometimes 3 and sometimes 2 between
> runs; the final score changes accordingly. *Is this "coin toss" pick of
> docFreq for the same term expected behaviour?*
>
> Looks like the issue stems from one of the behaviours observed and
> highlighted in bold.
>
> Looking forward to hearing back from you.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
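
The boost arithmetic in the original report can be checked with a small stand-alone sketch. This is not Lucene's rewrite code: `BoostedTerm` and `FlattenSketch` are hypothetical names, an unboosted term is assumed to carry a boost of 1.0, and the field prefix bt_rni_name_encoded_1 is omitted for brevity. It merges the two inner disjunctions from the first query and sums the boosts of repeated terms, which is the behaviour the report describes for ALTR (1 + 0.75 = 1.75) and LTR (0.6666666 + 0.6666666).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical (term, boost) pair; not a Lucene class. */
final class BoostedTerm {
    final String term;
    final double boost;

    BoostedTerm(String term, double boost) {
        this.term = term;
        this.boost = boost;
    }
}

final class FlattenSketch {
    /** Merges the inner disjunctions into one, summing the boosts of repeated terms. */
    static Map<String, Double> flatten(List<List<BoostedTerm>> nested) {
        // TreeMap is used only so the printed order is stable.
        Map<String, Double> merged = new TreeMap<>();
        for (List<BoostedTerm> inner : nested) {
            for (BoostedTerm t : inner) {
                merged.merge(t.term, t.boost, Double::sum);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // The two inner disjunctions on bt_rni_name_encoded_1 from the report.
        List<BoostedTerm> first = List.of(
            new BoostedTerm("ALFR", 0.75), new BoostedTerm("ALTR", 1.0),
            new BoostedTerm("ANTR", 0.75), new BoostedTerm("LTR", 0.6666666));
        List<BoostedTerm> second = List.of(
            new BoostedTerm("ALTR", 0.75), new BoostedTerm("FLTMR", 0.75),
            new BoostedTerm("FLTRN", 0.75), new BoostedTerm("FLTS", 0.75),
            new BoostedTerm("FTR", 0.6666666), new BoostedTerm("LTR", 0.6666666));

        Map<String, Double> merged = flatten(List.of(first, second));
        merged.forEach((term, boost) -> System.out.println(term + "^" + boost));
    }
}
```

The sketch yields eight distinct terms, matching the flattened query. Note that it only sums boosts: it says nothing about which copy's docFreq survives, which is the part that varies between runs.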