Hi Joel,

RE SumFloatFunction: agreed; should be "any" not "all" logic.

RE QueryDocValues: I think the current implementation is probably correct
since a default value is mandatory.  That said, I could imagine using a
def(query-here, 1.0) to accomplish the same.  Ugh; I see DefFunction.exists
isn't simply "true" but I think that's erroneous.

I recommend tagging Hossman in your proposals as he improved the exists()
handling 9 years ago.

~ David


On Mon, Jul 3, 2023 at 7:38 PM Joel Westberg <j.a.e.westb...@gmail.com>
wrote:

> Hi Solr devs!
>
> I've identified some surprising behavior with how MultiFloat functions
> like *max* and *sum* interact with QueryValueSource and wanted to get some second
> opinions before I open a bug ticket. I suspect this is a Lucene issue, but
> starting here as Solr is my entry-point to this problem. This issue is
> present in (at least) Solr 7 as well as the latest Solr 9.2 release.
>
>
> *Examples*
> In the examples below I have an index consisting of these two docs:
> [
>   {"id":"A", "i_d":1},
>   {"id":"B", "i_d":2}
> ]
>
> I'm running a set of queries using *q=*:*&defType=edismax* and applying a
> *boost* parameter.
>
> Query 1: *boost=query({!lucene v="id:A^=10"}, 1)*
> Observed scores for the two documents in this case come out to A=10, B=1,
> as expected. B is not scored by the query function, but the default
> value is 1, so it gets the score 1*1.
>
> Query 2: *boost=max(0, query({!lucene v="id:A^=10"}, 1))*
> Here I've added a *max(0, ...)* wrapper around the same query function as
> above. In this case, the observed scores for the two documents come out to
> A=10, *B=0*. This is surprising, as I would normally expect *max(0, 1)=1*.
>
> Query 3: *boost=sum(i_d, query({!lucene v="id:A^=10"}, 1))*
> Adding in a *sum* here, we get the scores *A=11, B=3* which is what we
> expect (*MatchAll(1) * (2+1)=3*).
>
> Query 4: *boost=max(1, sum(i_d, query({!lucene v="id:A^=10"}, 1)))*
> Wrapping Query 3 in a max function (and a bit closer to my actual use case)
> to ensure that we do not multiply by anything less than *1* we get the
> following scores: A=11, *B=1*.
>
> Results 2 and 4 were very surprising, and difficult to detect and
> understand.
>
> *Root cause*
> Tracing this issue down through the code, it seems to stem from
> MaxFloatFunction.func
> <https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java#L39>
> checking if each component part (in this case const(1) and query(..)) scores
> the given doc rather than simply retrieving the score, and
> QueryDocValues.exists
> <https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/QueryValueSource.java#L141>
> returning *false* for any document not matched by the query (regardless of
> the default value).
>
> It is also surprising that SumFloatFunction.exists
> <https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MultiFloatFunction.java#L52>
> is implemented as *allExists* rather than *anyExists*, which is why Query 4
> breaks and completely ignores the *i_d* score component. I expected that
> *sum* would skip any of its value sources that do not apply to the given
> doc being scored, and simply sum up the rest.
>
> *Workaround*
> A relatively straightforward workaround from the query writing side is to
> not rely on the default value of the QueryFunction and instead always do
> *max(<default_value>, query(...)).*
>
> *TL;DR:*
> Wanted to get a temperature check on what parts of this might make sense to
> open a bug on (if any) and in which project?
>
> I have no idea how many things may break deep inside Lucene if this
> behavior were to change, given that it appears to have been there for a
> very long time, so perhaps some new Solr-specific value functions and some
> docs is the thing to do?
>
>
> Thanks in advance,
> Joel Westberg
>
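
To make the exists() interaction discussed in this thread concrete, here is a minimal, self-contained Java sketch that models the observed behavior for doc B. It is not the Lucene source: the class ExistsSketch, the Value interface, and the helpers always(), query(), sum(), and max() are hypothetical names invented for illustration, and the logic is an assumption inferred from the scores reported above (max skips components whose exists() is false; sum adds every component's value, including the query default, while its exists() uses either "all" or "any" logic).

```java
import java.util.List;

public class ExistsSketch {
    // Hypothetical stand-in for a per-document function value.
    interface Value {
        boolean exists();
        float floatVal();
    }

    // A constant or a populated field: always exists for the document.
    static Value always(float v) {
        return new Value() {
            public boolean exists() { return true; }
            public float floatVal() { return v; }
        };
    }

    // Models query(q, default): the value falls back to the default for
    // unmatched docs, but exists() still reports false for them.
    static Value query(boolean matched, float score, float defVal) {
        return new Value() {
            public boolean exists() { return matched; }
            public float floatVal() { return matched ? score : defVal; }
        };
    }

    // Models sum(...): the value adds every component; exists() uses
    // "any" or "all" logic depending on the flag.
    static Value sum(boolean anyLogic, List<Value> parts) {
        return new Value() {
            public boolean exists() {
                return anyLogic
                    ? parts.stream().anyMatch(Value::exists)
                    : parts.stream().allMatch(Value::exists);
            }
            public float floatVal() {
                float total = 0f;
                for (Value p : parts) {
                    total += p.floatVal();
                }
                return total;
            }
        };
    }

    // Models max(...): components whose exists() is false are skipped;
    // the result is 0 when no component exists.
    static float max(List<Value> parts) {
        boolean found = false;
        float best = Float.NEGATIVE_INFINITY;
        for (Value p : parts) {
            if (p.exists()) {
                found = true;
                best = Math.max(best, p.floatVal());
            }
        }
        return found ? best : 0f;
    }

    public static void main(String[] args) {
        Value missB = query(false, 0f, 1f); // doc B does not match id:A
        Value iD = always(2f);              // doc B has i_d = 2

        // Query 2: max(0, query(..., 1)) for doc B
        System.out.println(max(List.of(always(0f), missB)));                          // prints 0.0
        // Query 4 with "all" logic in sum.exists()
        System.out.println(max(List.of(always(1f), sum(false, List.of(iD, missB))))); // prints 1.0
        // Query 4 with "any" logic in sum.exists()
        System.out.println(max(List.of(always(1f), sum(true, List.of(iD, missB)))));  // prints 3.0
    }
}
```

Under these assumptions, switching sum's exists() from "all" to "any" logic changes the modeled Query 4 boost for doc B from 1 to 3, while Query 2 stays at 0 until exists() for the query function also accounts for the default value.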
