Hi Joel,

RE SumFloatFunction: agreed; should be "any" not "all" logic.

RE QueryDocValues: I think the current implementation is probably correct, since a default value is mandatory. That said, I could imagine using def(query-here, 1.0) to accomplish the same thing. Ugh; I see DefFunction.exists isn't simply "true", but I think that's erroneous. I recommend tagging Hossman in your proposals, as he improved the exists() handling 9 years ago.

~ David

On Mon, Jul 3, 2023 at 7:38 PM Joel Westberg <j.a.e.westb...@gmail.com> wrote:

> Hi Solr devs!
>
> I've identified some surprising behavior in how MultiFloat functions
> like *max* and *sum* interact with QueryValueSource, and wanted to get
> some second opinions before I open a bug ticket. I suspect this is a
> Lucene issue, but I'm starting here as Solr is my entry point to this
> problem. This issue is present in (at least) Solr 7 as well as the
> latest Solr 9.2 release.
>
> *Examples*
> In the examples below I have an index consisting of these two docs:
> [
>   {"id":"A", "i_d":1},
>   {"id":"B", "i_d":2}
> ]
>
> I'm running a set of queries using *q=*:*&defType=edismax* and applying a
> *boost* parameter.
>
> Query 1: *boost=query({!lucene v="id:A^=10"}, 1)*
> Observed scores for the two documents in this case come out to A=10, B=1,
> as expected. B is not scored by the query function, but the default
> value is 1, so it gets the score 1*1.
>
> Query 2: *boost=max(0, query({!lucene v="id:A^=10"}, 1))*
> Here I've added a *max(0, ...)* wrapper around the same query function as
> above. In this case, the observed scores for the two documents come out to
> A=10, *B=0*. This is surprising, as I would normally expect *max(0, 1)=1*.
>
> Query 3: *boost=sum(i_d, query({!lucene v="id:A^=10"}, 1))*
> Adding in a *sum* here, we get the scores *A=11, B=3*, which is what we
> expect (*MatchAll(1) * (2+1)=3*).
>
> Query 4: *boost=max(1, sum(i_d, query({!lucene v="id:A^=10"}, 1)))*
> Wrapping Query 3 in a max function (a bit closer to my actual use case)
> to ensure that we do not multiply by anything less than *1*, we get the
> following scores: A=11, *B=1*.
>
> Results 2 and 4 were very surprising, and difficult to detect and
> understand.
>
> *Root cause*
> Tracing this issue down through the code, it seems to stem from
> MaxFloatFunction.func
> <https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MaxFloatFunction.java#L39>
> checking whether each component part (in this case const(1) and query(..))
> scores the given doc rather than simply retrieving the score, and
> QueryDocValues.exists
> <https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/QueryValueSource.java#L141>
> returning *false* for any document not matched by the query (regardless
> of the default value).
>
> It is also surprising that SumFloatFunction.exists
> <https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MultiFloatFunction.java#L52>
> is implemented as *allExists* rather than *anyExists*, which is why Query 4
> breaks and completely ignores the *i_d* score component. I expected that
> *sum* would skip any of its value sources that do not apply to the given
> doc being scored, and simply sum up the rest.
>
> *Workaround*
> A relatively straightforward workaround from the query-writing side is to
> not rely on the default value of the QueryFunction and instead always do
> *max(<default_value>, query(...))*.
>
> *TL;DR:*
> Wanted to get a temperature check on what parts of this might make sense to
> open a bug on (if any) and in which project?
>
> I have no idea how many things may break deep inside Lucene if this
> behavior were to change, given that it appears to have been there for a
> very long time, so perhaps some new Solr-specific value functions and some
> docs is the thing to do?
>
> Thanks in advance,
> Joel Westberg
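
For anyone following along, the four observed results in the quoted message can be reproduced with a small toy model. This is only a sketch built from the behaviour described in the thread, not Lucene's actual code: the Python class names (Const, Field, Query, Sum, Max), the exists_mode parameter, and the scores() helper are all hypothetical stand-ins. It assumes that max only considers sources whose exists() is true, that query(...) reports exists() as false for non-matching docs while still returning its default from the value accessor, and that sum adds all values but reports exists() with "all" logic.

```python
# Toy model of the exists()/value interplay described in the thread.
# All names here are hypothetical; this is not Lucene's API.

class Const:
    def __init__(self, value):
        self.value = value
    def exists(self, doc):
        return True
    def float_val(self, doc):
        return float(self.value)

class Field:
    def __init__(self, name):
        self.name = name
    def exists(self, doc):
        return self.name in doc
    def float_val(self, doc):
        return float(doc.get(self.name, 0.0))

class Query:
    """query(<q>, default): the match score, else the default value."""
    def __init__(self, match_scores, default):
        self.match_scores = match_scores  # doc id -> score for matching docs
        self.default = default
    def exists(self, doc):
        # False for non-matching docs, regardless of the default value.
        return doc["id"] in self.match_scores
    def float_val(self, doc):
        return float(self.match_scores.get(doc["id"], self.default))

class Sum:
    def __init__(self, *sources, exists_mode=all):
        self.sources = sources
        self.exists_mode = exists_mode  # all = reported behaviour, any = proposal
    def exists(self, doc):
        return self.exists_mode(s.exists(doc) for s in self.sources)
    def float_val(self, doc):
        # Values are summed without consulting exists().
        return sum(s.float_val(doc) for s in self.sources)

class Max:
    def __init__(self, *sources):
        self.sources = sources
    def exists(self, doc):
        return any(s.exists(doc) for s in self.sources)
    def float_val(self, doc):
        # Only sources that "exist" for this doc take part in the max.
        vals = [s.float_val(doc) for s in self.sources if s.exists(doc)]
        return max(vals) if vals else 0.0

DOCS = [{"id": "A", "i_d": 1}, {"id": "B", "i_d": 2}]

def scores(boost):
    # q=*:* scores every doc 1.0; the boost multiplies that score.
    return {d["id"]: 1.0 * boost.float_val(d) for d in DOCS}

def q():
    return Query({"A": 10.0}, default=1)  # models query({!lucene v="id:A^=10"}, 1)

query1 = scores(q())                                      # A=10, B=1
query2 = scores(Max(Const(0), q()))                       # A=10, B=0
query3 = scores(Sum(Field("i_d"), q()))                   # A=11, B=3
query4 = scores(Max(Const(1), Sum(Field("i_d"), q())))    # A=11, B=1
workaround = scores(Max(Const(1), q()))                   # A=10, B=1
query4_any = scores(Max(Const(1), Sum(Field("i_d"), q(), exists_mode=any)))
```

Within this model, switching sum's exists() to "any" logic (query4_any) gives B=3 for Query 4, which is what the original message expected. Whether the real classes would behave the same way after such a change is not something this sketch can confirm.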