: In the BM25 case, scores would decrease in some situations with very
: high TF values because of floating-point issues, e.g.
: score(freq=100,000) would be unexpectedly less than
: score(freq=99,999), all other things being equal. There may be other
: ways to rearrange the code to avoid this problem; feel free to open
: an issue if you can optimize the code better while still behaving
: properly!
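The quoted failure mode can be shown with a minimal Java sketch. This is not Lucene's actual BM25Similarity code, and the k1, b and document-length values are made up for illustration. The program scans freq = 1..200,000 and counts how often the textbook tf term freq * (k1 + 1) / (freq + K) gets smaller as freq grows. It does the same for the algebraically equivalent form w - w / (1 + freq / K), with w = k1 + 1. In the second form every step is a single correctly rounded operation that is monotone in its input, so it cannot decrease. The sketch does not claim the first form decreases for these particular values. It just reports the count.

```java
import java.util.function.IntToDoubleFunction;

// Sketch: does a BM25-style tf term ever decrease as freq grows, in float math?
public class Bm25Monotonicity {

    // Illustrative parameters (assumptions, not taken from any real index).
    static final float K1 = 1.2f;
    static final float B = 0.75f;
    static final float DOC_LEN_RATIO = 1.3f; // hypothetical docLen / avgDocLen

    // K = k1 * (1 - b + b * docLen / avgDocLen), the usual BM25 length normalization.
    static final float K = K1 * (1 - B + B * DOC_LEN_RATIO);

    // Textbook form: freq * (k1 + 1) / (freq + K).
    // Numerator and denominator both grow with freq, so rounding in either one
    // can, in principle, make a larger freq produce a slightly smaller result.
    static float naive(float freq) {
        return freq * (K1 + 1) / (freq + K);
    }

    // Equivalent rearrangement: w - w / (1 + freq / K), with w = k1 + 1.
    // Each operation is correctly rounded and monotone in its input
    // (multiply by a positive constant, add 1, divide a positive w, subtract),
    // so the result is non-decreasing in freq.
    static float rearranged(float freq) {
        float w = K1 + 1;
        float normInverse = 1 / K;
        return w - w / (1 + freq * normInverse);
    }

    // Counts the freq values in [2, maxFreq] whose score is below that of freq - 1.
    // Widening float to double is exact, so the comparison preserves float ordering.
    static int countDecreases(IntToDoubleFunction score, int maxFreq) {
        int decreases = 0;
        double prev = score.applyAsDouble(1);
        for (int freq = 2; freq <= maxFreq; freq++) {
            double cur = score.applyAsDouble(freq);
            if (cur < prev) {
                decreases++;
            }
            prev = cur;
        }
        return decreases;
    }

    public static void main(String[] args) {
        int maxFreq = 200_000;
        System.out.println("naive decreases:      " + countDecreases(f -> naive(f), maxFreq));
        System.out.println("rearranged decreases: " + countDecreases(f -> rearranged(f), maxFreq));
    }
}
```

With Java 11 or later it can be run directly as `java Bm25Monotonicity.java`. The rearranged count is always zero. The naive count depends on the chosen parameters.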

I don't have any idea how to optimize the current code, and I am
completely willing to believe the changes in LUCENE-7997 are an
improvement in terms of correctness (which is certainly more important
than performance). I just wanted to point out that Alan's observation
about LUCENE-8018 being the only commit around the time the performance
graphs dip wasn't accurate, before anyone starts tearing their hair out
trying to explain it.

If you think the float/double math in LUCENE-7997 might explain the change
in Mike's graphs, then maybe Mike can annotate them to record that?

(Wild spitballing idea: would it be worthwhile to offer an
"ImpreciseBM25Similarity" that used floats instead of doubles for people
who want to eke out every last bit of performance, provided it was
heavily documented with caveats regarding inaccurate scores due to
rounding errors?)


-Hoss
http://www.lucidworks.com/
