: In the BM25 case, scores would decrease in some situations with very
: high TF values because of floating point issues, e.g. so
: score(freq=100,000) would be unexpectedly less than
: score(freq=99,999), all other things being equal. There may be other
: ways to re-arrange the code to avoid this problem, feel free to open
: an issue if you can optimize the code better while still behaving
: properly!
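
To make the quoted problem concrete, here is a minimal sketch (not Lucene's actual BM25Similarity code) that counts how often a BM25-style term-frequency score goes down when freq goes up by one. The class Bm25Monotonicity, its method names, and the weight and norm values are all hypothetical and picked only for illustration. The third variant is one possible rearrangement that cannot decrease, because every step is a correctly rounded monotonic operation; I am not claiming it is what LUCENE-7997 committed.

```java
// Hypothetical sketch: only the tf-saturation part of a BM25-style score,
// with idf/boost folded into "weight" and k1/length normalization into "norm".
final class Bm25Monotonicity {

    interface Scorer {
        float score(float weight, float freq, float norm);
    }

    // Textbook form evaluated entirely in float: weight * freq / (freq + norm).
    static float naiveFloat(float weight, float freq, float norm) {
        return weight * freq / (freq + norm);
    }

    // Same form evaluated in double and narrowed to float at the end.
    static float naiveDouble(float weight, float freq, float norm) {
        return (float) ((double) weight * freq / ((double) freq + norm));
    }

    // Algebraically equal form: weight - weight / (1 + freq / norm).
    // freq * (1/norm) is non-decreasing in freq, adding 1 keeps that,
    // weight / x is then non-increasing, and weight - y is non-decreasing.
    // IEEE rounding preserves each of those orderings.
    static float rearrangedFloat(float weight, float freq, float norm) {
        return weight - weight / (1f + freq * (1f / norm));
    }

    // Counts how many times score(freq) < score(freq - 1) for freq in 2..maxFreq.
    static int countDecreases(Scorer s, float weight, float norm, int maxFreq) {
        int decreases = 0;
        float prev = s.score(weight, 1f, norm);
        for (int freq = 2; freq <= maxFreq; freq++) {
            float cur = s.score(weight, freq, norm);
            if (cur < prev) {
                decreases++;
            }
            prev = cur;
        }
        return decreases;
    }

    public static void main(String[] args) {
        float weight = 2.3f;   // arbitrary illustrative value
        float norm = 1.7f;     // arbitrary illustrative value
        int maxFreq = 200_000; // every int up to here is exactly representable as a float

        System.out.println("naive float decreases:      "
            + countDecreases(Bm25Monotonicity::naiveFloat, weight, norm, maxFreq));
        System.out.println("naive double decreases:     "
            + countDecreases(Bm25Monotonicity::naiveDouble, weight, norm, maxFreq));
        System.out.println("rearranged float decreases: "
            + countDecreases(Bm25Monotonicity::rearrangedFloat, weight, norm, maxFreq));
    }
}
```

Running main prints the three counts. The numbers for the two naive variants depend on the chosen weight and norm, so none are shown here; only the rearranged variant is guaranteed to report zero.
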
I don't have any idea how to optimize the current code, and I am completely willing to believe the changes in LUCENE-7997 are an improvement in terms of correctness -- which is certainly more important than performance. I just wanted to point out, before anyone started ripping their hair out trying to explain it, that Alan's observation about LUCENE-8018 being the only commit around the time the performance graphs dip wasn't accurate.

If you think the float/double math in LUCENE-7997 might explain the change in Mike's graphs, then maybe Mike can annotate them to record that?

(Wild spitballing idea: would it be worthwhile to offer an "ImpreciseBM25Similarity" that used floats instead of doubles for people who want to eke out every last bit of performance -- provided it was heavily documented with caveats regarding inaccurate scores due to rounding errors?)

-Hoss
http://www.lucidworks.com/
