The other approach would be to do equality tests with a fuzz factor (a tolerance), because floating-point arithmetic is inexact. But that would probably make things slower.
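A fuzz-factor comparison usually combines an absolute tolerance, which handles values near zero, with a relative tolerance, which scales with the magnitude of the operands. The following is a minimal Java sketch under that assumption. The class name `FuzzyCompare`, the method `approxEquals`, and the default tolerances are hypothetical and chosen for illustration. This is not the API of the OpenGamma `FuzzyEquals` class linked below.

```java
/**
 * Hypothetical fuzzy floating-point comparison helper (illustration only,
 * not the OpenGamma FuzzyEquals API).
 */
public final class FuzzyCompare {

    // Assumed default tolerances; choose values that suit the computation.
    static final double DEFAULT_ABS_TOL = 1e-12;
    static final double DEFAULT_REL_TOL = 1e-9;

    private FuzzyCompare() {}

    /**
     * True if a and b differ by at most absTol, or by at most relTol times
     * the larger of their magnitudes.
     */
    static boolean approxEquals(double a, double b, double absTol, double relTol) {
        if (a == b) {
            return true; // exact match, including equal infinities
        }
        if (Double.isNaN(a) || Double.isNaN(b)
                || Double.isInfinite(a) || Double.isInfinite(b)) {
            return false; // NaN never matches; unequal infinities never match
        }
        double diff = Math.abs(a - b);
        if (diff <= absTol) {
            return true; // near zero, a relative tolerance is meaningless
        }
        double scale = Math.max(Math.abs(a), Math.abs(b));
        return diff <= relTol * scale;
    }

    static boolean approxEquals(double a, double b) {
        return approxEquals(a, b, DEFAULT_ABS_TOL, DEFAULT_REL_TOL);
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;
        System.out.println(sum == 0.3);               // false
        System.out.println(approxEquals(sum, 0.3));   // true
        System.out.println(approxEquals(1.0, 1.001)); // false
    }
}
```

Each fuzzy comparison costs a few extra branches and arithmetic operations compared with a plain `==`. That overhead is the kind of slowdown anticipated above.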
Here is an example of fuzzy equals:
https://github.com/OpenGamma/Strata/blob/master/modules/math/src/test/java/com/opengamma/strata/math/impl/FuzzyEquals.java

wunder
Walter Underwood
http://observer.wunderwood.org/  (my blog)


> On Nov 14, 2017, at 8:57 AM, Chris Hostetter wrote:
> 
> : In the BM25 case, scores would decrease in some situations with very
> : high TF values because of floating point issues, e.g.
> : score(freq=100,000) would be unexpectedly less than
> : score(freq=99,999), all other things being equal. There may be other
> : ways to re-arrange the code to avoid this problem, feel free to open
> : an issue if you can optimize the code better while still behaving
> : properly!
> 
> I don't have any idea how to optimize the current code, and I am
> completely willing to believe the changes in LUCENE-7997 are an
> improvement in terms of correctness -- which is certainly more important
> than performance -- I just wanted to point out that Alan's observation
> about LUCENE-8018 being the only commit around the time the performance
> graphs dip wasn't accurate before anyone started ripping their hair out
> trying to explain it.
> 
> If you think the float/double math in LUCENE-7997 might explain the change
> in Mike's graphs, then maybe Mike can annotate them to record that?
> 
> (Wild spitballing idea: would it be worthwhile to offer an
> "ImpreciseBM25Similarity" that used floats instead of doubles for people
> who want to eke out every last bit of performance -- provided it was
> heavily documented with caveats regarding inaccurate scores due to
> rounding errors?)
> 
> 
> -Hoss
> http://www.lucidworks.com/
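The float-versus-double concern in the quoted thread can be illustrated with a simplified BM25 term-frequency component, `freq * (k1 + 1) / (freq + k1 * norm)`, evaluated once in double and once in float. The following Java sketch counts how often the score at `freq` falls below the score at `freq - 1`. It assumes the common defaults `k1 = 1.2` and `b = 0.75` and a document of average length, so `norm = 1`. It omits idf. The class and method names are hypothetical. This is not Lucene's `BM25Similarity` and not the LUCENE-7997 change. Integer frequencies up to 100,000 are exact in float, so any decrease in the float version would come from rounding the products, sums, and quotient.

```java
/**
 * Hypothetical illustration: a simplified BM25 term-frequency component
 * evaluated in double and in float. Not Lucene's BM25Similarity code.
 */
public final class Bm25Precision {

    // Assumed common BM25 defaults.
    static final double K1 = 1.2;
    static final double B = 0.75;

    private Bm25Precision() {}

    /** Length normalization 1 - b + b * (docLen / avgDocLen). */
    static double lengthNorm(double docLen, double avgDocLen) {
        return 1 - B + B * (docLen / avgDocLen);
    }

    /** freq * (k1 + 1) / (freq + k1 * norm), every step in double. */
    static double tfDouble(double freq, double norm) {
        return freq * (K1 + 1) / (freq + K1 * norm);
    }

    /** The same formula with every intermediate rounded to float. */
    static float tfFloat(float freq, float norm) {
        float k1 = (float) K1;
        return freq * (k1 + 1f) / (freq + k1 * norm);
    }

    /** Counts f in [2, maxFreq] where the double score at f is below the score at f - 1. */
    static int decreasesDouble(int maxFreq, double norm) {
        int count = 0;
        double prev = tfDouble(1, norm);
        for (int f = 2; f <= maxFreq; f++) {
            double cur = tfDouble(f, norm);
            if (cur < prev) {
                count++;
            }
            prev = cur;
        }
        return count;
    }

    /** Same check for the float version. */
    static int decreasesFloat(int maxFreq, float norm) {
        int count = 0;
        float prev = tfFloat(1f, norm);
        for (int f = 2; f <= maxFreq; f++) {
            float cur = tfFloat(f, norm);
            if (cur < prev) {
                count++;
            }
            prev = cur;
        }
        return count;
    }

    public static void main(String[] args) {
        double norm = lengthNorm(100, 100); // average-length document, so norm == 1
        System.out.println("double decreases: " + decreasesDouble(100_000, norm));
        System.out.println("float decreases:  " + decreasesFloat(100_000, (float) norm));
    }
}
```

Under these assumed parameters, the double version should report no decreases up to `freq = 100,000`. The gap between neighbouring scores there is roughly 1e-10 relative, far larger than double rounding error. That same gap is far below float precision, though. The float count therefore depends on exactly how each intermediate rounds, and this sketch makes no claim about its value.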
