Hello,
I'm trying to benchmark a change to BM25Similarity (LUCENE-5175) using
luceneutil.
I'm running this on a lightly loaded machine with a load average (top) of
about 0.01 when the benchmark is not running.
I made the following changes:
1) localrun.py changed Competition(debug=True) to Competition(debug=False)
2) made the following changes to localconstants.py per Robert Muir's
suggestion:
JAVA_COMMAND = 'java -server -Xms4g -Xmx4g'
SEARCH_NUM_THREADS = 1
3) for the BM25 tests set SIMILARITY_DEFAULT='BM25Similarity'
4) for the BM25 tests uncommented the following line in searchBench.py:
#verifyScores = False
Attached below is output from iter 19 of several runs.
The first four runs consistently show that the modified version is somewhere
between 6% and 8% slower on the tasks with the largest difference between
trunk and patch.
However, if you look at the baseline Task QPS for HighTerm, for example,
run 3 is about 55 while run 1 is about 88. So for this task, the difference
between separate runs of the benchmark program is much larger than the
difference between trunk and modified/patch within a single run.
Is this to be expected? Is there a reason I should believe the
differences shown within a run reflect the true differences?
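As a rough sanity check of my own (not something luceneutil does), I looked at whether the reported one-standard-deviation intervals for baseline and modified even overlap, using the HighTerm numbers from run 1 below:

```python
# Rough check: do the baseline and modified QPS intervals overlap?
# Numbers are HighTerm from BM25SimRun1 below: mean QPS, with StdDev
# reported as a percentage of the mean, as in the luceneutil output.

def interval(mean_qps, stddev_pct):
    """Return (low, high) as mean +/- one reported standard deviation."""
    delta = mean_qps * stddev_pct / 100.0
    return mean_qps - delta, mean_qps + delta

base_lo, base_hi = interval(87.91, 13.2)   # trunk baseline
mod_lo, mod_hi = interval(81.02, 8.5)      # patched version

# If the one-sigma intervals overlap, a -7.8% mean difference is hard to
# distinguish from noise on the strength of a single run.
overlap = max(base_lo, mod_lo) <= min(base_hi, mod_hi)
print(f"baseline [{base_lo:.1f}, {base_hi:.1f}], "
      f"modified [{mod_lo:.1f}, {mod_hi:.1f}], overlap={overlap}")
```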
Seeing this variability, I then switched SIMILARITY_DEFAULT back to
"DefaultSimilarity". In this case trunk and my_modified should be
exercising exactly the same code, since the only changes in the patch are
the addition of a test case for BM25Similarity and a change to
BM25Similarity itself.
In this case the "modified" version varies from -6.2% to +4.4% relative to
the base for LowTerm. Comparing baseline QPS for HighTerm between different
runs, it varies from about 21 in run 1 to 76 in run 3.
Is this kind of variation between runs of the benchmark to be expected?
Any suggestions about where to look to reduce the variations between runs?
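To put a number on the between-run spread, here is a quick calculation of my own (outside luceneutil) of the coefficient of variation of the baseline HighTerm QPS across the three BM25 runs that report it below (run 4's top tasks don't include HighTerm):

```python
import statistics

# Baseline HighTerm QPS from BM25SimRun1..3 below.
highterm_qps = [87.91, 62.15, 54.85]

mean = statistics.mean(highterm_qps)
stdev = statistics.stdev(highterm_qps)   # sample standard deviation
cv = stdev / mean                        # coefficient of variation

# A between-run CV around 25% dwarfs the ~7% within-run difference,
# which is exactly the concern raised above.
print(f"mean={mean:.1f} QPS, stdev={stdev:.1f}, CV={cv:.1%}")
```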
Tom
BM25Similarity runs where "my_modified_version" is LUCENE-
tail -33 BM25SimRun1 | head -5
Report after iter 19:
            Task    QPS baseline   StdDev    QPS my_modified_version   StdDev      Pct diff
        HighTerm           87.91  (13.2%)                      81.02   (8.5%)    -7.8% ( -26% -  16%)
         MedTerm          111.81  (13.2%)                     103.11   (8.4%)    -7.8% ( -25% -  15%)
         LowTerm          411.44  (17.7%)                     382.47  (14.5%)    -7.0% ( -33% -  30%)
[tburtonw@alamo runs]$ tail -33 BM25SimRun2 | head -5
Report after iter 19:
            Task    QPS baseline   StdDev    QPS my_modified_version   StdDev      Pct diff
        HighTerm           62.15   (6.4%)                      58.10   (7.1%)    -6.5% ( -18% -   7%)
         MedTerm          139.11   (4.5%)                     130.22   (7.5%)    -6.4% ( -17% -   5%)
         LowTerm          391.93  (10.5%)                     373.71  (13.1%)    -4.6% ( -25% -  21%)
[tburtonw@alamo runs]$ tail -33 BM25SimRun3 | head -5
Report after iter 19:
            Task    QPS baseline   StdDev    QPS my_modified_version   StdDev      Pct diff
        HighTerm           54.85   (6.5%)                      50.18   (1.6%)    -8.5% ( -15% -   0%)
         MedTerm          146.04   (8.6%)                     137.31   (4.7%)    -6.0% ( -17% -   8%)
    OrNotHighLow           45.85  (11.1%)                      43.37  (10.6%)    -5.4% ( -24% -  18%)
[tburtonw@alamo runs]$ tail -33 BM25SimRun4 | head -5
Report after iter 19:
            Task    QPS baseline   StdDev    QPS my_modified_version   StdDev      Pct diff
    OrNotHighMed           49.40   (8.7%)                      45.37   (8.8%)    -8.2% ( -23% -  10%)
    OrNotHighLow           65.48   (8.7%)                      60.19   (9.0%)    -8.1% ( -23% -  10%)
   OrNotHighHigh           37.06   (8.2%)                      34.18   (8.2%)    -7.8% ( -22% -   9%)
==================================================================================================================
Default similarity, which is not modified by the BM25 patch
DefaultSimRun1
         LowTerm          398.97  (17.9%)                     398.94  (18.1%)    -0.0% ( -30% -  43%)
        HighTerm           21.13  (12.1%)                      21.45  (12.2%)     1.5% ( -20% -  29%)
DefaultSimRun2
         LowTerm          406.93  (17.1%)                     381.51  (15.8%)    -6.2% ( -33% -  32%)
        HighTerm           59.21   (2.5%)                      59.70   (3.5%)     0.8% (  -5% -   7%)
DefaultSimRun3
         LowTerm          431.59  (18.5%)                     450.55  (16.8%)     4.4% ( -26% -  48%)
        HighTerm           76.45   (2.0%)                      76.45   (1.7%)     0.0% (  -3% -   3%)