I think the raw values don't matter so much because there is some randomization involved? And the same random seed is used...
Your DefaultSimilarity runs look pretty stable: it's between 0.0% and 1.5% variation, which is about as good as it gets, for HighTerm.... LowTerm, I am guessing, is always noisy because those queries are so fast. A few of these measures at least are; I know IntNRQ is, in particular :)

On Fri, Aug 16, 2013 at 6:20 PM, Tom Burton-West <[email protected]> wrote:
> Hello,
>
> I'm trying to benchmark a change to BM25Similarity (LUCENE-5175) using
> luceneutil.
>
> I'm running this on a lightly loaded machine with a load average (top) of
> about 0.01 when the benchmark is not running.
>
> I made the following changes:
> 1) localrun.py: changed Competition(debug=True) to Competition(debug=False)
> 2) made the following changes to localconstants.py per Robert Muir's
> suggestion:
>    JAVA_COMMAND = 'java -server -Xms4g -Xmx4g'
>    SEARCH_NUM_THREADS = 1
> 3) for the BM25 tests, set SIMILARITY_DEFAULT='BM25Similarity'
> 4) for the BM25 tests, uncommented the following line in searchBench.py:
>    #verifyScores = False
>
> Attached is output from iter 19 of several runs.
>
> The first 4 runs show consistently that the modified version is somewhere
> between 6% and 8% slower on the tasks with the highest difference between
> trunk and patch.
> However, if you look at the baseline TaskQPS for HighTerm, for example, run
> 3 is about 55 and run 1 is about 88. So the difference for this task
> between different runs of the bench program is much higher than the
> difference between trunk and modified/patch within a run.
>
> Is this to be expected? Is there a reason I should believe the
> differences shown within a run reflect the true differences?
>
> Seeing this variability, I then switched DEFAULT_SIMILARITY back to
> "DefaultSimilarity". In this case trunk and my_modified should be
> exercising exactly the same code, since the only changes in the patch are
> the addition of a test case for BM25Similarity and a change to
> BM25Similarity.
>
> In this case the "modified" version varies from -6.2% difference from the
> base to +4.4% difference from the base for LowTerm.
> Comparing QPS for the base case for HighTerm between different runs, we can
> see it varies from about 21 for run 1 to 76 for run 3.
>
> Is this kind of variation between runs of the benchmark to be expected?
>
> Any suggestions about where to look to reduce the variation between runs?
>
> Tom
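For anyone reproducing the setup Tom describes, the edits amount to roughly the sketch below. This is a minimal, hedged example: only Competition(debug=False), JAVA_COMMAND, SEARCH_NUM_THREADS, SIMILARITY_DEFAULT and verifyScores are taken from the mail itself; the surrounding localrun.py structure, and the assumption that SIMILARITY_DEFAULT is overridden in localconstants.py, will vary by luceneutil checkout.

  # localconstants.py (per Robert Muir's suggestion quoted in the mail)
  JAVA_COMMAND = 'java -server -Xms4g -Xmx4g'   # fixed 4g heap, server VM
  SEARCH_NUM_THREADS = 1                        # single search thread
  SIMILARITY_DEFAULT = 'BM25Similarity'         # only for the BM25 runs (assumed to live here)

  # localrun.py (assumed structure; only the debug flag comes from the mail)
  import competition
  comp = competition.Competition(debug=False)   # debug=True is meant for quick sanity runs, not timing
  # ... index / competitor / benchmark setup unchanged ...

  # searchBench.py: uncomment this line for the BM25 runs, presumably so the
  # changed BM25 scores between trunk and patch don't fail score verification
  verifyScores = False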
