This unfolding story shows why we need nightly benchmarks of Solr -- SOLR-10317 <https://issues.apache.org/jira/browse/SOLR-10317>
~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley

On Wed, Dec 18, 2019 at 8:35 PM Joel Bernstein <joels...@gmail.com> wrote:

> One of the things that would be interesting would be to analyze the QTimes
> for individual queries from the logs for these runs. If you ship me the
> log files, I can take a look. Tomorrow I'll also be posting a branch with
> a new command-line tool for posting logs to be indexed in Solr, and you
> can take a look at that.
>
> And the profiler is probably the only way to know for sure what's
> happening here.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Dec 18, 2019 at 7:37 PM Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> The very short form: from Solr 6.6.1 to Solr 8.3.1, the throughput for
>> date boosting in my tests dropped by more than 40%.
>>
>> I've been hearing about slowdowns in successive Solr releases with boost
>> functions, so I dug into it a bit. The test setup is just a boost-by-date
>> with an additional big OR clause of 100 random words, so I'd be sure to
>> hit a bunch of docs. I figured that if there were few hits, the signal
>> would be lost in the noise, but I didn't look at the actual hit counts.
>>
>> I saw several Solr JIRAs about this subject, but they were slightly
>> different, although quite possibly the same underlying issue. So I tried
>> to narrow this down to a very specific form of query.
>>
>> I've also seen some cases in the wild where the response time was
>> proportional to the number of segments, thus my optimize experiments.
>>
>> Here are the results; explanation below. "O" means optimized to one
>> segment. I spot-checked pdate against 7x and 8x, and it wasn't
>> significantly different performance-wise from tdate. All have docValues
>> enabled. I ran these against a multiValued="false" field. All the tests
>> pegged all my CPUs. JMeter was run on a different machine than Solr, and
>> only one Solr instance was running for any test.
>>
>> Solr version   queries/min
>> 6.6.1          3,400
>> 6.6.1 O        4,800
>> 7.1            2,800
>> 7.1 O          4,200
>> 7.7.1          2,400
>> 7.7.1 O        3,500
>> 8.3.1          2,000
>> 8.3.1 O        2,600
>>
>> The tests I've been running just index 20M docs into a single core, then
>> run the exact same 10,000 queries against them from JMeter with 24
>> threads. Spot checks showed no hits on the queryResultCache.
>>
>> A query looks like this:
>> rows=0&{!boost b=recip(ms(NOW,
>> INSERT_FIELD_HERE),3.16e-11,1,1)}text_txt:(campaigners OR adjourned OR
>> anyplace…97 more random words)
>>
>> There is no faceting, no grouping, and no sorting.
>>
>> I fill in INSERT_FIELD_HERE through JMeter magic. I'm running the exact
>> same queries for every test.
>>
>> One wildcard is that I did regenerate the index for each major revision,
>> and chose random words from the same list, as well as random times
>> (bounded in the same range), so the docs are not completely identical.
>> The index was in the native format for that major version, even if
>> slightly different between versions. I ran the test once, then ran it
>> again after optimizing the index.
>>
>> I haven't dug any further. If anyone's interested, I can throw a
>> profiler at, say, 8.3 and see what I can see, although I'm not going to
>> have time to dive into this any time soon. I'd be glad to run some tests
>> though. I saved the queries and the indexes, so running a test would
>> only take a few minutes.
>>
>> While I concentrated on date fields, the docs have date, int, and long
>> fields, both docValues=true and docValues=false, each variant with
>> multiValued=true and multiValued=false, and both Trie and Point (where
>> possible) variants, as well as a pretty simple text field.
>>
>> Erick
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
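[Editor's note on the boost in the thread] The query above uses Solr's recip() function query, which the Solr Reference Guide defines as recip(x, m, a, b) = a / (m*x + b). With m = 3.16e-11 (roughly the reciprocal of the number of milliseconds in a year), a fresh document is boosted by ~1.0 and a one-year-old document by ~0.5. A minimal Python sketch of that decay (the constant and helper names here are illustrative, not from Solr's code):

```python
# Sketch of Solr's recip() function query as used in the boost:
#   recip(x, m, a, b) = a / (m*x + b)   (per the Solr function-query docs)
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # ~3.156e10 ms, so m ~ 1/year-in-ms

def recip(x, m=3.16e-11, a=1.0, b=1.0):
    """Boost value for a document whose date is x milliseconds before NOW."""
    return a / (m * x + b)

for years in (0, 1, 2, 5):
    age_ms = years * MS_PER_YEAR
    print(f"{years} year(s) old -> boost {recip(age_ms):.3f}")
```

This is why the boost halves roughly every additional year of document age near NOW: at one year, m*x is close to 1, giving a boost near 1/2.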
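[Editor's note on Joel's QTime suggestion] Solr's request log records a QTime=<ms> value for each query, so the per-query analysis Joel proposes can be sketched with a simple regex pass over the log. The sample lines below are invented for illustration; a real log line carries timestamps, the full params string, and more:

```python
import re
import statistics

# Hypothetical request-log lines; real Solr select-handler log entries end
# with "... hits=<n> status=0 QTime=<ms>".
LOG_LINES = [
    "... path=/select params={...} hits=5012 status=0 QTime=48",
    "... path=/select params={...} hits=120 status=0 QTime=7",
    "... path=/select params={...} hits=88031 status=0 QTime=203",
]

QTIME = re.compile(r"QTime=(\d+)")

def extract_qtimes(lines):
    """Pull the per-query QTime values (in ms) out of Solr log lines."""
    return [int(m.group(1)) for m in (QTIME.search(l) for l in lines) if m]

times = extract_qtimes(LOG_LINES)
print(f"n={len(times)} median={statistics.median(times)} max={max(times)}")
```

Comparing the distribution (not just the mean) of QTimes between the 6.6.1 and 8.3.1 runs would show whether the slowdown is uniform or driven by a tail of slow queries.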