This unfolding story shows why we need nightly benchmarks of Solr -- SOLR-10317 <https://issues.apache.org/jira/browse/SOLR-10317>
~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley

On Wed, Dec 18, 2019 at 8:35 PM Joel Bernstein <joels...@gmail.com> wrote:

> One of the things that would be interesting would be to analyze the QTimes
> for individual queries from the logs for these runs. If you ship me the
> log files, I can take a look. Tomorrow I'll also be posting a branch with
> a new command-line tool for posting logs to be indexed in Solr, and you
> can take a look at that.
>
> And the profiler is probably the only way to know for sure what's
> happening here.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Dec 18, 2019 at 7:37 PM Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> The very short form: from Solr 6.6.1 to Solr 8.3.1, the throughput for
>> date boosting in my tests dropped by more than 40%.
>>
>> I've been hearing about slowdowns in successive Solr releases with boost
>> functions, so I dug into it a bit. The test setup is just a boost-by-date
>> with an additional big OR clause of 100 random words, so I'd be sure to
>> hit a bunch of docs. I figured that if there were few hits, the signal
>> would be lost in the noise, but I didn't look at the actual hit counts.
>>
>> I saw several Solr JIRAs about this subject, but they were slightly
>> different, although quite possibly the same underlying issue. So I tried
>> to narrow this down to a very specific form of query.
>>
>> I've also seen some cases in the wild where the response time was
>> proportional to the number of segments, thus my optimize experiments.
>>
>> Here are the results; explanation below. "O" means optimized to one
>> segment. I spot-checked pdate against 7x and 8x, and it wasn't
>> significantly different performance-wise from tdate. All have docValues
>> enabled. I ran these against a multiValued="false" field. All the tests
>> pegged all my CPUs. JMeter was run on a different machine than Solr, and
>> only one Solr instance was running for any test.
>>
>> Solr version   queries/min
>> 6.6.1          3,400
>> 6.6.1 O        4,800
>> 7.1            2,800
>> 7.1 O          4,200
>> 7.7.1          2,400
>> 7.7.1 O        3,500
>> 8.3.1          2,000
>> 8.3.1 O        2,600
>>
>> The tests I've been running just index 20M docs into a single core, then
>> run the exact same 10,000 queries against them from JMeter with 24
>> threads. Spot checks showed no hits on the queryResultCache.
>>
>> A query looks like this:
>> rows=0&{!boost b=recip(ms(NOW,
>> INSERT_FIELD_HERE),3.16e-11,1,1)}text_txt:(campaigners OR adjourned OR
>> anyplace…97 more random words)
>>
>> There is no faceting, no grouping, and no sorting.
>>
>> I fill in INSERT_FIELD_HERE through JMeter magic. I'm running the exact
>> same queries for every test.
>>
>> One wildcard is that I did regenerate the index for each major revision,
>> and chose random words from the same list, as well as random times
>> (bounded in the same range), so the docs are not completely identical.
>> The index was in the native format for that major version, even if
>> slightly different between versions. I ran the test once, then ran it
>> again after optimizing the index.
>>
>> I haven't dug any further. If anyone's interested, I can throw a
>> profiler at, say, 8.3 and see what I can see, although I'm not going to
>> have time to dive into this any time soon. I'd be glad to run some tests
>> though. I saved the queries and the indexes, so running a test would
>> only take a few minutes.
>>
>> While I concentrated on date fields, the docs have date, int, and long
>> fields, both docValues=true and docValues=false, each variant with
>> multiValued=true and multiValued=false, and both Trie and Point (where
>> possible) variants, as well as a pretty simple text field.
>>
>> Erick
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
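[Editor's note on the boost in the thread] The query above uses Solr's recip() function query, which the Solr Reference Guide defines as recip(x, m, a, b) = a / (m*x + b). With m = 3.16e-11 (roughly the reciprocal of the number of milliseconds in a year), a fresh document is boosted by ~1.0 and a one-year-old document by ~0.5. A minimal Python sketch of that decay (the constant and helper names here are illustrative, not from Solr's code):

```python
# Sketch of Solr's recip() function query as used in the boost:
#   recip(x, m, a, b) = a / (m*x + b)   (per the Solr function-query docs)
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # ~3.156e10 ms, so m ~ 1/year-in-ms

def recip(x, m=3.16e-11, a=1.0, b=1.0):
    """Boost value for a document whose date is x milliseconds before NOW."""
    return a / (m * x + b)

for years in (0, 1, 2, 5):
    age_ms = years * MS_PER_YEAR
    print(f"{years} year(s) old -> boost {recip(age_ms):.3f}")
```

This is why the boost halves roughly every additional year of document age near NOW: at one year, m*x is close to 1, giving a boost near 1/2.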
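[Editor's note on Joel's QTime suggestion] Solr's request log records a QTime=<ms> value for each query, so the per-query analysis Joel proposes can be sketched with a simple regex pass over the log. The sample lines below are invented for illustration; a real log line carries timestamps, the full params string, and more:

```python
import re
import statistics

# Hypothetical request-log lines; real Solr select-handler log entries end
# with "... hits=<n> status=0 QTime=<ms>".
LOG_LINES = [
    "... path=/select params={...} hits=5012 status=0 QTime=48",
    "... path=/select params={...} hits=120 status=0 QTime=7",
    "... path=/select params={...} hits=88031 status=0 QTime=203",
]

QTIME = re.compile(r"QTime=(\d+)")

def extract_qtimes(lines):
    """Pull the per-query QTime values (in ms) out of Solr log lines."""
    return [int(m.group(1)) for m in (QTIME.search(l) for l in lines) if m]

times = extract_qtimes(LOG_LINES)
print(f"n={len(times)} median={statistics.median(times)} max={max(times)}")
```

Comparing the distribution (not just the mean) of QTimes between the 6.6.1 and 8.3.1 runs would show whether the slowdown is uniform or driven by a tail of slow queries.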