[
https://issues.apache.org/jira/browse/LUCENE-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Smiley updated LUCENE-8040:
---------------------------------
Attachment: MyBenchmark.java
I updated the benchmark to use a custom FilterDirectoryReader whose custom
FilterLeafReader caches the Terms implementations in a HashMap. Then I reran
the benchmark with 150 fields and 30 segments:
{noformat}
IndexSearcher MultiFields (current)
346.155 ±(99.9%) 57.775 us/op [Average]
(min, avg, max) = (334.952, 346.155, 371.996), stdev = 15.004
CI (99.9%): [288.380, 403.930] (assumes normal distribution)

Raw compute on demand each time
196.271 ±(99.9%) 14.716 us/op [Average]
(min, avg, max) = (192.012, 196.271, 201.187), stdev = 3.822
CI (99.9%): [181.555, 210.987] (assumes normal distribution)

ConcurrentHashMap lazy cache of raw compute
4.553 ±(99.9%) 0.245 us/op [Average]
(min, avg, max) = (4.465, 4.553, 4.636), stdev = 0.064
CI (99.9%): [4.308, 4.799] (assumes normal distribution)
{noformat}
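Clearly the ConcurrentHashMap is saving us a lot: at 4.553 us/op it is roughly
43x faster than computing on demand each time (196.271 us/op) and roughly 76x
faster than the current MultiFields path (346.155 us/op).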
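For readers without the attachment, here is a minimal sketch of what a
"ConcurrentHashMap lazy cache" can look like. It is not the code in
MyBenchmark.java and not a Lucene API: the class name LazyFieldCache and the
loader function are hypothetical, and the loader stands in for whatever
expensive per-field computation is being cached.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of Lucene): memoizes an expensive
 * per-field computation so it runs at most once per field name.
 */
final class LazyFieldCache<V> {
    private final ConcurrentHashMap<String, V> cache = new ConcurrentHashMap<>();
    private final Function<String, V> loader;

    /** @param loader assumed stand-in for the costly per-field work */
    LazyFieldCache(Function<String, V> loader) {
        this.loader = loader;
    }

    /**
     * Returns the cached value for the field, computing it on first use.
     * computeIfAbsent runs the loader atomically per key. If the loader
     * returns null, nothing is stored and the next call computes again.
     */
    V get(String field) {
        return cache.computeIfAbsent(field, loader);
    }

    /** Number of fields cached so far. */
    int size() {
        return cache.size();
    }
}
```

In this sketch the cache would live as long as the object that owns it, so it
only makes sense when the underlying data is immutable for that lifetime, as
is the case for a reader over a fixed set of segments.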
You say we shouldn't add caching to IndexSearcher, but IndexSearcher already
contains the QueryCache. Looking at LRUQueryCache, I think I can safely say
that a ConcurrentHashMap is comparatively more lightweight. Do you disagree?
> Optimize IndexSearcher.collectionStatistics
> -------------------------------------------
>
> Key: LUCENE-8040
> URL: https://issues.apache.org/jira/browse/LUCENE-8040
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Reporter: David Smiley
> Assignee: David Smiley
> Fix For: 7.2
>
> Attachments: MyBenchmark.java, lucenecollectionStatisticsbench.zip
>
>
> {{IndexSearcher.collectionStatistics(field)}} can do a fair amount of work
> because with each invocation it will call {{MultiFields.getTerms(...)}}. The
> effects of this are aggravated for queries with many fields since each field
> will want statistics, and also aggravated when there are many segments.