[jira] [Updated] (LUCENE-8040) Optimize IndexSearcher.collectionStatistics

David Smiley (JIRA) Thu, 09 Nov 2017 18:48:16 -0800

     [ 
https://issues.apache.org/jira/browse/LUCENE-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


David Smiley updated LUCENE-8040:
---------------------------------
    Attachment: MyBenchmark.java

I updated the benchmark to use a custom FilterDirectoryReader that wraps each 
leaf in a custom FilterLeafReader, which caches the Terms instances in a 
HashMap.  Then I reran the benchmark with 150 fields and 30 segments:
{noformat}
IndexSearcher MultiFields (current)
  346.155 ±(99.9%) 57.775 us/op [Average]
  (min, avg, max) = (334.952, 346.155, 371.996), stdev = 15.004
  CI (99.9%): [288.380, 403.930] (assumes normal distribution)

Raw compute on demand each time
  196.271 ±(99.9%) 14.716 us/op [Average]
  (min, avg, max) = (192.012, 196.271, 201.187), stdev = 3.822
  CI (99.9%): [181.555, 210.987] (assumes normal distribution)

ConcurrentHashMap lazy cache of raw compute
  4.553 ±(99.9%) 0.245 us/op [Average]
  (min, avg, max) = (4.465, 4.553, 4.636), stdev = 0.064
  CI (99.9%): [4.308, 4.799] (assumes normal distribution)
{noformat}

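Clearly the ConcurrentHashMap is saving us a lot: at 4.553 us/op it is roughly 
43x faster than raw compute on demand (196.271 us/op) and roughly 76x faster 
than the current MultiFields path (346.155 us/op).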
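
For readers who have not opened the attachment, the "lazy cache of raw compute" 
variant amounts to roughly the sketch below.  This is a minimal illustration 
and not the code in MyBenchmark.java: the class name LazyFieldCache and the 
rawCompute function are hypothetical stand-ins for the per-field statistics 
computation, and it assumes the cache lives and dies with one immutable reader 
so entries never need invalidating.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper (not a Lucene class): memoizes one value per field name. */
final class LazyFieldCache<V> {
  private final ConcurrentMap<String, V> cache = new ConcurrentHashMap<>();
  private final Function<String, V> rawCompute;

  /** rawCompute stands in for the expensive per-field computation; it must not return null. */
  LazyFieldCache(Function<String, V> rawCompute) {
    this.rawCompute = rawCompute;
  }

  /** Computes the value on the first request for a field, then serves it from the map. */
  V get(String field) {
    return cache.computeIfAbsent(field, rawCompute);
  }

  public static void main(String[] args) {
    AtomicInteger computations = new AtomicInteger();
    // The lambda is a placeholder for the real statistics computation.
    LazyFieldCache<Integer> stats = new LazyFieldCache<>(field -> {
      computations.incrementAndGet();
      return field.length();
    });
    stats.get("title");
    stats.get("title"); // served from the cache, no recomputation
    stats.get("body");
    System.out.println(computations.get()); // prints 2
  }
}
```

The only per-call cost after the first lookup of a field is one map read, which 
is consistent with the gap between the second and third results above.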

You say we shouldn't add caching to IndexSearcher, but IndexSearcher already 
holds the QueryCache.  Looking at LRUQueryCache, I think I can safely say that 
a ConcurrentHashMap is considerably more lightweight.  Do you disagree?

> Optimize IndexSearcher.collectionStatistics
> -------------------------------------------
>
>                 Key: LUCENE-8040
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8040
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: David Smiley
>            Assignee: David Smiley
>             Fix For: 7.2
>
>         Attachments: MyBenchmark.java, lucenecollectionStatisticsbench.zip
>
>
> {{IndexSearcher.collectionStatistics(field)}} can do a fair amount of work 
> because with each invocation it will call {{MultiFields.getTerms(...)}}.  The 
> effects of this are aggravated for queries with many fields since each field 
> will want statistics, and also aggravated when there are many segments.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
