Michael McCandless created LUCENE-9406:
------------------------------------------

             Summary: Make it simpler to track IndexWriter's events
                 Key: LUCENE-9406
                 URL: https://issues.apache.org/jira/browse/LUCENE-9406
             Project: Lucene - Core
          Issue Type: Improvement
          Components: core/index
            Reporter: Michael McCandless


This is the second spinoff from a [controversial PR to add a new index-time 
feature to Lucene to merge small segments during 
commit|https://github.com/apache/lucene-solr/pull/1552].  That change can 
substantially reduce the number of small index segments to search.

In that PR, there was a new proposed interface, {{IndexWriterEvents}}, giving 
the application a chance to track when {{IndexWriter}} kicked off merges during 
commit, how many, how long it waited, how often it gave up waiting, etc.

Such telemetry from production usage is really helpful when tuning settings 
such as which merges to attempt on commit (e.g. a size threshold) and how long 
to wait for them during commit.

I am splitting out this issue to explore possible approaches to do this.  E.g. 
[~simonw] proposed using a statistics class instead, but if I understood that 
correctly, I think that would put the role of aggregation inside 
{{IndexWriter}}, which is not ideal.
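
To make the distinction concrete, here is a minimal sketch of the callback 
style, with aggregation living in the application rather than in 
{{IndexWriter}}. It is only an illustration: the method names on 
{{IndexWriterEvents}} and the {{CommitMergeStats}} class are hypothetical, are 
not taken from the PR or from any Lucene release, and nothing here touches 
Lucene's real API.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical callback interface. The method names are illustrative
// assumptions, not the interface proposed in the PR.
interface IndexWriterEvents {
  // The writer kicked off mergeCount merges as part of a commit.
  void mergesStartedOnCommit(int mergeCount);

  // All commit-time merges finished after the writer waited waitedMillis.
  void mergesFinishedOnCommit(long waitedMillis);

  // The writer gave up waiting for commit-time merges after waitedMillis.
  void mergeWaitTimedOutOnCommit(long waitedMillis);
}

// Application-side aggregation: the writer only reports raw events, and the
// application decides what to count and how to expose it.
final class CommitMergeStats implements IndexWriterEvents {
  private final LongAdder commitsWithMerges = new LongAdder();
  private final LongAdder mergesStarted = new LongAdder();
  private final LongAdder totalWaitMillis = new LongAdder();
  private final LongAdder timeouts = new LongAdder();

  @Override
  public void mergesStartedOnCommit(int mergeCount) {
    commitsWithMerges.increment();
    mergesStarted.add(mergeCount);
  }

  @Override
  public void mergesFinishedOnCommit(long waitedMillis) {
    totalWaitMillis.add(waitedMillis);
  }

  @Override
  public void mergeWaitTimedOutOnCommit(long waitedMillis) {
    totalWaitMillis.add(waitedMillis);
    timeouts.increment();
  }

  long commitsWithMerges() { return commitsWithMerges.sum(); }

  long mergesStarted() { return mergesStarted.sum(); }

  long totalWaitMillis() { return totalWaitMillis.sum(); }

  long timeouts() { return timeouts.sum(); }

  // Fraction of merging commits where the writer gave up waiting.
  double timeoutRate() {
    long commits = commitsWithMerges.sum();
    return commits == 0 ? 0.0 : (double) timeouts.sum() / commits;
  }
}
```

With something like this, an application could swap in its own metrics 
library or histogram implementation without {{IndexWriter}} needing to know 
which aggregates matter.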

Many interesting events, e.g. how many merges are requested, how large they 
are, and how long they took to complete or fail, can be gleaned by wrapping 
expert Lucene classes like {{MergePolicy}} and {{MergeScheduler}}.  But for 
events that cannot be observed that way (e.g. {{IndexWriter}} stopped waiting 
for merges during commit), it would be very helpful to have some simple way to 
track them so applications can tune better.

It is also possible to subclass {{IndexWriter}} and override key methods, but I 
think that is inherently risky as {{IndexWriter}}'s protected methods are not 
considered to be a stable API, and the synchronization used by {{IndexWriter}} 
is confusing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
