[ 
https://issues.apache.org/jira/browse/LUCENE-9406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271878#comment-17271878
 ] 

Zach Chen commented on LUCENE-9406:
-----------------------------------

Makes sense! For the initial interface proposal, I’m thinking of something along 
the same lines as what you had in the previous PR for event metrics collection:
{code:java}
interface EventMetrics {
    Map<MetricName, Metric> providesMetrics();
}

interface IndexWriterEvent extends EventMetrics {
    void beginPointInTimeMerge(MergeTrigger trigger);
    void completePointInTimeMerge(MergeTrigger trigger);
    ...
}
{code}
The implementation of IndexWriterEvent can be set on IndexWriterConfig / 
LiveIndexWriterConfig and invoked at IndexWriter’s key event points, just as in 
the previous PR. 
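
To make the collection side more concrete, here is a minimal sketch of what an implementation might look like. Everything in it is an assumption for illustration: {{String}}/{{Long}} stand in for the {{MetricName}}/{{Metric}} types (which are not defined yet), the local {{MergeTrigger}} enum is a stand-in for Lucene's so the sketch compiles on its own, and {{CountingIndexWriterEvent}} and {{FakeWriter}} are hypothetical names. {{FakeWriter}} only imitates the point in the commit path where {{IndexWriter}} would fire the callbacks.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Stand-in for Lucene's MergeTrigger so this sketch compiles without Lucene.
enum MergeTrigger { COMMIT, GET_READER }

// Assumption: String/Long replace the still-undefined MetricName/Metric types.
interface EventMetrics {
    Map<String, Long> providesMetrics();
}

interface IndexWriterEvent extends EventMetrics {
    void beginPointInTimeMerge(MergeTrigger trigger);
    void completePointInTimeMerge(MergeTrigger trigger);
}

// Hypothetical implementation: counts started and completed point-in-time merges.
class CountingIndexWriterEvent implements IndexWriterEvent {
    private final AtomicLong started = new AtomicLong();
    private final AtomicLong completed = new AtomicLong();

    @Override
    public void beginPointInTimeMerge(MergeTrigger trigger) {
        started.incrementAndGet();
    }

    @Override
    public void completePointInTimeMerge(MergeTrigger trigger) {
        completed.incrementAndGet();
    }

    @Override
    public Map<String, Long> providesMetrics() {
        // Return a snapshot so callers never see the live counters.
        Map<String, Long> snapshot = new LinkedHashMap<>();
        snapshot.put("pointInTimeMerge.started", started.get());
        snapshot.put("pointInTimeMerge.completed", completed.get());
        return snapshot;
    }
}

// Hypothetical stand-in for the event points inside IndexWriter's commit path.
class FakeWriter {
    private final IndexWriterEvent events;

    FakeWriter(IndexWriterEvent events) {
        this.events = events;
    }

    void commit(boolean mergeFinishedInTime) {
        events.beginPointInTimeMerge(MergeTrigger.COMMIT);
        if (mergeFinishedInTime) {
            // Only reported when the writer did not give up waiting.
            events.completePointInTimeMerge(MergeTrigger.COMMIT);
        }
    }
}
```

With this shape, the gap between the two counters would tell the application how often the writer stopped waiting for merges during commit, while aggregation stays outside {{IndexWriter}}.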

For event metrics consumption, I’m considering something similar to 
Dropwizard’s metrics reporter:
{code:java}
interface EventMetricsReporter {
   // Calls EventMetrics.providesMetrics() to get the data.
   void report(EventMetrics metrics);
}
{code}
so that an application can provide a custom implementation for data consumption: 
{code:java}
class FileBasedEventReporter implements EventMetricsReporter {}
class NetworkBasedEventReporter implements EventMetricsReporter {}
...{code}
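
As a rough illustration of the consumption side, a reporter could look something like the sketch below. It is only an assumption about the shape: {{WriterEventReporter}} is a hypothetical name, {{String}}/{{Long}} again stand in for {{MetricName}}/{{Metric}}, and the "name=value" line format is made up for the example. Writing to a {{java.io.Writer}} lets the same class back a file-based reporter (via {{FileWriter}}) or an in-memory one for tests.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Map;
import java.util.TreeMap;

// Assumption: String/Long replace the still-undefined MetricName/Metric types.
interface EventMetrics {
    Map<String, Long> providesMetrics();
}

interface EventMetricsReporter {
    void report(EventMetrics metrics);
}

// Hypothetical reporter that writes one "name=value" line per metric.
class WriterEventReporter implements EventMetricsReporter {
    private final Writer out;

    WriterEventReporter(Writer out) {
        this.out = out;
    }

    @Override
    public void report(EventMetrics metrics) {
        try {
            // Copy into a TreeMap so lines come out sorted by metric name.
            Map<String, Long> sorted = new TreeMap<>(metrics.providesMetrics());
            for (Map.Entry<String, Long> entry : sorted.entrySet()) {
                out.write(entry.getKey() + "=" + entry.getValue() + "\n");
            }
            out.flush();
        } catch (IOException e) {
            // report() declares no checked exceptions, so wrap and rethrow.
            throw new UncheckedIOException(e);
        }
    }
}
```

An application would then call {{report(...)}} on whatever schedule it likes (for example from its own timer thread), which keeps scheduling and aggregation decisions out of {{IndexWriter}}.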
 

What do you think?

> Make it simpler to track IndexWriter's events
> ---------------------------------------------
>
>                 Key: LUCENE-9406
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9406
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>            Priority: Major
>
> This is the second spinoff from a [controversial PR to add a new index-time 
> feature to Lucene to merge small segments during 
> commit|https://github.com/apache/lucene-solr/pull/1552].  That change can 
> substantially reduce the number of small index segments to search.
> In that PR, there was a new proposed interface, {{IndexWriterEvents}}, giving 
> the application a chance to track when {{IndexWriter}} kicked off merges 
> during commit, how many, how long it waited, how often it gave up waiting, 
> etc.
> Such telemetry from production usage is really helpful when tuning settings 
> like which merges (e.g. a size threshold) to attempt on commit, and how long 
> to wait during commit, etc.
> I am splitting out this issue to explore possible approaches to do this.  
> E.g. [~simonw] proposed using a statistics class instead, but if I understood 
> that correctly, I think that would put the role of aggregation inside 
> {{IndexWriter}}, which is not ideal.
> Many interesting events, e.g. how many merges are being requested, how large 
> are they, how long did they take to complete or fail, etc., can be gleaned by 
> wrapping expert Lucene classes like {{MergePolicy}} and {{MergeScheduler}}.  
> But for those events that cannot (e.g. {{IndexWriter}} stopped waiting for 
> merges during commit), it would be very helpful to have some simple way to 
> track so applications can better tune.
> It is also possible to subclass {{IndexWriter}} and override key methods, but 
> I think that is inherently risky as {{IndexWriter}}'s protected methods are 
> not considered to be a stable API, and the synchronization used by 
> {{IndexWriter}} is confusing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
