[ 
https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-7700:
--------------------------------
    Attachment: LUCENE-7700.patch

Here's another iteration. It came out quite cleanly, I think. In short: I moved 
wrapForMerge to be a method of MergeScheduler. Aborting, pausing and timings on 
OneMerge are now part of a dedicated class (OneMergeProgress) and are entirely 
abstracted away from throughput control. In fact, now only 
ConcurrentMergeScheduler has access to bandwidth control (although it can be 
fairly easily added to any other scheduler).
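
To make the split concrete, here is a minimal sketch of a per-merge progress object that handles only aborting, pausing and pause timing, and knows nothing about bandwidth. It is not the code from the attached patch: the class name MergeProgressSketch and all of its methods are hypothetical, chosen only for illustration.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch; NOT the OneMergeProgress class from the patch. */
final class MergeProgressSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition wakeup = lock.newCondition();

  // Written by the aborting thread, read by the merge thread.
  private volatile boolean aborted;

  // Total time spent paused; guarded by 'lock'.
  private long pausedNanos;

  /** Marks the merge as aborted and wakes up a paused merge thread. */
  void abort() {
    aborted = true;
    lock.lock();
    try {
      wakeup.signalAll();
    } finally {
      lock.unlock();
    }
  }

  boolean isAborted() {
    return aborted;
  }

  /**
   * Pauses the calling thread for up to 'nanos', returning early if the
   * merge is aborted. How long to pause is decided by the caller (for
   * example a scheduler); this class does no throughput accounting.
   */
  void pauseNanos(long nanos) {
    long start = System.nanoTime();
    lock.lock();
    try {
      long remaining = nanos;
      while (remaining > 0 && !aborted) {
        remaining = wakeup.awaitNanos(remaining);
      }
    } catch (InterruptedException e) {
      // Restore the interrupt flag and surface the interruption unchecked.
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    } finally {
      pausedNanos += System.nanoTime() - start;
      lock.unlock();
    }
  }

  long totalPausedNanos() {
    lock.lock();
    try {
      return pausedNanos;
    } finally {
      lock.unlock();
    }
  }
}
```

In a layout like this, a scheduler that wants bandwidth control would compute the pause length itself and call pauseNanos; a scheduler that does not care would never call it and would pay no accounting cost.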

IndexWriter.addIndexes(CodecReader...) doesn't respect the merge scheduler's 
policies (and it didn't before, so this isn't a breaking change).

The APIs have changed in a few places (I haven't done a thorough check yet). It 
seems like a nice cleanup that separates different concerns into different places.

Everything would have to be triple-checked for correctness. I dropped 
synchronized blocks in a few places where a simple volatile variable was 
sufficient (and very likely much faster).
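
As an illustration of the kind of spot where a volatile field can stand in for a synchronized block, consider a single rate setting that one thread writes and merge threads only read. This is a sketch under assumptions, not code from the patch: RateHolderSketch and its methods are hypothetical names.

```java
/** Hypothetical sketch; NOT taken from the patch. */
final class RateHolderSketch {
  // One writer-visible value with no compound invariant, so a volatile
  // read/write is enough to publish updates across threads.
  private volatile double mbPerSec = Double.POSITIVE_INFINITY;

  void setMBPerSec(double value) {
    if (Double.isNaN(value) || value <= 0) {
      throw new IllegalArgumentException("rate must be > 0, got " + value);
    }
    mbPerSec = value;
  }

  double getMBPerSec() {
    return mbPerSec;
  }

  /** Returns how long writing 'bytes' should take at the current rate. */
  long pauseNanosFor(long bytes) {
    // Read the volatile once so the whole computation sees one value.
    double rate = mbPerSec;
    if (Double.isInfinite(rate)) {
      return 0L; // unlimited: never pause
    }
    double seconds = (bytes / 1024.0 / 1024.0) / rate;
    return (long) (seconds * 1_000_000_000L);
  }
}
```

This only works because each operation touches a single field exactly once. Anything that must update two fields together, or read-modify-write one field, still needs a lock or an atomic class.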

> Move throughput control and merge aborting out of IndexWriter's core?
> ---------------------------------------------------------------------
>
>                 Key: LUCENE-7700
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7700
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>         Attachments: LUCENE-7700.patch, LUCENE-7700.patch
>
>
> Here is a bit of background:
> - I wanted to implement a custom merging strategy that would have a custom 
> i/o flow control (global),
> - currently, the CMS is tightly bound with a few classes -- MergeRateLimiter, 
> OneMerge, IndexWriter.
> Looking at the code it seems to me that everything with respect to I/O 
> control could be nicely pulled out into classes that explicitly control the 
> merging process, that is only MergePolicy and MergeScheduler. By default, one 
> could even run without any additional I/O accounting overhead (which is 
> currently in there, even if one doesn't use the CMS's throughput control).
> Such refactoring would also give a chance to nicely move things where they 
> belong -- job aborting into OneMerge (currently in RateLimiter), rate limiter 
> lifecycle bound to OneMerge (MergeScheduler could then use per-merge or 
> global accounting, as it pleases).
> Just a thought and some initial refactorings for discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
