[ 
https://issues.apache.org/jira/browse/LUCENE-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15875883#comment-15875883
 ] 

Michael McCandless commented on LUCENE-7700:
--------------------------------------------

Thank you [~dawid.weiss] for giving this some attention ... this intertwining 
is horribly messy today!  Your patch is a nice step forward.

One difference with your patch is that we would now wrap the {{Directory}} on 
every merge, instead of once up front. That's fine (the cost is tiny compared 
to the cost of the merge) and possibly powerful, since each merge can now 
decide what to do about IO throttling / Directory wrapping.
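
To illustrate the per-merge decision, here is a rough sketch in plain Java. It is not the patch and not Lucene's API: {{Output}}, {{CountingOutput}}, {{ThrottledOutput}}, {{wrapForMerge}}, the 50 MB cutoff and the 10 MB/sec limit are all made-up names and numbers, and the wrapper only records the pause it would owe instead of sleeping.

```java
final class PerMergeWrappingSketch {
  /** Hypothetical stand-in for a Directory-like output target. */
  interface Output {
    void writeBytes(int count);
  }

  /** Unwrapped target: just counts the bytes it receives. */
  static final class CountingOutput implements Output {
    long written;

    @Override
    public void writeBytes(int count) {
      written += count;
    }
  }

  /** Wrapper that tracks the pause a rate limit would impose on the writer. */
  static final class ThrottledOutput implements Output {
    private final Output delegate;
    private final double bytesPerSec;
    double owedPauseSec; // a real limiter would sleep; this sketch only accumulates

    ThrottledOutput(Output delegate, double mbPerSec) {
      this.delegate = delegate;
      this.bytesPerSec = mbPerSec * 1024 * 1024;
    }

    @Override
    public void writeBytes(int count) {
      owedPauseSec += count / bytesPerSec;
      delegate.writeBytes(count);
    }
  }

  /** Called once per merge, so each merge can choose its own wrapping. */
  static Output wrapForMerge(Output base, long estimatedMergeBytes) {
    // Assumed policy: merges under 50 MB run unthrottled, larger ones at 10 MB/sec.
    if (estimatedMergeBytes < 50L * 1024 * 1024) {
      return base;
    }
    return new ThrottledOutput(base, 10.0);
  }
}
```

Under that assumed policy a small merge gets the base output back untouched, so it pays no accounting overhead at all.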

And it's nice that we can remove IW's ThreadLocal tracking the rate limiters.
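
As a rough picture of what replaces the ThreadLocal, the sketch below keeps the abort flag and the rate limit on a per-merge object that callers pass around explicitly. Every name here ({{MergeHandle}}, {{MergeAbortedException}}, {{writeChunks}}) is hypothetical and chosen for illustration; this is not how the patch or IW is written.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class MergeOwnedStateSketch {
  /** Hypothetical exception thrown when a merge notices it was aborted. */
  static final class MergeAbortedException extends RuntimeException {
    MergeAbortedException(String message) {
      super(message);
    }
  }

  /** Hypothetical per-merge handle that owns its abort flag and its rate limit. */
  static final class MergeHandle {
    private final AtomicBoolean aborted = new AtomicBoolean();
    private volatile double mbPerSec;

    MergeHandle(double mbPerSec) {
      this.mbPerSec = mbPerSec;
    }

    void abort() {
      aborted.set(true);
    }

    void setMbPerSec(double mbPerSec) {
      this.mbPerSec = mbPerSec;
    }

    double getMbPerSec() {
      return mbPerSec;
    }

    void checkAbort() {
      if (aborted.get()) {
        throw new MergeAbortedException("merge aborted");
      }
    }
  }

  /** Simulated merge loop: checks the merge's own state before each chunk. */
  static int writeChunks(MergeHandle merge, int chunks) {
    int done = 0;
    for (int i = 0; i < chunks; i++) {
      merge.checkAbort(); // no thread-local lookup; the state travels with the merge
      done++;
    }
    return done;
  }
}
```

A scheduler holding the handle can call {{abort()}} or {{setMbPerSec()}} from any thread, since nothing is keyed on the thread running the merge.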

bq. // TODO: no throughput control after changes; should we comply with merge 
scheduler/ policy here?

I do think it's important that the IO throttling applies when building the 
CFS file. For a large merge, this is a big burst of IO at the end ... too bad 
we can't use an API like Linux's {{splice}} to efficiently copy bytes (though 
we'd likely still want throttling there too anyway...).
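
Something like the following chunked copy is what I have in mind for keeping the CFS build under the limiter. It is a plain {{java.io}} sketch, not Lucene code: {{Throttle}}, {{account}} and {{copy}} are hypothetical names, and a real {{Throttle}} would pause (or check for abort) based on the bytes it is told about.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class ThrottledCopySketch {
  /** Hypothetical hook that is told about every chunk written. */
  interface Throttle {
    void account(long bytes) throws IOException;
  }

  /** Copies in to out chunk by chunk, reporting each chunk to the throttle. */
  static long copy(InputStream in, OutputStream out, Throttle throttle, int chunkSize)
      throws IOException {
    byte[] buffer = new byte[chunkSize];
    long total = 0;
    int read;
    while ((read = in.read(buffer)) != -1) {
      out.write(buffer, 0, read);
      total += read;
      throttle.account(read); // the limiter sees the compound-file bytes too
    }
    return total;
  }
}
```

Smaller chunks give the throttle more chances to pause, which smooths out the burst at the cost of more calls.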


> Move throughput control and merge aborting out of IndexWriter's core?
> ---------------------------------------------------------------------
>
>                 Key: LUCENE-7700
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7700
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>         Attachments: LUCENE-7700.patch
>
>
> Here is a bit of background:
> - I wanted to implement a custom merging strategy that would have a custom 
> i/o flow control (global),
> - currently, the CMS is tightly bound with a few classes -- MergeRateLimiter, 
> OneMerge, IndexWriter.
> Looking at the code, it seems to me that everything with respect to I/O 
> control could be nicely pulled out into classes that explicitly control the 
> merging process, that is, only MergePolicy and MergeScheduler. By default, one 
> could even run without any additional I/O accounting overhead (which is 
> currently in there, even if one doesn't use the CMS's throughput control).
> Such refactoring would also give a chance to nicely move things where they 
> belong -- job aborting into OneMerge (currently in RateLimiter), rate limiter 
> lifecycle bound to OneMerge (MergeScheduler could then use per-merge or 
> global accounting, as it pleases).
> Just a thought and some initial refactorings for discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
