[
https://issues.apache.org/jira/browse/LUCENE-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932864#action_12932864
]
Earwin Burrfoot commented on LUCENE-2755:
-----------------------------------------
{quote}
If we proceed w/ your proposal, that is basically the MS/ME polling MP, and not
IW doing so, how would IW know about the running merges and pending ones? Today
IW tracks those two lists so that if you need to abort merges, it knows which
ones to abort.
We can work around aborting the running merges by introducing an MS.abort()-like
method. But what about MP? Now the lists are divided between two entities (MP
and MS), and aborting an MP does not make sense (doable, but I don't think it
belongs there).
{quote}
There are no lists at all with my approach. At least there is no "pending"
list: it gets recalculated each time we poll MP, and it is neither exposed
nor stored anywhere.
There's a kind of implicit "in flight" list - MS has the knowledge of its
threads that are currently doing things. And if you want to go around aborting
things, MS is probably the right place to do this.
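
To make the idea concrete, here is a minimal sketch of a stateless policy polled by a scheduler that owns the only state (the in-flight merges). All type and method names below (ProposedMerge, PollingMergePolicy, PairwisePolicy, PollingScheduler) are hypothetical and are not Lucene's actual API; threads are left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical types for illustration; none of these are Lucene's real classes.

/** A proposed merge: just the names of the segments to combine. */
final class ProposedMerge {
    final List<String> segments;
    private boolean aborted;

    ProposedMerge(List<String> segments) {
        this.segments = Collections.unmodifiableList(new ArrayList<>(segments));
    }

    synchronized void abort() { aborted = true; }
    synchronized boolean isAborted() { return aborted; }
}

/** Stateless advisor: inspects a snapshot and proposes one merge, or null. */
interface PollingMergePolicy {
    ProposedMerge nextMerge(List<String> liveSegments, Set<String> busySegments);
}

/** Toy policy: proposes merging the first two segments that are not busy. */
final class PairwisePolicy implements PollingMergePolicy {
    @Override
    public ProposedMerge nextMerge(List<String> liveSegments, Set<String> busySegments) {
        List<String> free = new ArrayList<>();
        for (String s : liveSegments) {
            if (!busySegments.contains(s)) free.add(s);
        }
        return free.size() < 2 ? null : new ProposedMerge(free.subList(0, 2));
    }
}

/** The scheduler holds the only state: merges it started and has not finished. */
final class PollingScheduler {
    private final List<ProposedMerge> inFlight = new ArrayList<>();

    /** Polls the policy until it returns null; returns how many merges were started. */
    synchronized int poll(PollingMergePolicy policy, List<String> liveSegments) {
        int started = 0;
        ProposedMerge m;
        while ((m = policy.nextMerge(liveSegments, busySegments())) != null) {
            inFlight.add(m); // a real scheduler would hand this to a worker thread
            started++;
        }
        return started;
    }

    /** Aborts in-flight merges only; there is no pending list to drain. */
    synchronized int abort() {
        int n = inFlight.size();
        for (ProposedMerge m : inFlight) m.abort();
        inFlight.clear();
        return n;
    }

    synchronized int inFlightCount() { return inFlight.size(); }

    private Set<String> busySegments() {
        Set<String> busy = new LinkedHashSet<>();
        for (ProposedMerge m : inFlight) busy.addAll(m.segments);
        return busy;
    }
}
```

Since the policy keeps nothing between calls, abort() has nothing to clean up on the policy side; polling again after an abort simply recomputes the proposals.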
bq. Maybe we can have MS.abort() poll MP for next merges until it returns null,
and throwing all the returned ones away - that can be done.
So, just as I said - that's not needed. MP is empty, it has no state.
bq. Should we, in the scope of this issue, make IW a required settable
parameter on MS, like we do w/ MP?
For the love of God, no. I'd like to see it removed from MP too.
It's only natural to pass the same instance of Policy or Scheduler to different
Writers, so they have the same behaviour and share Scheduler resources
(insanely important if you have fifteen indexes like I do and don't want them
to hammer the hardware with fifteen simultaneous merges).
It goes against nature to pass a Writer to a Policy. Does the Policy need to
write anything on its own, when it decides to? No. It should advise, not act.
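
The resource-sharing argument can be sketched roughly as follows. This is an illustration only, not Lucene code: SharedMergeBudget and ToyWriter are hypothetical names, and the cap on concurrent merges is an arbitrary assumption chosen by the caller.

```java
import java.util.concurrent.Semaphore;

// Hypothetical types for illustration; none of these are Lucene's real classes.

/** One instance shared by many writers; caps merges across all of them. */
final class SharedMergeBudget {
    private final Semaphore permits;

    SharedMergeBudget(int maxConcurrentMerges) {
        this.permits = new Semaphore(maxConcurrentMerges);
    }

    /** Returns true if a merge may start now, false if the shared cap is reached. */
    boolean tryStartMerge() { return permits.tryAcquire(); }

    /** Must be called once for every merge that was allowed to start. */
    void finishMerge() { permits.release(); }

    int available() { return permits.availablePermits(); }
}

/** Stand-in for an index writer: it is given the scheduler, never the reverse. */
final class ToyWriter {
    private final SharedMergeBudget budget;

    ToyWriter(SharedMergeBudget budget) { this.budget = budget; }

    boolean maybeMerge() { return budget.tryStartMerge(); }

    void mergeFinished() { budget.finishMerge(); }
}
```

With fifteen ToyWriter instances built on one SharedMergeBudget(2), at most two of them get a true from maybeMerge() until someone calls mergeFinished(). The budget never holds a reference to any writer, which is what makes sharing it trivial.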
> Some improvements to CMS
> ------------------------
>
> Key: LUCENE-2755
> URL: https://issues.apache.org/jira/browse/LUCENE-2755
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Shai Erera
> Assignee: Shai Erera
> Priority: Minor
> Fix For: 3.1, 4.0
>
>
> While running optimize on a large index, I've noticed several things that got
> me to read CMS code more carefully, and find these issues:
> * CMS may hold onto a merge if maxMergeCount is hit. That results in the
> MergeThreads taking merges from the IndexWriter until they are exhausted, and
> only then will that blocked merge run. I think it's unnecessary for that
> merge to be blocked.
> * CMS sorts merges by segment size, doc-based and not bytes-based. Since the
> default MP is LogByteSizeMP, and I hardly believe people care about doc-based
> segment sizes anymore, I think we should switch the default impl. There are
> two ways to make it extensible, if we want:
> ** Have an overridable member/method in CMS that you can extend and override
> - easy.
> ** Have OneMerge be comparable and let the MP determine the order (e.g. by
> bytes, docs, calibrate deletes etc.). Better, but will need to tap into
> several places in the code, so more risky and complicated.
> On the go, I'd like to add some documentation to CMS - it's not very easy to
> read and follow.
> I'll work on a patch.
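
The doc-based versus bytes-based ordering mentioned in the description can be illustrated with a small sketch. PendingMerge and MergeOrdering are hypothetical names, not the actual CMS or OneMerge code, and sorting smallest-first is an assumption made only for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical types for illustration; none of these are Lucene's real classes.

/** Minimal description of a merge: a label plus its size in docs and in bytes. */
final class PendingMerge {
    final String name;
    final int totalDocs;
    final long totalBytes;

    PendingMerge(String name, int totalDocs, long totalBytes) {
        this.name = name;
        this.totalDocs = totalDocs;
        this.totalBytes = totalBytes;
    }
}

final class MergeOrdering {
    /** Doc-based ordering, smallest first (direction is an assumption). */
    static final Comparator<PendingMerge> BY_DOCS =
        Comparator.comparingInt(m -> m.totalDocs);

    /** Bytes-based ordering, smallest first (direction is an assumption). */
    static final Comparator<PendingMerge> BY_BYTES =
        Comparator.comparingLong(m -> m.totalBytes);

    /** Returns merge names in the order induced by the given comparator. */
    static List<String> order(List<PendingMerge> merges, Comparator<PendingMerge> cmp) {
        List<PendingMerge> sorted = new ArrayList<>(merges);
        sorted.sort(cmp);
        List<String> names = new ArrayList<>();
        for (PendingMerge m : sorted) names.add(m.name);
        return names;
    }
}
```

A merge with many tiny documents and a merge with a few huge ones swap places depending on which comparator is used, which is why the choice matters when the policy itself reasons in bytes.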