We are faced with a similar situation. Yes, the merge process can take
a long time and is mostly single-threaded (if you're merging from N
segments into a single segment, only one thread does the job). As
Erick pointed out, the merge process takes a backseat compared to
indexing and searches (in most cases), so it's not a priority, but
it's definitely something people like you (and me) could utilize, if
given the opportunity.

I actually don't see any reason why the individual parts of a
segment can't be merged in parallel (this would be a start; later on, a
splittable strategy for merging each part could make use of things
like the fork-join executor). I'd love to work on this at some point,
but I honestly don't see it happening any time soon. If you
have a spare cycle, take a look at how IndexWriter merges a single
segment; there are fairly trivial ways this could be split into
parallel subtasks and executed with, for example, the system fork-join
executor (even without forkable tasks).

https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java#L2999-L3007
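
To make the idea concrete, here is a rough sketch of the "parallel
subtasks on the system fork-join executor" approach. This is not Lucene
code: the class, the method names and the part labels are hypothetical
stand-ins, and it assumes the per-part merges really are independent of
each other, which would have to be checked against the actual merge code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;

/** Hypothetical sketch only; none of these names exist in Lucene. */
public class ParallelMergeSketch {

    /** Stand-in for merging one independent part of a segment. */
    public static Callable<String> mergePart(String part) {
        return () -> {
            // Real code would run the merge for this one part here.
            return part + ":merged";
        };
    }

    /** Submits one plain (non-forkable) task per part and waits for all of them. */
    public static List<String> mergeAllParts(List<String> parts)
            throws InterruptedException, ExecutionException {
        List<Callable<String>> tasks = new ArrayList<>();
        for (String part : parts) {
            tasks.add(mergePart(part));
        }
        List<String> results = new ArrayList<>();
        // invokeAll returns once every task is done; futures keep submission order.
        for (Future<String> future : ForkJoinPool.commonPool().invokeAll(tasks)) {
            // get() rethrows a failure from the task as an ExecutionException.
            results.add(future.get());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // Illustrative part labels, not an authoritative list.
        List<String> parts = Arrays.asList(
                "storedFields", "terms", "docValues", "points", "norms", "termVectors");
        System.out.println(mergeAllParts(parts));
    }
}
```

The wall-clock time of such a merge would be bounded by the slowest
part, so the gain depends on how evenly the work is spread across them.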

As a side note, you may want to make absolutely sure your merge
scheduler (if it's the ConcurrentMergeScheduler, CMS) is not using any
I/O throttling -- this is theoretically self-adjusting, but in practice,
if you only care about the wall-clock time of a single merge, it's
better to turn it off.
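
For what it's worth, a minimal sketch of what that could look like
against the Lucene 7.x API. The class name is made up, the index path is
taken from the command line, and StandardAnalyzer is only a placeholder
(no documents are added here); adapt it to however you open your writer.

```java
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.ConcurrentMergeScheduler;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

/** Hypothetical helper: force-merges an existing index with CMS auto I/O throttling off. */
public class ForceMergeNoThrottle {
    public static void main(String[] args) throws Exception {
        ConcurrentMergeScheduler cms = new ConcurrentMergeScheduler();
        // Turn off the self-adjusting I/O rate limit on merge threads.
        cms.disableAutoIOThrottle();

        // The analyzer is a placeholder; this program only merges.
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        config.setOpenMode(IndexWriterConfig.OpenMode.APPEND); // existing index only
        config.setMergeScheduler(cms);

        try (Directory dir = FSDirectory.open(Paths.get(args[0]));
             IndexWriter writer = new IndexWriter(dir, config)) {
            // Blocks until the index is down to a single segment.
            writer.forceMerge(1);
        }
    }
}
```

This only removes the throttle; the merge into the final segment is
still done by one thread.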

Dawid
On Fri, Nov 2, 2018 at 5:57 PM Erick Erickson <erickerick...@gmail.com> wrote:
>
> The merge process is rather tricky, and there's nothing that I know of
> that will use all resources available. In fact the merge code is
> written to _not_ use up all the possible resources on the theory that
> there should be some left over to handle queries etc.
>
> Yeah, the situation you describe is indeed one of the few where
> merging down to 1 segment makes sense. Out of curiosity, what kind of
> performance gains do you see?
>
> This applies to the default TieredMergePolicy (TMP):
>
> 1> there is a limit to the number of segments that can be merged at
> once, so sometimes it can take more than one pass. If you have more
> than 30 segments, it'll be multi-pass. You can try (and I haven't done
> this personally) setting maxMergeAtOnceExplicit in your solrconfig.xml
> to see if it helps. That only takes effect when you forceMerge.
> There's a tricky bit of reflection that handles this, see the very end
> of TieredMergePolicy.java for the parameters you can set.
>
> 2> As of Solr 7.5 (see LUCENE-7976) the default behavior has changed
> from automatically merging down to 1 segment to respecting
> "maxMergedSegmentMB" (default 5G). You will have to explicitly pass
> maxSegments=1 to get the old behavior.
>
> Best,
> Erick
> On Fri, Nov 2, 2018 at 3:13 AM Jerven Bolleman
> <jerven.bolleman@sib.swiss> wrote:
> >
> > Dear Lucene Devs and Users,
> >
> > First of all thank you for this wonderful library and API.
> >
> > forceMerges are normally not recommended but we fall into one of the few
> > use cases where it makes sense.
> >
> > In our use case we have a large index (3 actually) and we don't update
> > them ever after indexing. i.e. we index all the documents and then never
> > ever add another document to the index, nor are any deleted.
> >
> > It has proven beneficial for search performance to always forceMerge down
> > to one segment. However, this takes significant time. Are there any
> > suggestions on what kind of merge scheduler/policy settings will utilize
> > the most of the available IO, CPU and RAM capacity? Currently we end up
> > being single thread bound, leaving lots of potential cpu and bandwidth
> > not used during the merge.
> >
> > e.g. we are looking for a "MergeEverything, use all hardware" policy and
> > scheduler.
> >
> > We are currently on lucene 7.4 but nothing is stopping us from upgrading.
> >
> > Regards,
> > Jerven
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
>
