We are faced with a similar situation. Yes, the merge process can take a long time and is mostly single-threaded (if you're merging from N segments into a single segment, only one thread does the job). As Erick pointed out, the merge process takes a back seat to indexing and searching (in most cases), so speeding it up is not a priority, but it's definitely something people like you (and me) could make use of, given the opportunity.
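
To make the single-thread point concrete, here is a rough sketch in plain Java. It is not Lucene code: mergePart and the part names are made up for illustration, and it assumes the pieces of work are independent of each other, which would have to be checked against the real merge code. It only shows independent tasks handed to the common fork-join pool and joined at the end, instead of being run one after another on a single thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

public class ParallelMergeSketch {

    // Hypothetical stand-in for merging one independent part of a segment.
    // Not a Lucene API; it just returns a label so the result is visible.
    public static String mergePart(String part) {
        return part + ":merged";
    }

    // Submit every part to the common fork-join pool, then wait for all of them.
    public static List<String> mergeAllParts(List<String> parts) {
        List<CompletableFuture<String>> futures = parts.stream()
            .map(p -> CompletableFuture.supplyAsync(() -> mergePart(p), ForkJoinPool.commonPool()))
            .collect(Collectors.toList());
        // Joining in submission order keeps the results aligned with the input.
        return futures.stream()
            .map(CompletableFuture::join)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative part names only.
        List<String> parts = Arrays.asList("stored fields", "postings", "doc values", "norms", "points");
        System.out.println(mergeAllParts(parts));
    }
}
```

The results come back in input order because the futures are joined in the order they were submitted; the order in which the tasks actually run across threads is not defined.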

I actually don't see any reason why merging of individual parts of a
segment can't be done in parallel (this would be a start; later on, a
splittable strategy for merging single things could make use of things
like the fork-join executor). I'd love to work on this at some point, but
I honestly don't see that happening any time soon.

If you have a spare cycle, take a look at how the index writer merges a
single segment; there are quite trivial ways this could be split into
parallel subtasks and executed with, for example, the system fork-join
executor (even without forkable tasks).

https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java#L2999-L3007

As a side note, you may want to make absolutely sure your merge scheduler
(if it's the CMS) is not using any I/O throttling -- this is theoretically
self-adjustable, but in practice, if you only care about the wall-clock
time to finish a single merge, it's better to turn it off.

Dawid

On Fri, Nov 2, 2018 at 5:57 PM Erick Erickson <erickerick...@gmail.com> wrote:
>
> The merge process is rather tricky, and there's nothing that I know of
> that will use all resources available. In fact the merge code is
> written to _not_ use up all the possible resources on the theory that
> there should be some left over to handle queries etc.
>
> Yeah, the situation you describe is indeed one of the few where
> merging down to 1 segment makes sense. Out of curiosity, what kind of
> performance gains do you see?
>
> This applies to the default TieredMergePolicy (TMP):
>
> 1> there is a limit to the number of segments that can be merged at
> once, so sometimes it can take more than one pass. If you have more
> than 30 segments, it'll be multi-pass. You can try (and I haven't done
> this personally) setting maxMergeAtOnceExplicit in your solrconfig.xml
> to see if it helps. That only takes effect when you forceMerge.
> There's a tricky bit of reflection that handles this, see the very end
> of TieredMergePolicy.java for the parameters you can set.
>
> 2> As of Solr 7.5 (see LUCENE-7976) the default behavior has changed
> from automatically merging down to 1 segment to respecting
> "maxMergedSegmentMB" (default 5G). You will have to explicitly pass
> maxSegments=1 to get the old behavior.
>
> Best,
> Erick
> On Fri, Nov 2, 2018 at 3:13 AM Jerven Bolleman
> <jerven.bolleman@sib.swiss> wrote:
> >
> > Dear Lucene Devs and Users,
> >
> > First of all thank you for this wonderful library and API.
> >
> > forceMerges are normally not recommended but we fall into one of the few
> > use cases where it makes sense.
> >
> > In our use case we have a large index (3 actually) and we don't update
> > them ever after indexing. i.e. we index all the documents and then never
> > ever add another document to the index, nor are any deleted.
> >
> > It has proven beneficial for search performance to always forceMerge down
> > to one segment. However, this takes significant time. Are there any
> > suggestions on what kind of merge scheduler/policy settings will utilize
> > the most of the available IO, CPU and RAM capacity? Currently we end up
> > being single thread bound, leaving lots of potential cpu and bandwidth
> > not used during the merge.
> >
> > e.g. we are looking for a MergeEverything use all hardware policy and
> > scheduler.
> >
> > We are currently on lucene 7.4 but nothing is stopping us from upgrading.
> >
> > Regards,
> > Jerven

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org