I think this is worth exploring. Essentially, after each large merge,
we'd need to 1) commit, and 2) refresh any open readers (closing the old
ones), to fully free up the transient disk usage. Maybe we could track
the current transient extra disk usage of the index + open readers and,
once that exceeds a threshold, do something. The "something" could even
be asynchronous, e.g. maybe the next merge kicks off, and then
asynchronously your app calls commit/refresh? It could be an
event/listener API that IW invokes, maybe ...
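
To make that concrete, here's a rough sketch. The listener hook itself
is entirely hypothetical (IndexWriter exposes no such callback today,
and the class/method names below are invented); only
IndexWriter.commit() and SearcherManager.maybeRefresh() are real APIs:

    import java.io.IOException;

    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.search.SearcherManager;

    // Hypothetical listener: IndexWriter has no such hook today.
    public class TransientDiskUsageListener {
      private final IndexWriter writer;
      private final SearcherManager manager;  // owns reader reopen/close
      private final long thresholdBytes;      // app-chosen disk budget

      public TransientDiskUsageListener(IndexWriter writer,
                                        SearcherManager manager,
                                        long thresholdBytes) {
        this.writer = writer;
        this.manager = manager;
        this.thresholdBytes = thresholdBytes;
      }

      // Imagined callback: IW would invoke this after each large merge,
      // passing its estimate of transient disk usage (old segment files
      // still pinned by the last commit point and/or open readers).
      public void onMergeFinished(long transientBytes) throws IOException {
        if (transientBytes > thresholdBytes) {
          writer.commit();        // un-pin the previous commit point
          manager.maybeRefresh(); // reopen searchers; once old readers
                                  // close, merged-away files can be deleted
        }
      }
    }

The commit + maybeRefresh pair is the key part: until both happen, the
pre-merge segment files remain referenced and cannot be deleted.
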
However, the final merge (if merging to a single segment) will
necessarily consume up to 2X the index size (1X for the current index +
1X for the newly merged segment); I don't see how to reduce that
requirement for the final merge.

Mike McCandless

http://blog.mikemccandless.com


On Fri, Sep 13, 2019 at 12:54 PM Bram Van Dam <[email protected]> wrote:

> On 02/09/2019 17:19, Erick Erickson wrote:
> > 4> Don’t quite know what to do if maxSegments is 1 (or other very low
> > number).
>
> Having maxSegments set to > 5 (or whatever) seems like an acceptable
> constraint if it enables optimize without 200% disk usage.
>
> > Something like this would also pave the way for “background optimizing”.
> > Instead of a monolithic forceMerge, I can envision a process whereby we
> > created a low-level task that merged one max-sized segment at a time,
> > came up for air and reopened searchers then went back in and merged the
> > next one. With its own problems about coordinating ongoing updates, but
> > that’s another discussion ;).
> >
> > There’s lots of details to work out, throwing this out for discussion.
> > I can raise a JIRA if people think the idea has legs.
>
> Without having looked at the code, and going only on your assumptions
> and my own observations: it sounds like a good idea. The idea of a
> background optimizing process is particularly tantalizing.
>
> AFAICT there hasn't been any other feedback re this? :-/
>
> - Bram
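
P.S.: the "background optimizing" idea quoted above could look roughly
like the following. This is only a loose sketch: it approximates "one
merge at a time" by stepping forceMerge's maxNumSegments down one step
per pass (not literally merging one max-sized segment at a time), and it
ignores the coordination-with-ongoing-updates problem Erick mentions:

    import java.io.IOException;

    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.search.SearcherManager;

    public class BackgroundOptimize {
      // Step the segment count down gradually, committing and refreshing
      // between steps so each step's transient disk usage is reclaimed
      // before the next step begins.
      static void run(IndexWriter writer, SearcherManager manager,
                      int maxSegments) throws IOException {
        int segCount;
        try (DirectoryReader r = DirectoryReader.open(writer)) {
          segCount = r.leaves().size();  // current segment count
        }
        for (int target = segCount - 1; target >= maxSegments; target--) {
          writer.forceMerge(target);    // one merge step (blocks until done)
          writer.commit();              // release the previous commit point
          manager.maybeRefresh();       // close old readers so the old
                                        // segment files become deletable
        }
      }
    }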
