IIUC, segments are actually first written when ramBufferSizeMB is exceeded. If you can afford it, you might increase that number. NOTE: I'm going from memory here, so you should check....
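[Editor's note: a minimal sketch of what adjusting these knobs looks like with the standard Lucene `IndexWriterConfig` and `TieredMergePolicy` APIs; the 256 MB buffer and 20 segments-per-tier values are purely illustrative, not tuned recommendations.]

```java
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class WriterSetup {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(Paths.get("index"));

        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
        // Buffer more docs in RAM before a segment is flushed:
        // fewer, larger initial segments, so fewer merges later.
        cfg.setRAMBufferSizeMB(256.0);

        TieredMergePolicy tmp = new TieredMergePolicy();
        // Tolerate more segments per tier: merges become less
        // frequent but individually bigger (the trade-off discussed below).
        tmp.setSegmentsPerTier(20.0);
        cfg.setMergePolicy(tmp);

        try (IndexWriter writer = new IndexWriter(dir, cfg)) {
            // ... index documents ...
        }
    }
}
```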
That doesn't really address merging segments with deleted docs, though. I do wonder what happens if you bump the segments per tier. My guess: less frequent but more intense merges, so the overall effect is unclear.

Best,
Erick

On Tue, Aug 1, 2017 at 8:00 AM, Walter Underwood <[email protected]> wrote:
> Optimizing for frequent changes sounds like a caching strategy, maybe "LRU
> merging". Perhaps prefer merging segments that have not changed in a while?
>
> wunder
> Walter Underwood
> [email protected]
> http://observer.wunderwood.org/ (my blog)
>
>> On Aug 1, 2017, at 5:50 AM, Tommaso Teofili <[email protected]> wrote:
>>
>> On Tue, Aug 1, 2017 at 14:04, Adrien Grand <[email protected]> wrote:
>>>
>>> The trade-off does not sound simple to me. This approach could lead to
>>> having more segments overall, making search requests and updates
>>> potentially slower and more I/O-intensive, since they have to iterate
>>> over more segments. I'm not saying this is a bad idea, but it could have
>>> unexpected side-effects.
>>
>> Yes, that's my concern as well.
>>
>>> Do you actually have a high commit rate or a high reopen rate
>>> (DirectoryReader.open(IndexWriter))?
>>
>> In my scenario both, but the commit rate far exceeds the reopen rate.
>>
>>> Maybe reopening instead of committing (and still committing, but less
>>> frequently) would decrease the I/O load, since NRT segments might never
>>> need to be actually written to disk if they are merged before the next
>>> commit happens and you give enough memory to the filesystem cache.
>>
>> Makes sense in general; however, I am a bit constrained in how much I can
>> avoid committing (states in an MVCC system are tied to commits, so it's
>> trickier).
>>
>> In general I was wondering if we could have the merge policy look at both
>> the commit rate and the number of segments and decide whether to merge
>> based on both, so that if the segment growth is within a threshold we
>> possibly save some merges when we have high commit rates, though, as you
>> say, we may then have to do bigger merges.
>> I imagine this makes more sense when a lot of tiny changes are made to
>> the index rather than a few big ones (then the bigger-merges problem
>> should be less significant).
>>
>> Beyond my specific scenario, I am thinking that we could look again at
>> the current merge policy algorithm and see if we can improve it, or make
>> it more flexible with respect to how the "sneaky opponent" (Mike's ™ [1])
>> behaves.
>>
>> My 2 cents,
>> Tommaso
>>
>> [1] : http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>>
>>> On Tue, Aug 1, 2017 at 10:59, Tommaso Teofili <[email protected]> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> lately I have been looking more closely at merge policies, particularly
>>>> at the tiered one, and I was wondering if we can mitigate the number of
>>>> possibly avoidable merges in high-commit-rate scenarios, especially
>>>> when a high percentage of the commits happen on the same docs.
>>>> I've observed several evolutions of merges in such scenarios, and it
>>>> seemed to me the merge policy was too aggressive in merging, causing a
>>>> large I/O overhead.
>>>> I then tried the same with a merge policy which tentatively looked at
>>>> commit rates and skipped merges if that rate was higher than a
>>>> threshold, which seemed to give slightly better results in reducing the
>>>> unneeded I/O caused by avoidable merges.
>>>>
>>>> I know this is a bit abstract, but I would like to know if anyone has
>>>> any ideas or plans about mitigating the merge overhead in general
>>>> and/or in similar cases.
>>>>
>>>> Regards,
>>>> Tommaso
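[Editor's note: the commit-rate-aware idea Tommaso describes could be sketched, in plain Java without any Lucene types, as a small gate a merge policy might consult before selecting merges. The class name, sliding-window length, and thresholds are all hypothetical; this is only an illustration of the decision logic, not a real Lucene `MergePolicy`.]

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical helper: defer merges while the recent commit rate is high,
 * unless the segment count has already grown past a hard ceiling.
 */
public class CommitRateMergeGate {
    private final Deque<Long> commitTimes = new ArrayDeque<>();
    private final long windowMillis;       // sliding window for the rate estimate
    private final double maxCommitsPerSec; // above this rate, merging is deferred
    private final int maxSegments;         // never defer past this many segments

    public CommitRateMergeGate(long windowMillis, double maxCommitsPerSec,
                               int maxSegments) {
        this.windowMillis = windowMillis;
        this.maxCommitsPerSec = maxCommitsPerSec;
        this.maxSegments = maxSegments;
    }

    /** Call once per commit, with the current clock reading. */
    public void onCommit(long nowMillis) {
        commitTimes.addLast(nowMillis);
        prune(nowMillis);
    }

    /** True if the merge policy should skip merging for now. */
    public boolean shouldSkipMerge(long nowMillis, int segmentCount) {
        prune(nowMillis);
        if (segmentCount >= maxSegments) {
            return false; // segment growth exceeded the threshold: merge anyway
        }
        double commitsPerSec = commitTimes.size() / (windowMillis / 1000.0);
        return commitsPerSec > maxCommitsPerSec;
    }

    // Drop commit timestamps that have fallen out of the window.
    private void prune(long nowMillis) {
        while (!commitTimes.isEmpty()
                && nowMillis - commitTimes.peekFirst() > windowMillis) {
            commitTimes.removeFirst();
        }
    }
}
```

A real integration would wrap an existing merge policy and return no merge candidates while the gate says to skip; the ceiling on segment count is what keeps Adrien's concern (unbounded segment growth) bounded.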
