Thanks for your replies -- it's good to know we're still planning future work on the hints and advice; once it settles down the common use cases should be clearer, or we can write some docs describing them if we need to.
I remember that thread now! I guess in the back of my mind I thought, oh OK that's settled then, but we didn't actually act on it, so thanks for opening the PR, Chris.

On Fri, Aug 8, 2025 at 7:03 AM Chris Hegarty <christopher.hega...@elastic.co.invalid> wrote:
>
> FYI - I opened the following PR to change the default read advice back to
> NORMAL.
>
> https://github.com/apache/lucene/pull/15040
>
> We can continue the discussion there.
>
> -Chris.
>
> > On 8 Aug 2025, at 10:03, Chris Hegarty <christopher.hega...@elastic.co>
> > wrote:
> >
> > Hi,
> >
> > There are two related but orthogonal parts to this:
> >
> > 1. The refactoring to IOContext and hints, that Simon has described.
> > 2. The default advice that Lucene should use out-of-the-box.
> >
> > I believe that we are in good shape to complete no.1. For no.2, we
> > discussed this in the following issue
> > https://github.com/apache/lucene/issues/14408 - the conclusion is that we
> > revert the default back to NORMAL.
> >
> > With this, then Lucene does not set MADV_RANDOM, unless the user opts-in -
> > which is greatly improved by no.1.
> >
> > -Chris.
> >
> >> On 8 Aug 2025, at 09:40, Simon Cooper <simon.coo...@elastic.co.INVALID>
> >> wrote:
> >>
> >> As I've been working in this area, here's my 2c...
> >>
> >> The move from ReadAdvice to IOContext hints is as yet unfinished;
> >> https://github.com/apache/lucene/pull/14977 and
> >> https://github.com/apache/lucene/pull/14844 will finish it off. Once those
> >> are merged, ReadAdvice will only be used as an implementation detail of
> >> MMapDirectory and related classes; core Lucene classes will only deal with
> >> IOContext and hints. By subclassing MMapDirectory, you can modify the
> >> hints that are passed down to the base implementation as you need to,
> >> and/or specify your own hints or IOContext implementations to help refine
> >> the behaviour you need.
> >>
> >> It will then be up to each directory implementation to look at the hints
> >> specified, and use those to inform how it should open the files. At the
> >> moment, MMapDirectory is the only one which does this, and it does this
> >> using different ReadAdvices based on the hints. Exactly which ReadAdvice
> >> is used for a particular combination of hints can be modified. I'm also
> >> not sure where NORMAL or RANDOM is best used, but I've tried to keep
> >> current behaviour unchanged as much as possible so far.
> >>
> >> SimonC
> >>
> >> On Thu, 7 Aug 2025 at 22:03, Michael Sokolov <soko...@falutin.net.invalid>
> >> wrote:
> >>
> >> I want to raise an issue here that has come up before which is about the
> >> choices we have made to apply madvise flags in an opinionated way.
> >>
> >> In our environment, the choices Lucene is making are really detrimental to
> >> our indexing throughput. In the past we had disabled this by subclassing
> >> MMapDirectory (a super expert workaround). Somehow we missed the fact that
> >> changes in Lucene 10 made this workaround ineffective and it took us a
> >> while to find the new recommended workaround, which is a system property
> >> setting. In an excess (perhaps) of caution, instead of the sysprop we've
> >> opted to modify a Lucene fork to disable this in a more fundamental way
> >> (cauterizing PosixNativeAccess.madvise), I think hoping that this might
> >> insulate us against future changes in this area? But we don't want to have
> >> to engage in this kind of paranoid programming!
> >>
> >> Lucene has made a choice that may be good for some environments or
> >> operating conditions, but not for others, and the difference can be pretty
> >> dramatic. I'm not sure how we came to decide that the current default is
> >> better than the old one?
> >> I'll also say I don't really understand why the
> >> MADV_RANDOM is hurting us so much, but it does cause our merge operations
> >> to get much slower, fall behind, and pile up to the extent that
> >> low-resource environments (that used to work fine with MADV_NORMAL) are
> >> crumbling under the weight of pending merges.
> >>
> >> Another thread is that the multiple layers of abstraction we have today
> >> (IOContext + ReadAdvice + DataAccessHint + FileDataHint + madvise) make it
> >> quite difficult to reason about what OS behavior is happening for any
> >> given IO operation. I read the IOContext javadocs but they only give
> >> general information and don't explain how hints are used to determine an
> >> actual MADV flag. In what circumstance should I use a hint vs an advice?
> >> The IndexInput.updateReadAdvice javadoc actually says "provide a hint" but
> >> accepts an advice.
> >>
> >> So to summarize:
> >>
> >> • Selfishly, I don't like the current default MADV setting Lucene has
> >> chosen, although I recognize it's possible it may work for some use case.
> >> But I do wonder at some level if the OS's default shouldn't be a good
> >> default setting?
> >> • I find the Lucene API in this area confusing and not well-documented.
> >> Understanding that the IO contexts are many and varied and could
> >> profitably be tuned differently, I wonder if we could have a centralized
> >> and first-class API (not a system property) that can be used to set a
> >> memory access profile of some sort?
> >>
> >> I think some evidence supporting the choices we have made today (why is
> >> the default MADV_RANDOM) would be helpful as a starting point. Maybe there
> >> is a past thread I overlooked?
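
For anyone trying to picture the "hints in, advice out" flow discussed in the quoted messages, here is a rough, self-contained sketch. It is not Lucene code: the Hint and Advice enums, the resolve method, and the rule that a random-access hint wins over a sequential one are all hypothetical names and assumptions made up for illustration. The only idea taken from the thread is that a directory looks at the hints it is given and falls back to a default (NORMAL here) when nothing opts in to RANDOM.

```java
import java.util.EnumSet;
import java.util.Set;

/**
 * Illustrative only: these types are hypothetical stand-ins and do not
 * mirror Lucene's IOContext, hint, or ReadAdvice classes.
 */
final class ReadAdviceSketch {

    /** Hypothetical access-pattern hints a caller may attach when opening a file. */
    enum Hint { SEQUENTIAL_ACCESS, RANDOM_ACCESS }

    /** Advice values loosely named after the madvise flags. */
    enum Advice { NORMAL, SEQUENTIAL, RANDOM }

    /**
     * Picks an advice from the supplied hints, falling back to the given
     * default when no hint expresses a preference. The precedence order
     * (random before sequential) is an arbitrary assumption of this sketch.
     */
    static Advice resolve(Set<Hint> hints, Advice defaultAdvice) {
        if (hints.contains(Hint.RANDOM_ACCESS)) {
            return Advice.RANDOM; // the caller explicitly opted in
        }
        if (hints.contains(Hint.SEQUENTIAL_ACCESS)) {
            return Advice.SEQUENTIAL;
        }
        return defaultAdvice; // no opinion expressed: use the default
    }

    public static void main(String[] args) {
        Set<Hint> noHints = EnumSet.noneOf(Hint.class);
        Set<Hint> randomHint = EnumSet.of(Hint.RANDOM_ACCESS);

        // With NORMAL as the default, RANDOM is only chosen on opt-in.
        System.out.println(resolve(noHints, Advice.NORMAL));
        System.out.println(resolve(randomHint, Advice.NORMAL));
    }
}
```

In this toy model, changing the out-of-the-box behaviour is a one-argument change to the default, while per-file tuning goes through the hints. That is the separation between "default advice" and "hints" that the thread describes, under the assumptions stated above.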
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org