Thanks for your replies -- it's good to know we're still planning
future work on the hints and advice; once it settles down the common
use cases should be clearer, or we can write some docs describing them
if we need to.

I remember that thread now! I guess in the back of my mind I thought,
oh OK that's settled then, but we didn't actually act on it, so thanks
for opening the PR, Chris.
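
In case it helps whoever ends up writing those docs, here is a rough sketch of the hints-to-advice idea Simon describes below, as I understand it. Everything in it is a hypothetical stand-in: the `Hint` and `Advice` enums and the `resolve` method are made up for illustration and are not Lucene's actual IOContext, ReadAdvice, or MMapDirectory API. The particular mapping (merge and sequential hints go to SEQUENTIAL) is also just my assumption. The only point is that with a NORMAL default, RANDOM is chosen only when a caller explicitly asks for it.

```java
import java.util.EnumSet;
import java.util.Set;

public class HintMappingSketch {
    // Hypothetical stand-in for the advice a directory passes to the OS.
    public enum Advice { NORMAL, RANDOM, SEQUENTIAL }

    // Hypothetical stand-in for the hints a caller attaches when opening a file.
    public enum Hint { RANDOM_ACCESS, SEQUENTIAL_ACCESS, MERGE }

    // Picks an advice from the hints; falls back to the configured default
    // when no hint expresses an access pattern.
    public static Advice resolve(Set<Hint> hints, Advice defaultAdvice) {
        if (hints.contains(Hint.MERGE) || hints.contains(Hint.SEQUENTIAL_ACCESS)) {
            return Advice.SEQUENTIAL; // assumed mapping, for illustration only
        }
        if (hints.contains(Hint.RANDOM_ACCESS)) {
            return Advice.RANDOM; // explicit opt-in by the caller
        }
        return defaultAdvice;
    }

    public static void main(String[] args) {
        Advice dflt = Advice.NORMAL;
        // No hints: the default applies, so nothing random is requested.
        System.out.println(resolve(EnumSet.noneOf(Hint.class), dflt));
        // Explicit random-access hint: the caller opted in.
        System.out.println(resolve(EnumSet.of(Hint.RANDOM_ACCESS), dflt));
        // Merge hint wins over a random-access hint in this sketch.
        System.out.println(resolve(EnumSet.of(Hint.MERGE, Hint.RANDOM_ACCESS), dflt));
    }
}
```

If the real mapping ends up documented in roughly this shape (hints in, one advice out, with a single visible default), I think most of my confusion about "hint vs advice" goes away.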

On Fri, Aug 8, 2025 at 7:03 AM Chris Hegarty
<christopher.hega...@elastic.co.invalid> wrote:
>
> FYI - I opened the following PR to change the default read advice back to
> NORMAL.
>
> https://github.com/apache/lucene/pull/15040
>
> We can continue the discussion there.
>
> -Chris.
>
> > On 8 Aug 2025, at 10:03, Chris Hegarty <christopher.hega...@elastic.co> 
> > wrote:
> >
> > Hi,
> >
> > There are two related but orthogonal parts to this:
> >
> > 1. The refactoring to IOContext and hints, that Simon has described.
> > 2. The default advice that Lucene should use out-of-the-box.
> >
> > I believe that we are in good shape to complete no.1. For no.2, we 
> > discussed this in the following issue 
> > https://github.com/apache/lucene/issues/14408 - the conclusion is that we 
> > revert the default back to NORMAL.
> >
> > With this, Lucene does not set MADV_RANDOM unless the user opts in -
> > which is greatly improved by no.1.
> >
> > -Chris.
> >
> >> On 8 Aug 2025, at 09:40, Simon Cooper <simon.coo...@elastic.co.INVALID> 
> >> wrote:
> >>
> >> As I've been working in this area, here's my 2c...
> >>
> >> The move from ReadAdvice to IOContext hints is as yet unfinished;
> >> https://github.com/apache/lucene/pull/14977 and
> >> https://github.com/apache/lucene/pull/14844 will finish it off. Once those
> >> are merged, ReadAdvice will only be used as an implementation detail of
> >> MMapDirectory and related classes; core Lucene classes will only deal with
> >> IOContext and hints. By subclassing MMapDirectory, you can modify the
> >> hints that are passed down to the base implementation as you need to, 
> >> and/or specify your own hints or IOContext implementations to help refine 
> >> the behaviour you need.
> >>
> >> It will then be up to each directory implementation to look at the hints 
> >> specified, and use those to inform how it should open the files. At the 
> >> moment, MMapDirectory is the only one which does this, and it does this 
> >> using different ReadAdvices based on the hints. Exactly which ReadAdvice 
> >> is used for a particular combination of hints can be modified. I'm also 
> >> not sure where NORMAL or RANDOM is best used, but I've tried to keep 
> >> current behaviour unchanged as much as possible so far.
> >>
> >> SimonC
> >>
> >> On Thu, 7 Aug 2025 at 22:03, Michael Sokolov <soko...@falutin.net.invalid> 
> >> wrote:
> >> I want to raise an issue here that has come up before which is about the 
> >> choices we have made to apply madvise flags in an opinionated way.
> >>
> >> In our environment, the choices Lucene is making are really detrimental to 
> >> our indexing throughput. In the past we had disabled this by subclassing 
> >> MMapDirectory (a super expert workaround). Somehow we missed the fact that 
> >> changes in Lucene 10 made this workaround ineffective and it took us a 
> >> while to find the new recommended workaround, which is a system property 
> >> setting.  In an excess (perhaps) of caution, instead of the sysprop we've 
> >> opted to modify a Lucene fork to disable this in a more fundamental way 
> >> (cauterizing PosixNativeAccess.madvise), I think hoping that this might 
> >> insulate us against future changes in this area? But we don't want to have 
> >> to engage in this kind of paranoid programming!
> >>
> >> Lucene has made a choice that may be good for some environments or 
> >> operating conditions, but not for others, and the difference can be pretty 
> >> dramatic. I'm not sure how we came to decide that the current default is 
> >> better than the old one?  I'll also say I don't really understand why the 
> >> MADV_RANDOM is hurting us so much, but it does cause our merge operations 
> >> to get much slower, fall behind, and pile up to the extent that 
> >> low-resource environments (that used to work fine with MADV_NORMAL) are 
> >> crumbling under the weight of pending merges.
> >>
> >> Another thread is that the multiple layers of abstraction we have today 
> >> (IOContext + ReadAdvice + DataAccessHint + FileDataHint + madvise) make it 
> >> quite difficult to reason about what OS behavior is happening for any 
> >> given IO operation. I read the IOContext javadocs but they only give 
> >> general information and don't explain how hints are used to determine an 
> >> actual MADV flag.  In what circumstance should I use a hint vs an advice? 
> >> The IndexInput.updateReadAdvice javadoc actually says "provide a hint" but 
> >> accepts an advice.
> >>
> >> So to summarize:
> >>
> >>    • Selfishly, I don't like the current default MADV setting Lucene has 
> >> chosen, although I recognize it's possible it may work for some use case.  
> >> But I do wonder at some level if the OS's default shouldn't be a good 
> >> default setting?
> >>    • I find the Lucene API in this area confusing and not well-documented. 
> >>  Understanding that the IO contexts are many and varied and could 
> >> profitably be tuned differently, I wonder if we could have a centralized 
> >> and first-class API (not a system property) that can be used to set a 
> >> memory access profile of some sort?
> >>
> >> I think some evidence supporting the choices we have made today (why is 
> >> the default MADV_RANDOM) would be helpful as a starting point. Maybe there 
> >> is a past thread I overlooked?
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
