Hi Uwe,

Thanks for the prompt response. I have created the GitHub issue https://github.com/apache/lucene/issues/13920 for more discussion. We can move all further discussion to that issue.
Thanks
Navneet

On Tue, Oct 15, 2024 at 3:17 AM Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> The problem with your approach is that you can change the madvise on a clone, but as the underlying memory is the same for the cloned index input, it won't revert back to RANDOM.
>
> Basically there's no need to clone or create a slice. We should better allow changing the advice for an IndexInput and restoring it later. We have that functionality in Lucene's 10.x version already; it can create slices.
>
> The linked diff is too intrusive; we won't accept this as a PR, because it does not use the madvise call in correct ways and changes the semantics of preloading. Please open an issue instead for discussion.
>
> Uwe
>
> Am 15.10.2024 um 09:06 schrieb Navneet Verma:
> > Hi Uwe,
> >
> > *>> thinking about it a bit more: In 10.x we already have some ways to preload data with WILL_NEED (or similar). Maybe this can also be used on merging when we reuse an already open IndexInput. Maybe it is possible to change the madvise on an already open IndexInput and change it before merging (and revert back). This would improve merging and would not affect searches.*
> >
> > My teammate Tejas and I tried a similar approach to the one you mentioned above, where he made changes to ensure that during a merge we change the madvise from random to sequential on a cloned IndexInput
> > <https://github.com/shatejas/lucene/commit/4de387288d70b4d8aede45ef3095ae6c1e189331#diff-e0a29611df21f6d32a461e2d24db1585cdf3a8590f08d93b097f0dd84684ebc8R316>.
> > We saw that the merge time was reduced *from >10 mins to <1 min with 1.6M 768D vectors.* This was done on top of the 9.11.0 version of Lucene. We are inclined to use this approach.
> >
> > *>> I know there are some problems if the IndexInput is used for multiple things like reading, merging and/or checksumming at the same time. Some code tries to reuse already opened index inputs also for merging. But for this case I think it might be better to open a separate IndexInput and not clone an existing one for checksumming?*
> >
> > In the previous emails you suggested that we can also open up a new IndexInput which can then be used for checksumming during merges, and as I mentioned earlier, doing this gave similar results. But on doing further deep-dives I found out that it is not recommended to create multiple instances of IndexInput in different threads (ref
> > <https://github.com/apache/lucene/blob/350de210c3674566293681bb58e801629b5ceee7/lucene/core/src/java/org/apache/lucene/store/IndexInput.java#L22-L39>).
> > So I wanted to understand whether this still holds true, as we haven't found any case so far where opening multiple IndexInputs caused a problem, even when searches are happening during indexing/merges. Please let us know your thoughts here.
> >
> > Thanks
> > Navneet
> >
> >
> > On Tue, Oct 1, 2024 at 2:55 AM Uwe Schindler <u...@thetaphi.de> wrote:
> >
> >> Hi,
> >>
> >> thinking about it a bit more: In 10.x we already have some ways to preload data with WILL_NEED (or similar). Maybe this can also be used on merging when we reuse an already open IndexInput. Maybe it is possible to change the madvise on an already open IndexInput and change it before merging (and revert back). This would improve merging and would not affect searches.
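A minimal sketch of the "open a separate IndexInput for the sequential checksum pass instead of cloning the random-advised one" idea discussed above — not how Lucene wires this internally. It assumes the IOContext.RANDOM and IOContext.READONCE constants and CodecUtil.checksumEntireFile as available in Lucene 9.11/9.12; the index path and the "_0.vec" file name are placeholders, not taken from the thread.

import java.io.IOException;
import java.nio.file.Path;

import org.apache.lucene.codecs.CodecUtil;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.MMapDirectory;

public class SeparateChecksumInputSketch {
  public static void main(String[] args) throws IOException {
    try (Directory dir = new MMapDirectory(Path.of("/path/to/index"))) {
      String vecFile = "_0.vec"; // placeholder; real code gets this from the segment's file list

      // Search-time reads keep the RANDOM advice (good for graph-style access patterns).
      try (IndexInput searchInput = dir.openInput(vecFile, IOContext.RANDOM)) {
        // ... random-access reads for HNSW search would happen here ...
      }

      // For the sequential whole-file pass (checksumming), open a *separate* input with a
      // read-once/sequential-friendly context instead of cloning the RANDOM-advised one,
      // so the OS is free to read ahead.
      try (IndexInput checksumInput = dir.openInput(vecFile, IOContext.READONCE)) {
        long checksum = CodecUtil.checksumEntireFile(checksumInput);
        System.out.println("checksum=" + Long.toHexString(checksum));
      }
    }
  }
}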
> >> So I advise not to do any ad-hoc changes breaking the random read code for vectors and doc values again, and to think about better ideas. In 10.x we have put a lot of thought into this, but the "upgrade an IndexInput for merging or checksumming" could be a nice addition - of course revert it back to the original IOContext with some try/finally after the work is done. This would play much nicer with our "reuse IndexInput of NRT readers while merging".
> >>
> >> Adrien, do you have any ideas?
> >>
> >> Uwe
> >>
> >> Am 01.10.2024 um 10:17 schrieb Uwe Schindler:
> >>> Hi,
> >>>
> >>> great.
> >>>
> >>> I still think the difference between RANDOM and READ is huge in your case. Are you sure that you have not misconfigured your system? The most important thing for Lucene is to make sure that the heap space of the Java VM is limited as much as possible (just above the OOM boundary) and the free available RAM is as large as possible, to allow MMapDirectory to use off-heap memory in an ideal way and minimize paging overhead. If you don't do this, the kernel will be under much higher pressure.
> >>>
> >>> In general, the correct fix for this is to use RANDOM for normal reading of the index and use the other IOContexts only for merging. If this requires files to be opened multiple times, it's a better compromise.
> >>>
> >>> Please note: we are focusing on 10.x, so please supply PRs/changes for the Lucene main branch only; backports will be done automatically. We won't change the IOContexts in 9.x anymore.
> >>>
> >>> Uwe
> >>>
> >>> Am 01.10.2024 um 10:04 schrieb Navneet Verma:
> >>>> Hi Uwe,
> >>>> Thanks for sharing the link and providing the useful information. I will definitely go ahead and create a GitHub issue. In the meantime I did some testing by changing the IOContext from RANDOM to READ for FlatVectors
> >>>> <https://github.com/navneet1v/lucene/commit/cd02e6f39acea82f7e56b36d8fd44156b4e271f9>
> >>>> and what I can see is that the overall merge + integrity checks have already come down *from >10 mins to <1 min with 1.6M 768D vectors.* This further confirms that RANDOM is not good for checksums. I didn't validate how much HNSW latencies will change if we move to READ; I think there would be some latency degradation. We can discuss the solutions further on the GitHub issue.
> >>>>
> >>>> I will post all of this in a GitHub issue and will try to raise a PR with a fix. Will be looking forward to your feedback.
> >>>>
> >>>> Thanks
> >>>> Navneet
> >>>>
> >>>>
> >>>> On Tue, Oct 1, 2024 at 12:19 AM Uwe Schindler <u...@thetaphi.de> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> this seems to be a special case in FlatVectors, because normally there's a separate method to open an IndexInput for checksumming:
> >>>>>
> >>>>> https://github.com/apache/lucene/blob/524ea208c870861a719f21b1ea48943c8b7520da/lucene/core/src/java/org/apache/lucene/store/Directory.java#L155-L157
> >>>>>
> >>>>> Could you open an issue? It looks like it is not always used. I know there are some problems if the IndexInput is used for multiple things like reading, merging and/or checksumming at the same time. Some code tries to reuse already opened index inputs also for merging.
> >>>>> But for this case I think it might be better to open a separate IndexInput and not clone an existing one for checksumming?
> >>>>>
> >>>>> The first link should of course open the IndexInput with RANDOM, because during normal reading of vectors this is a must. Generally, although the checksumming is slower, it should not be a big issue, because it won't affect searches, only merging of segments. And there the throughput should be high, but it is not the top priority.
> >>>>>
> >>>>> Uwe
> >>>>>
> >>>>> Am 01.10.2024 um 04:52 schrieb Navneet Verma:
> >>>>>> Hi Uwe and Mike,
> >>>>>> Thanks for providing such a quick response. Let me try to answer a few things here:
> >>>>>>
> >>>>>> *In addition, in Lucene 9.12 (latest 9.x) version released today there are some changes to ensure that checksumming is always done with IOContext.READ_ONCE (which uses READ behind the scenes).*
> >>>>>> I didn't find any such change for FlatVectorReaders
> >>>>>> <https://github.com/apache/lucene/blob/branch_9_12/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsReader.java#L68-L77>,
> >>>>>> even though I checked BufferedChecksumIndexInput
> >>>>>> <https://github.com/apache/lucene/blob/branch_9_12/lucene/core/src/java/org/apache/lucene/store/BufferedChecksumIndexInput.java#L31-L35>,
> >>>>>> ChecksumIndexInput
> >>>>>> <https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/store/ChecksumIndexInput.java#L25>,
> >>>>>> and CodecUtil
> >>>>>> <https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/CodecUtil.java#L606-L621>
> >>>>>> in the 9.12 version. Please point me to the right file if I am missing something here. I can see the same for Lucene version 10
> >>>>>> <https://github.com/apache/lucene/blob/branch_10_0/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsReader.java#L69-L77>
> >>>>>> too.
> >>>>>>
> >>>>>> Mike, on the question of what the RANDOM vs READ context is doing, we found this information related to madvise online:
> >>>>>>
> >>>>>> MADV_RANDOM: Expect page references in random order. (Hence, read ahead may be less useful than normally.)
> >>>>>> MADV_SEQUENTIAL: Expect page references in sequential order. (Hence, pages in the given range can be aggressively read ahead, and may be freed soon after they are accessed.)
> >>>>>> MADV_WILLNEED: Expect access in the near future. (Hence, it might be a good idea to read some pages ahead.)
> >>>>>>
> >>>>>> This tells me that MADV_RANDOM is not good for checksums, as it will consume more read cycles given the sequential nature of checksum reads.
> >>>>>>
> >>>>>> *One simple workaround an application can do is to ask MMapDirectory to pre-touch all bytes/pages in .vec/.veq files -- this asks the OS to cache all of those bytes into page cache (if there is enough free RAM). We do this at Amazon (product search) for our production searching processes. Otherwise paging in all .vec/.veq pages via random access provoked through HNSW graph searching is crazy slow...*
> >>>>>> Did you mean the preload functionality offered by MMapDirectory here?
> >>>>>> I can try this to see if that helps, but I doubt it will help in this case.
> >>>>>>
> >>>>>> On opening the issue: I am working through some reproducible benchmarks before creating a GitHub issue. If you believe I should create a GitHub issue first, I can do that; it might just take me some time to build reproducible benchmarks.
> >>>>>>
> >>>>>> Thanks
> >>>>>> Navneet
> >>>>>>
> >>>>>>
> >>>>>> On Mon, Sep 30, 2024 at 3:08 AM Uwe Schindler <u...@thetaphi.de> wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> please also note: in Lucene 10 the checksum IndexInput will always be opened with IOContext.READ_ONCE.
> >>>>>>>
> >>>>>>> If you want to sequentially read a whole index file for reasons other than checksumming, please pass the correct IOContext. In addition, in Lucene 9.12 (latest 9.x) version released today there are some changes to ensure that checksumming is always done with IOContext.READ_ONCE (which uses READ behind the scenes).
> >>>>>>>
> >>>>>>> Uwe
> >>>>>>>
> >>>>>>> Am 29.09.2024 um 17:09 schrieb Michael McCandless:
> >>>>>>>> Hi Navneet,
> >>>>>>>>
> >>>>>>>> With the RANDOM IOContext, on modern OSs / Java versions, Lucene will hint to the memory-mapped segment that the IO will be random, using the madvise POSIX API with the MADV_RANDOM flag.
> >>>>>>>>
> >>>>>>>> For the READ IOContext, Lucene maybe hints with MADV_SEQUENTIAL, I'm not sure. Or maybe it doesn't hint anything?
> >>>>>>>>
> >>>>>>>> It's up to the OS to then take these hints and do something "interesting" to try to optimize IO and page caching based on these hints. I think modern Linux OSs will readahead (and pre-warm the page cache) for MADV_SEQUENTIAL? And maybe skip page cache and readahead for MADV_RANDOM? Not certain...
> >>>>>>>>
> >>>>>>>> For computing a checksum, which is always a sequential operation, if we use MADV_RANDOM (which is stupid), that is indeed expected to perform worse since there is no readahead pre-caching. 50% worse (what you are seeing) is indeed quite an impact ...
> >>>>>>>>
> >>>>>>>> Maybe open an issue? At least for checksumming we should open even .vec files for sequential read? But, then, if it's the same IndexInput which will then be used "normally" (e.g. for merging), we would want THAT one to be open for random access ... might be tricky to fix.
> >>>>>>>>
> >>>>>>>> One simple workaround an application can do is to ask MMapDirectory to pre-touch all bytes/pages in .vec/.veq files -- this asks the OS to cache all of those bytes into page cache (if there is enough free RAM). We do this at Amazon (product search) for our production searching processes. Otherwise paging in all .vec/.veq pages via random access provoked through HNSW graph searching is crazy slow...
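A minimal sketch of the pre-touch workaround Mike describes above, assuming the BiPredicate-based MMapDirectory.setPreload available in recent Lucene 9.x releases; the index path is a placeholder and the .vec/.veq filter is only an illustration, not a recommendation from the thread.

import java.io.IOException;
import java.nio.file.Path;

import org.apache.lucene.store.MMapDirectory;

public class PreloadVectorsSketch {
  public static void main(String[] args) throws IOException {
    MMapDirectory dir = new MMapDirectory(Path.of("/path/to/index"));

    // Ask MMapDirectory to pre-touch (page in) the vector data files when they are mapped,
    // so later random access from HNSW graph search does not stall on page faults.
    // Only .vec/.veq files are preloaded here; all other files keep the default behavior.
    dir.setPreload((fileName, context) ->
        fileName.endsWith(".vec") || fileName.endsWith(".veq"));

    // ... open the IndexReader / IndexSearcher on top of "dir" as usual ...

    dir.close();
  }
}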
> >>>>>>>>
> >>>>>>>> Mike McCandless
> >>>>>>>>
> >>>>>>>> http://blog.mikemccandless.com
> >>>>>>>>
> >>>>>>>> On Sun, Sep 29, 2024 at 4:06 AM Navneet Verma <vermanavneet...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Lucene Experts,
> >>>>>>>>> I wanted to understand the performance difference between opening and reading a whole file using an IndexInput with the IOContext as RANDOM vs READ.
> >>>>>>>>>
> >>>>>>>>> I can see that .vec files (storing the flat vectors) are opened with RANDOM, whereas .dvd files are opened with READ. As per my testing with files close to 5GB in size (~1.6M docs, each doc 3072 bytes), I can see that when full-file checksum validation happens, a file opened via the READ context is faster than one opened via RANDOM. The time difference I am seeing is close to 50%. Hence this performance question is coming up: is this understanding correct?
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>> Navneet
> >>>>>>>>>
> >>>>>>> --
> >>>>>>> Uwe Schindler
> >>>>>>> Achterdiek 19, D-28357 Bremen
> >>>>>>> https://www.thetaphi.de
> >>>>>>> eMail: u...@thetaphi.de
> >>>>>>>
> >>>>>>>
> >>>>>>> ---------------------------------------------------------------------
> >>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>>>>>>
> >>>>> --
> >>>>> Uwe Schindler
> >>>>> Achterdiek 19, D-28357 Bremen
> >>>>> https://www.thetaphi.de
> >>>>> eMail: u...@thetaphi.de
> >>>>>
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>>>>
> >>>>>
> >> --
> >> Uwe Schindler
> >> Achterdiek 19, D-28357 Bremen
> >> https://www.thetaphi.de
> >> eMail: u...@thetaphi.de
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>
> >>
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
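For reference, a rough sketch (not a rigorous benchmark) of the comparison described at the start of the thread: checksumming the same file through inputs opened with the RANDOM vs. READ IOContexts. It assumes the IOContext constants as they exist in Lucene 9.11/9.12; the index path, the "_0.vec" file name, and the timeChecksumMillis helper are placeholders for illustration only.

import java.io.IOException;
import java.nio.file.Path;

import org.apache.lucene.codecs.CodecUtil;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.MMapDirectory;

public class ChecksumContextComparison {

  // Hypothetical helper: time a full-file checksum pass through an input opened with the given context.
  static long timeChecksumMillis(Directory dir, String file, IOContext context) throws IOException {
    long start = System.nanoTime();
    try (IndexInput in = dir.openInput(file, context)) {
      CodecUtil.checksumEntireFile(in); // sequential read of the whole file, verifies the footer
    }
    return (System.nanoTime() - start) / 1_000_000;
  }

  public static void main(String[] args) throws IOException {
    try (Directory dir = new MMapDirectory(Path.of("/path/to/index"))) {
      String vecFile = "_0.vec"; // placeholder segment file name
      // Page-cache state heavily influences the result; drop the OS caches between runs
      // for a fair comparison of the madvise hints.
      System.out.println("RANDOM: " + timeChecksumMillis(dir, vecFile, IOContext.RANDOM) + " ms");
      System.out.println("READ:   " + timeChecksumMillis(dir, vecFile, IOContext.READ) + " ms");
    }
  }
}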