Hi Uwe,

>> thinking about it a bit more: In 10.x we already have some ways to
>> preload data with WILL_NEED (or similar). Maybe this can also be used on
>> merging when we reuse an already open IndexInput. Maybe it is possible
>> to change the madvise on an already open IndexInput and change it
>> before merging (and revert back). This would improve merging and would
>> not affect.

My teammate Tejas and I tried the approach you mention above: he made
changes so that, during a merge, we switch the madvise from random to
sequential on a cloned IndexInput.
<https://github.com/shatejas/lucene/commit/4de387288d70b4d8aede45ef3095ae6c1e189331#diff-e0a29611df21f6d32a461e2d24db1585cdf3a8590f08d93b097f0dd84684ebc8R316>
We saw that the merge time was reduced from > 10 mins to < 1 min with 1.6M
768D vectors. This was done on top of the 9.11.0 version of Lucene. We are
inclined to use this approach.
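
To illustrate the access-pattern difference behind this result, here is a
stdlib-only toy (my own code and names, not Lucene's APIs): a sequential
pass, as checksumming and merging do, touches bytes in order and is the
pattern MADV_SEQUENTIAL readahead is built for, while random probes, as
HNSW search does over a .vec file, mostly waste readahead.

```java
import java.nio.ByteBuffer;
import java.util.Random;

public class AccessPatterns {
    // Sequential pass over a buffer: the pattern a merge or checksum uses.
    // On an mmapped file, MADV_SEQUENTIAL lets the kernel read ahead.
    static long sumSequential(ByteBuffer buf) {
        long sum = 0;
        for (int i = 0; i < buf.capacity(); i++) {
            sum += buf.get(i);
        }
        return sum;
    }

    // Random probes, like HNSW graph traversal: readahead is mostly
    // wasted here, which is why RANDOM is the right hint for searching.
    static long sumRandom(ByteBuffer buf, int probes, long seed) {
        Random rnd = new Random(seed);
        long sum = 0;
        for (int i = 0; i < probes; i++) {
            sum += buf.get(rnd.nextInt(buf.capacity()));
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1 << 20];
        new Random(42).nextBytes(data);
        ByteBuffer buf = ByteBuffer.wrap(data);
        System.out.println("sequential sum = " + sumSequential(buf));
        System.out.println("random sum     = " + sumRandom(buf, 1000, 7));
    }
}
```

Of course the madvise hint itself lives below the Java level; this only
shows the two access shapes the hints describe.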


>> I know there are some problems if the IndexInput is used for multiple
>> things like reading, merging and/or checksumming at same time. Some code
>> tries to reuse already opened index inputs also for merging. But for this
>> case I think it might be better to open a separate IndexInput and not clone
>> an existing one for checksumming?

In a previous email you suggested that we could also open a new
IndexInput to be used for checksumming during merges; as I mentioned
earlier, doing this gave similar results. But on a further deep-dive I
found that it is not recommended to create multiple instances of
IndexInput in different threads (ref
<https://github.com/apache/lucene/blob/350de210c3674566293681bb58e801629b5ceee7/lucene/core/src/java/org/apache/lucene/store/IndexInput.java#L22-L39>).
Does this still hold true? So far we haven't found a case where opening
multiple IndexInputs caused a problem, even when searches are happening
during indexing/merges. Please let us know your thoughts here.
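
For what it's worth, my reading of that javadoc warning is that it is
about per-instance mutable state (the file pointer), not the underlying
shared bytes. A stdlib-only toy (my own names, not Lucene's actual
implementation) of why a single instance must not be shared across
threads, while clones or separate instances over the same data are fine:

```java
// Toy model of an IndexInput-like reader: the underlying bytes are shared
// and immutable, but the read position is per-instance mutable state.
// Illustration only; this is not Lucene's implementation.
public class ToyInput {
    private final byte[] data; // shared (like an mmapped region)
    private int pos;           // per-instance: why one instance is not
                               // safe to share across threads

    public ToyInput(byte[] data) { this.data = data; }

    public byte readByte() { return data[pos++]; }

    // A "clone" shares the data but gets its own position, so each
    // thread can safely hold its own clone.
    public ToyInput newClone() { return new ToyInput(data); }

    public static void main(String[] args) throws InterruptedException {
        byte[] data = {10, 20, 30};
        ToyInput original = new ToyInput(data);

        Thread t = new Thread(() -> {
            ToyInput mine = original.newClone(); // own position state
            long sum = mine.readByte() + mine.readByte() + mine.readByte();
            System.out.println("thread sum = " + sum);
        });
        t.start();
        t.join();

        // The original's position is untouched by the other thread's reads.
        System.out.println("first byte from original = " + original.readByte());
    }
}
```

Whether the warning also covers resource lifecycle beyond this positional
state is exactly the question above, so treat this as my interpretation,
not authority.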

Thanks
Navneet


On Tue, Oct 1, 2024 at 2:55 AM Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> thinking about it a bit more: In 10.x we already have some ways to
> preload data with WILL_NEED (or similar). Maybe this can also be used on
> merging when we reuse an already open IndexInput. Maybe it is possible
> to change the madvise on an already open IndexInput and change it
> before merging (and revert back). This would improve merging and would
> not affect.
>
> So I advise to not do any adhoc changes breaking the random read code
> for vectors and docvalues again and think about better ideas. In 10.x we
> have done a lot of thoughts, but the "upgrade an IndexInput for merging
> or checksumming" could be a nice addition - of course revert it back to
> original IOContext with some try/finally after the work is done. This
> would play much nicer with our "reuse IndexInput of NRT readers while
> merging".
>
> Adrien, do you have any ideas?
>
> Uwe
>
> Am 01.10.2024 um 10:17 schrieb Uwe Schindler:
> > Hi,
> >
> > great.
> >
> > I still think the difference between RANDOM and READ is huge in your
> > case. Are you sure that you have not misconfigured your system. The
> > most important thing for Lucene is to make sure that heap space of the
> > Java VM is limited as much as possible (shortly over the OOM boundary)
> > and the free available RAM space is a large as possible to allow
> > MMapDirectory to use off-heap in an ideal way and minimize paging
> > overhead. If you don't do this, the kernel will be under much higher
> > pressure.
> >
> > In general, the correct fix for this is to use RANDOM for normal
> > reading of index and use the other IOContexts only for merging. If this
> > requires files to be opened multiple times it's a better compromise.
> >
> > Please note: we are focusing on 10.x, so please supply PRs/changes for
> > Lucene main branch only, backports will be done automatically. We
> > won't change the IOContexts in 9.x anymore.
> >
> > Uwe
> >
> > Am 01.10.2024 um 10:04 schrieb Navneet Verma:
> >> Hi Uwe,
> >> Thanks for sharing the link and providing the useful information. I will
> >> definitely go ahead and create a gh issue. In the meantime I did some
> >> testing by changing the IOContext from RANDOM to READ for FlatVectors
> >> <https://github.com/navneet1v/lucene/commit/cd02e6f39acea82f7e56b36d8fd44156b4e271f9>
> >> and what I can see is the overall merge + integrity checks have already
> >> come down *from > 10mins to < 1min with 1.6M 768D vectors.* This further
> >> confirms that RANDOM is not good for checksums. I didn't validate how
> >> much
> >> HNSW latencies will change if we move to READ. I think there would be
> >> some
> >> latency degradation. We can discuss further around the solutions on the
> >> github issue.
> >>
> >> I will post all these in a github issue and will try to raise a PR
> >> with a
> >> fix. Will be looking forward to your feedback.
> >>
> >> Thanks
> >> Navneet
> >>
> >>
> >> On Tue, Oct 1, 2024 at 12:19 AM Uwe Schindler <u...@thetaphi.de> wrote:
> >>
> >>> Hi,
> >>>
> >>> this seems to be a special case in FlatVectors, because normally
> >>> there's a
> >>> separate method to open an IndexInput for checksumming:
> >>>
> >>>
> >>>
> https://github.com/apache/lucene/blob/524ea208c870861a719f21b1ea48943c8b7520da/lucene/core/src/java/org/apache/lucene/store/Directory.java#L155-L157
> >>>
> >>>
> >>> Could you open an issue, it looks like it is not always used? I know
> >>> there are some problems if the IndexInput is used for multiple things
> >>> like reading, merging and/or checksumming at same time. Some code tries
> >>> to reuse already opened index inputs also for merging. But for this
> >>> case
> >>> I think it might be better to open a separate IndexInput and not clone
> >>> an existing one for checksumming?
> >>>
> >>> The first link should of course open the IndexInput with RANDOM,
> >>> because
> >>> during normal reading of vectors this is a must. Generally although the
> >>> checksumming is slower it should not be a big issue, because it won't
> >>> affect searches, only merging of segments. And there the throughput
> >>> should be high, but not top priority.
> >>>
> >>> Uwe
> >>>
> >>> Am 01.10.2024 um 04:52 schrieb Navneet Verma:
> >>>> Hi Uwe and Mike,
> >>>> Thanks for providing such a quick response. Let me try to answer a
> >>>> few things here:
> >>>>
> >>>> *In addition, in Lucene 9.12 (latest 9.x) version released today
> >>>> there are some changes to ensure that checksumming is always done
> >>>> with IOContext.READ_ONCE (which uses READ behind the scenes).*
> >>>> I didn't find any such change for FlatVectorReaders
> >>>> <
> >>>
> https://github.com/apache/lucene/blob/branch_9_12/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsReader.java#L68-L77
> >>>
> >>>> ,
> >>>> even though I checked the BufferedChecksumInput
> >>>> <
> >>>
> https://github.com/apache/lucene/blob/branch_9_12/lucene/core/src/java/org/apache/lucene/store/BufferedChecksumIndexInput.java#L31-L35
> >>>
> >>>> and CheckedSumInput
> >>>> <
> >>>
> https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/store/ChecksumIndexInput.java#L25
> >>>
> >>>> ,
> >>>> CodecUtil
> >>>> <
> >>>
> https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/CodecUtil.java#L606-L621
> >>>
> >>>> in 9.12 version. Please point me to the right file if I am missing
> >>>> something here. I can see the same for lucene version 10
> >>>> <
> >>>
> https://github.com/apache/lucene/blob/branch_10_0/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsReader.java#L69-L77
> >>>
> >>>> too.
> >>>>
> >>>> Mike on the question of what is RANDOM vs READ context doing we found
> >>> this
> >>>> information related to MADV online.
> >>>>
> >>>> MADV_RANDOM: Expect page references in random order. (Hence, read
> >>>> ahead may be less useful than normally.)
> >>>> MADV_SEQUENTIAL: Expect page references in sequential order. (Hence,
> >>>> pages in the given range can be aggressively read ahead, and may be
> >>>> freed soon after they are accessed.)
> >>>> MADV_WILLNEED: Expect access in the near future. (Hence, it might be
> >>>> a good idea to read some pages ahead.)
> >>>>
> >>>> This tells me that MADV_RANDOM for checksums is not good, as it
> >>>> will consume more read cycles given the sequential nature of the
> >>>> checksum.
> >>>>
> >>>>
> >>>> *One simple workaround an application can do is to ask MMapDirectory
> >>>> to pre-touch all bytes/pages in .vec/.veq files -- this asks the OS
> >>>> to cache all of those bytes into page cache (if there is enough free
> >>>> RAM). We do this at Amazon (product search) for our production
> >>>> searching processes. Otherwise paging in all .vec/.veq pages via
> >>>> random access provoked through HNSW graph searching is crazy slow...*
> >>>> Did you mean the preload functionality offered by MMapDirectory
> >>>> here? I can try this to see if that helps. But I doubt that in this
> >>>> case.
> >>>>
> >>>> On opening the issue: I am working through some reproducible
> >>>> benchmarks before creating a GH issue. If you believe I should
> >>>> create the GH issue first, I can do that, as it might take me some
> >>>> time to build reproducible benchmarks.
> >>>>
> >>>> Thanks
> >>>> Navneet
> >>>>
> >>>>
> >>>> On Mon, Sep 30, 2024 at 3:08 AM Uwe Schindler <u...@thetaphi.de>
> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> please also note: In Lucene 10 there checksum IndexInput will
> >>>>> always be
> >>>>> opened with IOContext.READ_ONCE.
> >>>>>
> >>>>> If you want to sequentially read a whole index file for other reason
> >>>>> than checksumming, please pass the correct IOContext. In addition, in
> >>>>> Lucene 9.12 (latest 9.x) version released today there are some
> >>>>> changes
> >>>>> to ensure that checksumming is always done with IOContext.READ_ONCE
> >>>>> (which uses READ behind scenes).
> >>>>>
> >>>>> Uwe
> >>>>>
> >>>>> Am 29.09.2024 um 17:09 schrieb Michael McCandless:
> >>>>>> Hi Navneet,
> >>>>>>
> >>>>>> With RANDOM IOcontext, on modern OS's / Java versions, Lucene
> >>>>>> will hint
> >>>> the
> >>>>>> memory mapped segment that the IO will be random using madvise POSIX
> >>> API
> >>>>>> with MADV_RANDOM flag.
> >>>>>>
> >>>>>> For READ IOContext, Lucene maybe hints with MADV_SEQUENTIAL, I'm not
> >>>> sure.
> >>>>>> Or maybe it doesn't hint anything?
> >>>>>>
> >>>>>> It's up to the OS to then take these hints and do something
> >>>> "interesting"
> >>>>>> to try to optimize IO and page caching based on these hints.  I
> >>>>>> think
> >>>>>> modern Linux OSs will readahead (and pre-warm page cache) for
> >>>>>> MADV_SEQUENTIAL?  And maybe skip page cache and readahead for
> >>>> MADV_RANDOM?
> >>>>>> Not certain...
> >>>>>>
> >>>>>> For computing checksum, which is always a sequential operation,
> >>>>>> if we
> >>>> use
> >>>>>> MADV_RANDOM (which is stupid), that is indeed expected to perform
> >>>>>> worse
> >>>>>> since there is no readahead pre-caching.  50% worse (what you are
> >>>> seeing)
> >>>>>> is indeed quite an impact ...
> >>>>>>
> >>>>>> Maybe open an issue?  At least for checksumming we should open even
> >>> .vec
> >>>>>> files for sequential read?  But, then, if it's the same IndexInput
> >>> which
> >>>>>> will then be used "normally" (e.g. for merging), we would want
> >>>>>> THAT one
> >>>> to
> >>>>>> be open for random access ... might be tricky to fix.
> >>>>>>
> >>>>>> One simple workaround an application can do is to ask
> >>>>>> MMapDirectory to
> >>>>>> pre-touch all bytes/pages in .vec/.veq files -- this asks the OS to
> >>>> cache
> >>>>>> all of those bytes into page cache (if there is enough free
> >>>>>> RAM).  We
> >>> do
> >>>>>> this at Amazon (product search) for our production searching
> >>>>>> processes.
> >>>>>> Otherwise paging in all .vec/.veq pages via random access provoked
> >>>> through
> >>>>>> HNSW graph searching is crazy slow...
> >>>>>>
> >>>>>> Mike McCandless
> >>>>>>
> >>>>>> http://blog.mikemccandless.com
> >>>>>>
> >>>>>> On Sun, Sep 29, 2024 at 4:06 AM Navneet Verma <
> >>> vermanavneet...@gmail.com
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi Lucene Experts,
> >>>>>>> I wanted to understand the performance difference between
> >>>>>>> opening and
> >>>>>>> reading the whole file using an IndexInput with IoContext as
> >>>>>>> RANDOM vs
> >>>>>>> READ.
> >>>>>>>
> >>>>>>> I can see .vec files (storing the flat vectors) are opened with
> >>>>>>> RANDOM, whereas .dvd files are opened as READ. As per my testing
> >>>>>>> with files close to
> >>>>>>> size 5GB storing (~1.6M docs with each doc 3072 bytes), I can
> >>>>>>> see that
> >>>> when
> >>>>>>> full file checksum validation is happening for a file opened via
> >>>>>>> READ
> >>>>>>> context it is faster than RANDOM. The amount of time difference
> >>>>>>> I am
> >>>> seeing
> >>>>>>> is close to 50%. Hence the performance question is coming up: I
> >>>>>>> wanted to understand, is this understanding correct?
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>> Navneet
> >>>>>>>
> >>>>> --
> >>>>> Uwe Schindler
> >>>>> Achterdiek 19, D-28357 Bremen
> >>>>> https://www.thetaphi.de
> >>>>> eMail: u...@thetaphi.de
> >>>>>
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>>>>
> >>>
