Hi Uwe,
Thanks for sharing the link and the useful information. I will definitely go
ahead and create a GitHub issue. In the meantime I did some testing by
changing the IOContext from RANDOM to READ for FlatVectors
<https://github.com/navneet1v/lucene/commit/cd02e6f39acea82f7e56b36d8fd44156b4e271f9>
and what I can see is that the overall merge + integrity-check time has
already come down *from > 10mins to < 1min with 1.6M 768D vectors.* This
further confirms that RANDOM is not good for checksums. I didn't validate how
much HNSW search latencies would change if we move to READ; I expect some
latency degradation there. We can discuss possible solutions further on the
GitHub issue.
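For reference, the change I tested was essentially a one-liner: open the flat
vector data file with the sequential READ context instead of the random-access
one. A rough sketch (9.x-style org.apache.lucene.store.IOContext constants;
"dir" and the file name are illustrative, this is not the exact reader code):

    // Today the .vec data is opened with a random-access hint
    // (madvise(MADV_RANDOM) on the mmapped region):
    //   IndexInput vectorData = dir.openInput(vectorDataFileName, IOContext.RANDOM);
    // The experiment simply switches to the plain sequential READ context:
    IndexInput vectorData = dir.openInput(vectorDataFileName, IOContext.READ);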

I will post all these details in a GitHub issue and will try to raise a PR
with a fix. Looking forward to your feedback.

Thanks
Navneet


On Tue, Oct 1, 2024 at 12:19 AM Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> this seems to be a special case in FlatVectors, because normally there's a
> separate method to open an IndexInput for checksumming:
>
>
> https://github.com/apache/lucene/blob/524ea208c870861a719f21b1ea48943c8b7520da/lucene/core/src/java/org/apache/lucene/store/Directory.java#L155-L157
>
> Could you open an issue? It looks like it is not always used. I know
> there are some problems if the IndexInput is used for multiple things
> like reading, merging and/or checksumming at the same time. Some code tries
> to reuse already-opened index inputs for merging as well. But for this case
> I think it might be better to open a separate IndexInput for checksumming
> rather than clone an existing one.
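> E.g. something like this (just a sketch using the 9.x signature; in newer
> versions openChecksumInput takes only the file name and uses READONCE
> internally, and the real call sites may look different):
>
>     try (ChecksumIndexInput in =
>             dir.openChecksumInput(vectorDataFileName, IOContext.READONCE)) {
>       // seeking forward on a ChecksumIndexInput reads through the bytes,
>       // so this updates the checksum over the whole file body
>       in.seek(in.length() - CodecUtil.footerLength());
>       CodecUtil.checkFooter(in); // compare computed checksum with stored footer
>     }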
>
> The first link should of course open the IndexInput with RANDOM, because
> during normal reading of vectors this is a must. Generally, although the
> checksumming is slower, it should not be a big issue, because it won't
> affect searches, only merging of segments. And there the throughput
> should be high, but it is not the top priority.
>
> Uwe
>
> On 01.10.2024 at 04:52, Navneet Verma wrote:
> > Hi Uwe and Mike,
> > Thanks for providing such a quick response. Let me try to answer a few
> > things here:
> >
> > *In addition, in Lucene 9.12 (latest 9.x) version released today there are
> > some changes to ensure that checksumming is always done with
> > IOContext.READONCE (which uses READ behind the scenes).*
> > I didn't find any such change for Lucene99FlatVectorsReader
> > <https://github.com/apache/lucene/blob/branch_9_12/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsReader.java#L68-L77>,
> > even though I checked BufferedChecksumIndexInput
> > <https://github.com/apache/lucene/blob/branch_9_12/lucene/core/src/java/org/apache/lucene/store/BufferedChecksumIndexInput.java#L31-L35>,
> > ChecksumIndexInput
> > <https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/store/ChecksumIndexInput.java#L25>,
> > and CodecUtil
> > <https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/CodecUtil.java#L606-L621>
> > in the 9.12 version. Please point me to the right file if I am missing
> > something here. I can see the same for Lucene version 10
> > <https://github.com/apache/lucene/blob/branch_10_0/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsReader.java#L69-L77>
> > too.
> >
> > Mike, on the question of what the RANDOM vs READ context is doing, we found
> > this information related to the madvise (MADV_*) flags online:
> >
> > MADV_RANDOM: Expect page references in random order. (Hence, read ahead
> > may be less useful than normally.)
> > MADV_SEQUENTIAL: Expect page references in sequential order. (Hence, pages
> > in the given range can be aggressively read ahead, and may be freed soon
> > after they are accessed.)
> > MADV_WILLNEED: Expect access in the near future. (Hence, it might be a
> > good idea to read some pages ahead.)
> >
> > This tells me that MADV_RANDOM is not good for checksums, as it will
> > consume more read cycles given the sequential nature of the checksum read.
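> > For reference, the comparison itself is simple: time a full sequential scan
> > of the same .vec file opened under each context. A rough sketch (9.x
> > IOContext constants; "dir" and the file name are placeholders, not the
> > final reproducible benchmark):
> >
> >     static long sequentialReadMillis(Directory dir, String file, IOContext ctx)
> >         throws IOException {
> >       long start = System.nanoTime();
> >       try (IndexInput in = dir.openInput(file, ctx)) {
> >         byte[] buf = new byte[1 << 16];
> >         long remaining = in.length();
> >         while (remaining > 0) {
> >           int chunk = (int) Math.min(buf.length, remaining);
> >           in.readBytes(buf, 0, chunk); // sequential read, like checksumming does
> >           remaining -= chunk;
> >         }
> >       }
> >       return (System.nanoTime() - start) / 1_000_000;
> >     }
> >
> >     // compare: sequentialReadMillis(dir, "_0.vec", IOContext.RANDOM)
> >     //     vs   sequentialReadMillis(dir, "_0.vec", IOContext.READ)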
> >
> > *One simple workaround an application can do is to ask MMapDirectory to
> > pre-touch all bytes/pages in .vec/.veq files -- this asks the OS to cache
> > all of those bytes into page cache (if there is enough free RAM). We do
> > this at Amazon (product search) for our production searching processes.
> > Otherwise paging in all .vec/.veq pages via random access provoked through
> > HNSW graph searching is crazy slow...*
> > Did you mean the preload functionality offered by MMapDirectory here? I
> > can try this to see if that helps, but I doubt it will help in this case.
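> > If that is what you meant, something like this is what I would try (sketch,
> > assuming the 9.x MMapDirectory.setPreload(BiPredicate) API; please correct
> > me if the preload hook works differently):
> >
> >     MMapDirectory dir = new MMapDirectory(Path.of("/path/to/index"));
> >     // pre-touch only the flat-vector files into page cache
> >     dir.setPreload((name, ctx) -> name.endsWith(".vec") || name.endsWith(".veq"));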
> >
> > On opening the issue: I am working through some reproducible benchmarks
> > before creating a GitHub issue. If you believe I should create the issue
> > first, I can do that, as it might take me some time to build the
> > reproducible benchmarks.
> >
> > Thanks
> > Navneet
> >
> >
> > On Mon, Sep 30, 2024 at 3:08 AM Uwe Schindler <u...@thetaphi.de> wrote:
> >> Hi,
> >>
> >> please also note: in Lucene 10 the checksum IndexInput will always be
> >> opened with IOContext.READONCE.
> >>
> >> If you want to sequentially read a whole index file for reasons other
> >> than checksumming, please pass the correct IOContext. In addition, in
> >> Lucene 9.12 (latest 9.x) version released today there are some changes
> >> to ensure that checksumming is always done with IOContext.READONCE
> >> (which uses READ behind the scenes).
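> >> E.g. (just a sketch; "dir" and the file name are placeholders), a
> >> sequential whole-file scan would look like
> >>
> >>     try (IndexInput in = dir.openInput(fileName, IOContext.READONCE)) {
> >>       // read the file front-to-back here
> >>     }
> >>
> >> rather than reusing an input that was opened with IOContext.RANDOM.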
> >>
> >> Uwe
> >>
> >> On 29.09.2024 at 17:09, Michael McCandless wrote:
> >>> Hi Navneet,
> >>>
> >>> With the RANDOM IOContext, on modern OS's / Java versions, Lucene will
> >>> hint to the memory-mapped segment that the IO will be random, using the
> >>> madvise POSIX API with the MADV_RANDOM flag.
> >>>
> >>> For the READ IOContext, Lucene maybe hints with MADV_SEQUENTIAL, I'm not
> >>> sure.  Or maybe it doesn't hint anything?
> >>>
> >>> It's up to the OS to then take these hints and do something "interesting"
> >>> to try to optimize IO and page caching based on these hints.  I think
> >>> modern Linux OSs will readahead (and pre-warm page cache) for
> >>> MADV_SEQUENTIAL?  And maybe skip page cache and readahead for
> >>> MADV_RANDOM?  Not certain...
> >>>
> >>> For computing a checksum, which is always a sequential operation, if we
> >>> use MADV_RANDOM (which is stupid), that is indeed expected to perform
> >>> worse since there is no readahead pre-caching.  50% worse (what you are
> >>> seeing) is indeed quite an impact ...
> >>>
> >>> Maybe open an issue?  At least for checksumming we should open even .vec
> >>> files for sequential read?  But, then, if it's the same IndexInput which
> >>> will then be used "normally" (e.g. for merging), we would want THAT one
> >>> to be open for random access ... might be tricky to fix.
> >>>
> >>> One simple workaround an application can do is to ask MMapDirectory to
> >>> pre-touch all bytes/pages in .vec/.veq files -- this asks the OS to cache
> >>> all of those bytes into page cache (if there is enough free RAM).  We do
> >>> this at Amazon (product search) for our production searching processes.
> >>> Otherwise paging in all .vec/.veq pages via random access provoked
> >>> through HNSW graph searching is crazy slow...
> >>>
> >>> Mike McCandless
> >>>
> >>> http://blog.mikemccandless.com
> >>>
> >>> On Sun, Sep 29, 2024 at 4:06 AM Navneet Verma <vermanavneet...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi Lucene Experts,
> >>>> I wanted to understand the performance difference between opening and
> >>>> reading a whole file using an IndexInput with IOContext RANDOM vs
> >>>> READ.
> >>>>
> >>>> I can see that .vec files (storing the flat vectors) are opened with
> >>>> RANDOM, whereas .dvd files are opened with READ. As per my testing with
> >>>> files close to 5GB in size (storing ~1.6M docs with each doc 3072 bytes),
> >>>> I can see that when full-file checksum validation happens on a file
> >>>> opened via the READ context, it is faster than with RANDOM. The time
> >>>> difference I am seeing is close to 50%. Hence the performance question:
> >>>> is this understanding correct?
> >>>>
> >>>> Thanks
> >>>> Navneet
> >>>>
> >> --
> >> Uwe Schindler
> >> Achterdiek 19, D-28357 Bremen
> >> https://www.thetaphi.de
> >> eMail: u...@thetaphi.de
> >>
> >>
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
