Hi Uwe,

Thanks for sharing the link and providing the useful information. I will definitely go ahead and create a GitHub issue. In the meantime I did some testing by changing the IOContext from RANDOM to READ for FlatVectors <https://github.com/navneet1v/lucene/commit/cd02e6f39acea82f7e56b36d8fd44156b4e271f9>, and the overall merge + integrity check time has already come down *from > 10 mins to < 1 min with 1.6M 768D vectors.* This further confirms that RANDOM is not a good fit for checksumming. I haven't yet validated how much HNSW search latency would change if we move to READ; I expect some degradation. We can discuss the solutions further on the GitHub issue.
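To make the experiment concrete, the change amounts to roughly the following (a simplified sketch against the 9.x API; the helper name is made up and the real call sites in Lucene99FlatVectorsReader look a bit different):

    import java.io.IOException;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.IOContext;
    import org.apache.lucene.store.IndexInput;

    // Illustrative helper only: open the flat-vector data (.vec) file with a
    // sequential READ hint instead of RANDOM, so the OS may apply readahead
    // during the sequential checksum pass that runs at merge time.
    final class VecOpenSketch {
      static IndexInput openVectorData(Directory dir, String vecFileName) throws IOException {
        // before: the same file is opened with a RANDOM-flavored context
        // (which is what happens today for .vec files)
        return dir.openInput(vecFileName, IOContext.READ);
      }
    }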
I will post all of this in a GitHub issue and will try to raise a PR with a fix. Will be looking forward to your feedback.

Thanks
Navneet

On Tue, Oct 1, 2024 at 12:19 AM Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> this seems to be a special case in FlatVectors, because normally there's a
> separate method to open an IndexInput for checksumming:
>
> https://github.com/apache/lucene/blob/524ea208c870861a719f21b1ea48943c8b7520da/lucene/core/src/java/org/apache/lucene/store/Directory.java#L155-L157
>
> Could you open an issue? It looks like it is not always used. I know
> there are some problems if the IndexInput is used for multiple things
> like reading, merging and/or checksumming at the same time. Some code tries
> to reuse already opened index inputs also for merging. But for this case
> I think it might be better to open a separate IndexInput and not clone
> an existing one for checksumming?
>
> The first link should of course open the IndexInput with RANDOM, because
> during normal reading of vectors this is a must. Generally, although the
> checksumming is slower it should not be a big issue, because it won't
> affect searches, only merging of segments. And there the throughput
> should be high, but not top priority.
>
> Uwe
>
> Am 01.10.2024 um 04:52 schrieb Navneet Verma:
> > Hi Uwe and Mike,
> > Thanks for providing such a quick response. Let me try to answer a few things
> > here:
> >
> > *In addition, in Lucene 9.12 (latest 9.x) version released today there are
> > some changes to ensure that checksumming is always done with
> > IOContext.READ_ONCE (which uses READ behind the scenes).*
> > I didn't find any such change for FlatVectorReaders
> > <https://github.com/apache/lucene/blob/branch_9_12/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsReader.java#L68-L77>,
> > even though I checked the BufferedChecksumIndexInput
> > <https://github.com/apache/lucene/blob/branch_9_12/lucene/core/src/java/org/apache/lucene/store/BufferedChecksumIndexInput.java#L31-L35>
> > and ChecksumIndexInput
> > <https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/store/ChecksumIndexInput.java#L25>,
> > and CodecUtil
> > <https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/CodecUtil.java#L606-L621>
> > in the 9.12 version. Please point me to the right file if I am missing
> > something here. I can see the same for Lucene version 10
> > <https://github.com/apache/lucene/blob/branch_10_0/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsReader.java#L69-L77>
> > too.
> >
> > Mike, on the question of what the RANDOM vs READ context is doing, we found this
> > information related to madvise online:
> >
> > MADV_RANDOM: Expect page references in random order. (Hence, read ahead may
> > be less useful than normally.)
> > MADV_SEQUENTIAL: Expect page references in sequential order. (Hence, pages
> > in the given range can be aggressively read ahead, and may be freed soon
> > after they are accessed.)
> > MADV_WILLNEED: Expect access in the near future. (Hence, it might be a good
> > idea to read some pages ahead.)
> >
> > This tells me that MADV_RANDOM is not good for checksums, as it will
> > consume more read cycles given the sequential nature of the checksum.
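As a concrete reference for the separate checksum path Uwe mentioned above, the pattern would look roughly like this against the 9.x API (just a sketch, not the actual Lucene99FlatVectorsReader code): checksum a dedicated, sequentially-hinted input instead of reusing the RANDOM-mapped one.

    import java.io.IOException;
    import org.apache.lucene.codecs.CodecUtil;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.IOContext;
    import org.apache.lucene.store.IndexInput;

    // Sketch: verify a data file's footer checksum through its own input,
    // opened with a read-once (sequential) hint, independent of the
    // RANDOM-context IndexInput that searches keep using.
    final class ChecksumSketch {
      static long verify(Directory dir, String dataFileName) throws IOException {
        try (IndexInput in = dir.openInput(dataFileName, IOContext.READONCE)) {
          // streams the whole file sequentially and validates the codec footer
          return CodecUtil.checksumEntireFile(in);
        }
      }
    }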
> >
> > *One simple workaround an application can do is to ask MMapDirectory
> > to pre-touch all bytes/pages in .vec/.veq files -- this asks the OS to
> > cache all of those bytes into page cache (if there is enough free RAM). We
> > do this at Amazon (product search) for our production searching
> > processes. Otherwise paging in all .vec/.veq pages via random access
> > provoked through HNSW graph searching is crazy slow...*
> > Did you mean the preload functionality offered by MMapDirectory here? I can
> > try this to see if that helps, but I doubt that it will in this case.
> >
> > On opening the issue, I am working through some reproducible benchmarks
> > before creating a GitHub issue. If you believe I should create a GitHub issue
> > first I can do that, as it might take me some time to build reproducible
> > benchmarks.
> >
> > Thanks
> > Navneet
> >
> >
> > On Mon, Sep 30, 2024 at 3:08 AM Uwe Schindler <u...@thetaphi.de> wrote:
> >> Hi,
> >>
> >> please also note: In Lucene 10 the checksum IndexInput will always be
> >> opened with IOContext.READ_ONCE.
> >>
> >> If you want to sequentially read a whole index file for a reason other
> >> than checksumming, please pass the correct IOContext. In addition, in
> >> Lucene 9.12 (latest 9.x) version released today there are some changes
> >> to ensure that checksumming is always done with IOContext.READ_ONCE
> >> (which uses READ behind the scenes).
> >>
> >> Uwe
> >>
> >> Am 29.09.2024 um 17:09 schrieb Michael McCandless:
> >>> Hi Navneet,
> >>>
> >>> With the RANDOM IOContext, on modern OS's / Java versions, Lucene will hint
> >>> the memory-mapped segment that the IO will be random, using the madvise POSIX
> >>> API with the MADV_RANDOM flag.
> >>>
> >>> For the READ IOContext, Lucene maybe hints with MADV_SEQUENTIAL, I'm not sure.
> >>> Or maybe it doesn't hint anything?
> >>>
> >>> It's up to the OS to then take these hints and do something "interesting"
> >>> to try to optimize IO and page caching based on these hints. I think
> >>> modern Linux OSs will readahead (and pre-warm page cache) for
> >>> MADV_SEQUENTIAL? And maybe skip page cache and readahead for MADV_RANDOM?
> >>> Not certain...
> >>>
> >>> For computing a checksum, which is always a sequential operation, if we use
> >>> MADV_RANDOM (which is stupid), that is indeed expected to perform worse
> >>> since there is no readahead pre-caching. 50% worse (what you are seeing)
> >>> is indeed quite an impact ...
> >>>
> >>> Maybe open an issue? At least for checksumming we should open even .vec
> >>> files for sequential read? But, then, if it's the same IndexInput which
> >>> will then be used "normally" (e.g. for merging), we would want THAT one to
> >>> be open for random access ... might be tricky to fix.
> >>>
> >>> One simple workaround an application can do is to ask MMapDirectory to
> >>> pre-touch all bytes/pages in .vec/.veq files -- this asks the OS to cache
> >>> all of those bytes into page cache (if there is enough free RAM). We do
> >>> this at Amazon (product search) for our production searching processes.
> >>> Otherwise paging in all .vec/.veq pages via random access provoked through
> >>> HNSW graph searching is crazy slow...
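For reference, the MMapDirectory preload knob mentioned above would be used roughly like this (a sketch; setPreload took a boolean in older 9.x releases and a BiPredicate over file name and IOContext in newer ones, so adjust for the version in use):

    import java.io.IOException;
    import java.nio.file.Path;
    import org.apache.lucene.store.MMapDirectory;

    // Sketch of the "pre-touch" workaround: ask MMapDirectory to preload the
    // flat-vector files (.vec/.veq) into page cache when they are mapped.
    final class PreloadSketch {
      static MMapDirectory open(Path indexPath) throws IOException {
        MMapDirectory dir = new MMapDirectory(indexPath);
        // newer 9.x API: a predicate over (file name, IOContext)
        dir.setPreload((fileName, ioContext) ->
            fileName.endsWith(".vec") || fileName.endsWith(".veq"));
        return dir;
      }
    }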
> >>>
> >>> Mike McCandless
> >>>
> >>> http://blog.mikemccandless.com
> >>>
> >>> On Sun, Sep 29, 2024 at 4:06 AM Navneet Verma <vermanavneet...@gmail.com> wrote:
> >>>
> >>>> Hi Lucene Experts,
> >>>> I wanted to understand the performance difference between opening and
> >>>> reading the whole file using an IndexInput with the IOContext as RANDOM vs
> >>>> READ.
> >>>>
> >>>> I can see .vec files (storing the flat vectors) are opened with RANDOM,
> >>>> whereas .dvd files are opened with READ. As per my testing with files close
> >>>> to 5GB in size (~1.6M docs with each doc 3072 bytes), I can see that when
> >>>> full-file checksum validation is happening for a file opened via the READ
> >>>> context, it is faster than RANDOM. The time difference I am seeing
> >>>> is close to 50%. Hence the performance question: is this understanding correct?
> >>>>
> >>>> Thanks
> >>>> Navneet
> >>>>
> >> --
> >> Uwe Schindler
> >> Achterdiek 19, D-28357 Bremen
> >> https://www.thetaphi.de
> >> eMail: u...@thetaphi.de
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
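P.S. A rough standalone sketch of the RANDOM vs READ checksum timing described in the original question (assumes the 9.x IOContext constants; the index path and file name are placeholders, and the numbers are only meaningful if the page cache is dropped between runs):

    import java.io.IOException;
    import java.nio.file.Path;
    import org.apache.lucene.codecs.CodecUtil;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.IOContext;
    import org.apache.lucene.store.IndexInput;
    import org.apache.lucene.store.MMapDirectory;

    // Micro-benchmark: checksum the same large file once with a RANDOM hint
    // and once with a READ hint, printing the elapsed wall-clock time.
    final class ChecksumBench {
      static long timeChecksumMillis(Directory dir, String fileName, IOContext context) throws IOException {
        long start = System.nanoTime();
        try (IndexInput in = dir.openInput(fileName, context)) {
          CodecUtil.checksumEntireFile(in); // sequential pass over the whole file
        }
        return (System.nanoTime() - start) / 1_000_000;
      }

      public static void main(String[] args) throws IOException {
        try (Directory dir = new MMapDirectory(Path.of(args[0]))) {
          String vecFile = args[1]; // e.g. a ~5GB .vec file
          System.out.println("RANDOM: " + timeChecksumMillis(dir, vecFile, IOContext.RANDOM) + " ms");
          System.out.println("READ:   " + timeChecksumMillis(dir, vecFile, IOContext.READ) + " ms");
        }
      }
    }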