Hi Uwe,

> thinking about it a bit more: In 10.x we already have some ways to
> preload data with WILL_NEED (or similar). Maybe this can also be used on
> merging when we reuse an already open IndexInput. Maybe it is possible
> to change the madvise on an already open IndexInput and change it
> before merging (and revert back). This would improve merging and would
> not affect.
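To make the discussion concrete, here is a minimal sketch of that idea using only the public 9.x API. The helper class and method names are made up for illustration, and the IOContext constants reflect my reading of 9.11; Tejas's actual change instead re-advises the already-mapped segments inside the IndexInput implementation (see the commit linked below):

import java.io.IOException;

import org.apache.lucene.codecs.CodecUtil;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;

// Hypothetical helper, not a Lucene class: keep search-time reads on the
// random-access hint, but run the purely sequential integrity check on a
// separate, short-lived input opened with a read-once context.
final class VectorIntegritySketch {

  // Searches keep using the input opened with IOContext.RANDOM
  // (madvise MADV_RANDOM under MMapDirectory).
  static IndexInput openForSearch(Directory dir, String vecFile) throws IOException {
    return dir.openInput(vecFile, IOContext.RANDOM);
  }

  // Merge-time checksumming reads the file front to back, so open a
  // separate input with READONCE instead of cloning the RANDOM one.
  static long verifyChecksum(Directory dir, String vecFile) throws IOException {
    try (IndexInput in = dir.openInput(vecFile, IOContext.READONCE)) {
      return CodecUtil.checksumEntireFile(in);
    }
  }
}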
My teammate Tejas and I tried an approach along these lines: he changed the code so that during merge the madvise hint is switched from random to sequential on a cloned IndexInput.
https://github.com/shatejas/lucene/commit/4de387288d70b4d8aede45ef3095ae6c1e189331#diff-e0a29611df21f6d32a461e2d24db1585cdf3a8590f08d93b097f0dd84684ebc8R316
With that change the merge time dropped from > 10 mins to < 1 min for 1.6M 768-dimensional vectors. This was done on top of Lucene 9.11.0, and we are inclined to use this approach.

> I know there are some problems if the IndexInput is used for multiple
> things like reading, merging and/or checksumming at the same time. Some
> code tries to reuse already opened index inputs also for merging. But
> for this case I think it might be better to open a separate IndexInput
> and not clone an existing one for checksumming?

In earlier emails you also suggested opening a new IndexInput to be used for checksumming during merges; as I mentioned before, doing that gave similar results. However, on digging deeper I found that creating multiple IndexInput instances to be used from different threads is apparently not recommended (ref:
https://github.com/apache/lucene/blob/350de210c3674566293681bb58e801629b5ceee7/lucene/core/src/java/org/apache/lucene/store/IndexInput.java#L22-L39).
Does that guidance still hold? So far we have not found any case where opening multiple IndexInputs caused a problem, even with searches running during indexing/merges. Please let us know your thoughts.

Thanks
Navneet

On Tue, Oct 1, 2024 at 2:55 AM Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> thinking about it a bit more: In 10.x we already have some ways to
> preload data with WILL_NEED (or similar). Maybe this can also be used on
> merging when we reuse an already open IndexInput. Maybe it is possible
> to change the madvise on an already open IndexInput and change it
> before merging (and revert back). This would improve merging and would
> not affect.
>
> So I advise to not do any ad-hoc changes breaking the random read code
> for vectors and docvalues again and to think about better ideas. In 10.x
> we have put a lot of thought into this, but the "upgrade an IndexInput
> for merging or checksumming" could be a nice addition - of course revert
> it back to the original IOContext with some try/finally after the work
> is done. This would play much nicer with our "reuse IndexInput of NRT
> readers while merging".
>
> Adrien, do you have any ideas?
>
> Uwe
>
> Am 01.10.2024 um 10:17 schrieb Uwe Schindler:
> > Hi,
> >
> > great.
> >
> > I still think the difference between RANDOM and READ is huge in your
> > case. Are you sure that you have not misconfigured your system? The
> > most important thing for Lucene is to make sure that the heap space of
> > the Java VM is limited as much as possible (just above the OOM
> > boundary) and the free available RAM is as large as possible, to allow
> > MMapDirectory to use off-heap memory in an ideal way and minimize
> > paging overhead. If you don't do this, the kernel will be under much
> > higher pressure.
> >
> > In general, the correct fix for this is to use RANDOM for normal
> > reading of the index and use the other IOContexts only for merging. If
> > this requires files to be opened multiple times, it's a better
> > compromise.
> >
> > Please note: we are focusing on 10.x, so please supply PRs/changes for
> > the Lucene main branch only; backports will be done automatically.
> > We won't change the IOContexts in 9.x anymore.
> >
> > Uwe
> >
> > Am 01.10.2024 um 10:04 schrieb Navneet Verma:
> >> Hi Uwe,
> >> Thanks for sharing the link and providing the useful information. I
> >> will definitely go ahead and create a gh issue. In the meantime I did
> >> some testing by changing the IOContext from RANDOM to READ for
> >> FlatVectors
> >> <https://github.com/navneet1v/lucene/commit/cd02e6f39acea82f7e56b36d8fd44156b4e271f9>
> >> and what I can see is that the overall merge + integrity checks have
> >> already come down *from > 10 mins to < 1 min with 1.6M 768D vectors.*
> >> This further confirms that RANDOM is not good for checksums. I didn't
> >> validate how much HNSW latencies will change if we move to READ; I
> >> think there would be some latency degradation. We can discuss the
> >> solutions further on the github issue.
> >>
> >> I will post all of this in a github issue and will try to raise a PR
> >> with a fix. Will be looking forward to your feedback.
> >>
> >> Thanks
> >> Navneet
> >>
> >> On Tue, Oct 1, 2024 at 12:19 AM Uwe Schindler <u...@thetaphi.de> wrote:
> >>
> >>> Hi,
> >>>
> >>> this seems to be a special case in FlatVectors, because normally
> >>> there's a separate method to open an IndexInput for checksumming:
> >>>
> >>> https://github.com/apache/lucene/blob/524ea208c870861a719f21b1ea48943c8b7520da/lucene/core/src/java/org/apache/lucene/store/Directory.java#L155-L157
> >>>
> >>> Could you open an issue, it looks like it is not always used? I know
> >>> there are some problems if the IndexInput is used for multiple things
> >>> like reading, merging and/or checksumming at the same time. Some code
> >>> tries to reuse already opened index inputs also for merging. But for
> >>> this case I think it might be better to open a separate IndexInput
> >>> and not clone an existing one for checksumming?
> >>>
> >>> The first link should of course open the IndexInput with RANDOM,
> >>> because during normal reading of vectors this is a must. Generally,
> >>> although the checksumming is slower, it should not be a big issue,
> >>> because it won't affect searches, only merging of segments. And there
> >>> the throughput should be high, but it is not top priority.
> >>>
> >>> Uwe
> >>>
> >>> Am 01.10.2024 um 04:52 schrieb Navneet Verma:
> >>>> Hi Uwe and Mike,
> >>>> Thanks for providing such a quick response. Let me try to answer a
> >>>> few things here:
> >>>>
> >>>> *In addition, in Lucene 9.12 (latest 9.x) version released today
> >>>> there are some changes to ensure that checksumming is always done
> >>>> with IOContext.READ_ONCE (which uses READ behind the scenes).*
> >>>> I didn't find any such change for FlatVectorReaders
> >>>> <https://github.com/apache/lucene/blob/branch_9_12/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsReader.java#L68-L77>,
> >>>> even though I checked BufferedChecksumIndexInput
> >>>> <https://github.com/apache/lucene/blob/branch_9_12/lucene/core/src/java/org/apache/lucene/store/BufferedChecksumIndexInput.java#L31-L35>,
> >>>> ChecksumIndexInput
> >>>> <https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/store/ChecksumIndexInput.java#L25>
> >>>> and CodecUtil
> >>>> <https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/CodecUtil.java#L606-L621>
> >>>> in the 9.12 version.
> >>>> Please point me to the right file if I am missing something here. I
> >>>> can see the same for Lucene version 10
> >>>> <https://github.com/apache/lucene/blob/branch_10_0/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsReader.java#L69-L77>
> >>>> too.
> >>>>
> >>>> Mike, on the question of what the RANDOM vs READ context is doing,
> >>>> we found this information related to madvise online:
> >>>>
> >>>> MADV_RANDOM: Expect page references in random order. (Hence, read
> >>>> ahead may be less useful than normally.)
> >>>> MADV_SEQUENTIAL: Expect page references in sequential order. (Hence,
> >>>> pages in the given range can be aggressively read ahead, and may be
> >>>> freed soon after they are accessed.)
> >>>> MADV_WILLNEED: Expect access in the near future. (Hence, it might be
> >>>> a good idea to read some pages ahead.)
> >>>>
> >>>> This tells me that MADV_RANDOM for checksums is not good, as it will
> >>>> consume more read cycles given the sequential nature of the checksum.
> >>>>
> >>>> *One simple workaround an application can do is to ask MMapDirectory
> >>>> to pre-touch all bytes/pages in .vec/.veq files -- this asks the OS
> >>>> to cache all of those bytes into page cache (if there is enough free
> >>>> RAM). We do this at Amazon (product search) for our production
> >>>> searching processes. Otherwise paging in all .vec/.veq pages via
> >>>> random access provoked through HNSW graph searching is crazy slow...*
> >>>> Did you mean the preload functionality offered by MMapDirectory
> >>>> here? I can try this to see if that helps, but I doubt it will in
> >>>> this case.
> >>>>
> >>>> On opening the issue, I am working through some reproducible
> >>>> benchmarks before creating a gh issue. If you believe I should
> >>>> create a GH issue first I can do that, as it might take me some time
> >>>> to build reproducible benchmarks.
> >>>>
> >>>> Thanks
> >>>> Navneet
> >>>>
> >>>> On Mon, Sep 30, 2024 at 3:08 AM Uwe Schindler <u...@thetaphi.de> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> please also note: in Lucene 10 the checksum IndexInput will always
> >>>>> be opened with IOContext.READ_ONCE.
> >>>>>
> >>>>> If you want to sequentially read a whole index file for a reason
> >>>>> other than checksumming, please pass the correct IOContext. In
> >>>>> addition, in Lucene 9.12 (latest 9.x), released today, there are
> >>>>> some changes to ensure that checksumming is always done with
> >>>>> IOContext.READ_ONCE (which uses READ behind the scenes).
> >>>>>
> >>>>> Uwe
> >>>>>
> >>>>> Am 29.09.2024 um 17:09 schrieb Michael McCandless:
> >>>>>> Hi Navneet,
> >>>>>>
> >>>>>> With the RANDOM IOContext, on modern OS's / Java versions, Lucene
> >>>>>> will hint the memory-mapped segment that the IO will be random,
> >>>>>> using the madvise POSIX API with the MADV_RANDOM flag.
> >>>>>>
> >>>>>> For the READ IOContext, Lucene maybe hints with MADV_SEQUENTIAL,
> >>>>>> I'm not sure. Or maybe it doesn't hint anything?
> >>>>>>
> >>>>>> It's up to the OS to then take these hints and do something
> >>>>>> "interesting" to try to optimize IO and page caching based on
> >>>>>> these hints. I think modern Linux OSs will readahead (and pre-warm
> >>>>>> the page cache) for MADV_SEQUENTIAL? And maybe skip page cache and
> >>>>>> readahead for MADV_RANDOM? Not certain...
> >>>>>>
> >>>>>> For computing a checksum, which is always a sequential operation,
> >>>>>> if we use MADV_RANDOM (which is stupid), that is indeed expected
> >>>>>> to perform worse since there is no readahead pre-caching. 50%
> >>>>>> worse (what you are seeing) is indeed quite an impact ...
> >>>>>>
> >>>>>> Maybe open an issue? At least for checksumming we should open even
> >>>>>> .vec files for sequential read? But, then, if it's the same
> >>>>>> IndexInput which will then be used "normally" (e.g. for merging),
> >>>>>> we would want THAT one to be open for random access ... might be
> >>>>>> tricky to fix.
> >>>>>>
> >>>>>> One simple workaround an application can do is to ask
> >>>>>> MMapDirectory to pre-touch all bytes/pages in .vec/.veq files --
> >>>>>> this asks the OS to cache all of those bytes into page cache (if
> >>>>>> there is enough free RAM). We do this at Amazon (product search)
> >>>>>> for our production searching processes. Otherwise paging in all
> >>>>>> .vec/.veq pages via random access provoked through HNSW graph
> >>>>>> searching is crazy slow...
> >>>>>>
> >>>>>> Mike McCandless
> >>>>>>
> >>>>>> http://blog.mikemccandless.com
> >>>>>>
> >>>>>> On Sun, Sep 29, 2024 at 4:06 AM Navneet Verma
> >>>>>> <vermanavneet...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi Lucene Experts,
> >>>>>>> I wanted to understand the performance difference between opening
> >>>>>>> and reading a whole file using an IndexInput with IOContext
> >>>>>>> RANDOM vs READ.
> >>>>>>>
> >>>>>>> I can see that .vec files (storing the flat vectors) are opened
> >>>>>>> with RANDOM, whereas dvd files are opened with READ. As per my
> >>>>>>> testing with files close to 5GB in size (~1.6M docs with each doc
> >>>>>>> 3072 bytes), I can see that when full-file checksum validation
> >>>>>>> happens on a file opened via the READ context, it is faster than
> >>>>>>> with RANDOM. The time difference I am seeing is close to 50%.
> >>>>>>> Hence the performance question: is this understanding correct?
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>> Navneet
> >>>>>>>
> >>>>> --
> >>>>> Uwe Schindler
> >>>>> Achterdiek 19, D-28357 Bremen
> >>>>> https://www.thetaphi.de
> >>>>> eMail: u...@thetaphi.de
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>>>>
> >>> --
> >>> Uwe Schindler
> >>> Achterdiek 19, D-28357 Bremen
> >>> https://www.thetaphi.de
> >>> eMail: u...@thetaphi.de
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >>> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>>
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
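P.S. Regarding Mike's pre-touch suggestion earlier in the thread: if anyone else wants to try it alongside the madvise change, this is roughly what we plan to test. It is only a sketch; it assumes the MMapDirectory.setPreload(BiPredicate) overload available in recent 9.x releases, and the class name and index path are placeholders.

import java.io.IOException;
import java.nio.file.Path;

import org.apache.lucene.store.MMapDirectory;

public final class PreloadVectorFiles {
  // Open an MMapDirectory that preloads (pre-touches) the flat-vector files,
  // so the OS pulls them into the page cache up front instead of paging them
  // in lazily through the random access pattern of HNSW graph traversal.
  public static MMapDirectory open(Path indexPath) throws IOException {
    MMapDirectory dir = new MMapDirectory(indexPath);
    dir.setPreload((fileName, context) ->
        fileName.endsWith(".vec") || fileName.endsWith(".veq"));
    return dir;
  }
}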