Hi Uwe,

Thanks for the prompt response. I have created the GitHub issue https://github.com/apache/lucene/issues/13920 for more discussion. We can move all further discussion to that issue.
Thanks
Navneet

On Tue, Oct 15, 2024 at 3:17 AM Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> The problem with your approach is that you can change the madvise on a clone, but as the underlying memory is the same for the cloned index input, it won't revert back to RANDOM.
>
> Basically there's no need to clone or create a slice. We should better allow changing the advice for an IndexInput and restoring it later. We have that functionality in Lucene's 10.x version already; it can create slices.
>
> The linked diff is too intrusive; we won't accept this as a PR, because it does not use the madvise call in correct ways and changes the semantics of preloading. Please open an issue instead for discussion.
>
> Uwe
>
> Am 15.10.2024 um 09:06 schrieb Navneet Verma:
> > Hi Uwe,
> >
> > *>> thinking about it a bit more: In 10.x we already have some ways to preload data with WILL_NEED (or similar). Maybe this can also be used on merging when we reuse an already open IndexInput. Maybe it is possible to change the madvise on an already open IndexInput and change it before merging (and revert back). This would improve merging and would not affect searches.*
> >
> > My teammate Tejas and I tried a similar approach to the one you mentioned above, where he made changes to ensure that during a merge we change the madvise from random to sequential on a cloned IndexInput
> > <https://github.com/shatejas/lucene/commit/4de387288d70b4d8aede45ef3095ae6c1e189331#diff-e0a29611df21f6d32a461e2d24db1585cdf3a8590f08d93b097f0dd84684ebc8R316>.
> > We saw that the merge time was reduced *from >10 mins to <1 min with 1.6M 768D vectors.* This was done on top of the 9.11.0 version of Lucene. We are inclined to use this approach.
> >
> > *>> I know there are some problems if the IndexInput is used for multiple things like reading, merging and/or checksumming at the same time. Some code tries to reuse already opened index inputs also for merging. But for this case I think it might be better to open a separate IndexInput and not clone an existing one for checksumming?*
> >
> > In the previous emails you suggested that we can also open up a new IndexInput which can then be used for checksumming during merges, and as I mentioned earlier, doing this gave similar results. But on doing further deep-dives I found out that it is not recommended to create multiple instances of IndexInput in different threads (ref
> > <https://github.com/apache/lucene/blob/350de210c3674566293681bb58e801629b5ceee7/lucene/core/src/java/org/apache/lucene/store/IndexInput.java#L22-L39>).
> > So I wanted to understand whether this still holds true, as we haven't found any case so far where opening multiple IndexInputs caused a problem, even when searches are happening during indexing/merges. Please let us know your thoughts here.
> >
> > Thanks
> > Navneet
> >
> >
> > On Tue, Oct 1, 2024 at 2:55 AM Uwe Schindler <u...@thetaphi.de> wrote:
> >
> >> Hi,
> >>
> >> thinking about it a bit more: In 10.x we already have some ways to preload data with WILL_NEED (or similar). Maybe this can also be used on merging when we reuse an already open IndexInput. Maybe it is possible to change the madvise on an already open IndexInput and change it before merging (and revert back). This would improve merging and would not affect searches.
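A minimal sketch of the "open a separate IndexInput for the sequential checksum pass instead of cloning the random-advised one" idea discussed above — not how Lucene wires this internally. It assumes the IOContext.RANDOM and IOContext.READONCE constants and CodecUtil.checksumEntireFile as available in Lucene 9.11/9.12; the index path and the "_0.vec" file name are placeholders, not taken from the thread.

import java.io.IOException;
import java.nio.file.Path;

import org.apache.lucene.codecs.CodecUtil;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.MMapDirectory;

public class SeparateChecksumInputSketch {
  public static void main(String[] args) throws IOException {
    try (Directory dir = new MMapDirectory(Path.of("/path/to/index"))) {
      String vecFile = "_0.vec"; // placeholder; real code gets this from the segment's file list

      // Search-time reads keep the RANDOM advice (good for graph-style access patterns).
      try (IndexInput searchInput = dir.openInput(vecFile, IOContext.RANDOM)) {
        // ... random-access reads for HNSW search would happen here ...
      }

      // For the sequential whole-file pass (checksumming), open a *separate* input with a
      // read-once/sequential-friendly context instead of cloning the RANDOM-advised one,
      // so the OS is free to read ahead.
      try (IndexInput checksumInput = dir.openInput(vecFile, IOContext.READONCE)) {
        long checksum = CodecUtil.checksumEntireFile(checksumInput);
        System.out.println("checksum=" + Long.toHexString(checksum));
      }
    }
  }
}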
> >> So I advise not to do any ad-hoc changes breaking the random read code for vectors and doc values again, and to think about better ideas. In 10.x we have put a lot of thought into this, but the "upgrade an IndexInput for merging or checksumming" could be a nice addition - of course revert it back to the original IOContext with some try/finally after the work is done. This would play much nicer with our "reuse IndexInput of NRT readers while merging".
> >>
> >> Adrien, do you have any ideas?
> >>
> >> Uwe
> >>
> >> Am 01.10.2024 um 10:17 schrieb Uwe Schindler:
> >>> Hi,
> >>>
> >>> great.
> >>>
> >>> I still think the difference between RANDOM and READ is huge in your case. Are you sure that you have not misconfigured your system? The most important thing for Lucene is to make sure that the heap space of the Java VM is limited as much as possible (just above the OOM boundary) and the free available RAM is as large as possible, to allow MMapDirectory to use off-heap memory in an ideal way and minimize paging overhead. If you don't do this, the kernel will be under much higher pressure.
> >>>
> >>> In general, the correct fix for this is to use RANDOM for normal reading of the index and use the other IOContexts only for merging. If this requires files to be opened multiple times, it's a better compromise.
> >>>
> >>> Please note: we are focusing on 10.x, so please supply PRs/changes for the Lucene main branch only; backports will be done automatically. We won't change the IOContexts in 9.x anymore.
> >>>
> >>> Uwe
> >>>
> >>> Am 01.10.2024 um 10:04 schrieb Navneet Verma:
> >>>> Hi Uwe,
> >>>> Thanks for sharing the link and providing the useful information. I will definitely go ahead and create a GitHub issue. In the meantime I did some testing by changing the IOContext from RANDOM to READ for FlatVectors
> >>>> <https://github.com/navneet1v/lucene/commit/cd02e6f39acea82f7e56b36d8fd44156b4e271f9>
> >>>> and what I can see is that the overall merge + integrity checks have already come down *from >10 mins to <1 min with 1.6M 768D vectors.* This further confirms that RANDOM is not good for checksums. I didn't validate how much HNSW latencies will change if we move to READ; I think there would be some latency degradation. We can discuss the solutions further on the GitHub issue.
> >>>>
> >>>> I will post all of this in a GitHub issue and will try to raise a PR with a fix. Will be looking forward to your feedback.
> >>>>
> >>>> Thanks
> >>>> Navneet
> >>>>
> >>>>
> >>>> On Tue, Oct 1, 2024 at 12:19 AM Uwe Schindler <u...@thetaphi.de> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> this seems to be a special case in FlatVectors, because normally there's a separate method to open an IndexInput for checksumming:
> >>>>>
> >>>>> https://github.com/apache/lucene/blob/524ea208c870861a719f21b1ea48943c8b7520da/lucene/core/src/java/org/apache/lucene/store/Directory.java#L155-L157
> >>>>>
> >>>>> Could you open an issue? It looks like it is not always used. I know there are some problems if the IndexInput is used for multiple things like reading, merging and/or checksumming at the same time. Some code tries to reuse already opened index inputs also for merging.
> >>>>> But for this case I think it might be better to open a separate IndexInput and not clone an existing one for checksumming?
> >>>>>
> >>>>> The first link should of course open the IndexInput with RANDOM, because during normal reading of vectors this is a must. Generally, although the checksumming is slower, it should not be a big issue, because it won't affect searches, only merging of segments. And there the throughput should be high, but it is not the top priority.
> >>>>>
> >>>>> Uwe
> >>>>>
> >>>>> Am 01.10.2024 um 04:52 schrieb Navneet Verma:
> >>>>>> Hi Uwe and Mike,
> >>>>>> Thanks for providing such a quick response. Let me try to answer a few things here:
> >>>>>>
> >>>>>> *In addition, in Lucene 9.12 (latest 9.x) version released today there are some changes to ensure that checksumming is always done with IOContext.READ_ONCE (which uses READ behind the scenes).*
> >>>>>> I didn't find any such change for FlatVectorReaders
> >>>>>> <https://github.com/apache/lucene/blob/branch_9_12/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsReader.java#L68-L77>,
> >>>>>> even though I checked BufferedChecksumIndexInput
> >>>>>> <https://github.com/apache/lucene/blob/branch_9_12/lucene/core/src/java/org/apache/lucene/store/BufferedChecksumIndexInput.java#L31-L35>,
> >>>>>> ChecksumIndexInput
> >>>>>> <https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/store/ChecksumIndexInput.java#L25>,
> >>>>>> and CodecUtil
> >>>>>> <https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/CodecUtil.java#L606-L621>
> >>>>>> in the 9.12 version. Please point me to the right file if I am missing something here. I can see the same for Lucene version 10
> >>>>>> <https://github.com/apache/lucene/blob/branch_10_0/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsReader.java#L69-L77>
> >>>>>> too.
> >>>>>>
> >>>>>> Mike, on the question of what the RANDOM vs READ context is doing, we found this information related to madvise online:
> >>>>>>
> >>>>>> MADV_RANDOM: Expect page references in random order. (Hence, read ahead may be less useful than normally.)
> >>>>>> MADV_SEQUENTIAL: Expect page references in sequential order. (Hence, pages in the given range can be aggressively read ahead, and may be freed soon after they are accessed.)
> >>>>>> MADV_WILLNEED: Expect access in the near future. (Hence, it might be a good idea to read some pages ahead.)
> >>>>>>
> >>>>>> This tells me that MADV_RANDOM is not good for checksums, as it will consume more read cycles given the sequential nature of checksum reads.
> >>>>>>
> >>>>>> *One simple workaround an application can do is to ask MMapDirectory to pre-touch all bytes/pages in .vec/.veq files -- this asks the OS to cache all of those bytes into page cache (if there is enough free RAM). We do this at Amazon (product search) for our production searching processes. Otherwise paging in all .vec/.veq pages via random access provoked through HNSW graph searching is crazy slow...*
> >>>>>> Did you mean the preload functionality offered by MMapDirectory here?
> >>>>>> I can try this to see if that helps, but I doubt it will help in this case.
> >>>>>>
> >>>>>> On opening the issue: I am working through some reproducible benchmarks before creating a GitHub issue. If you believe I should create a GitHub issue first, I can do that; it might just take me some time to build reproducible benchmarks.
> >>>>>>
> >>>>>> Thanks
> >>>>>> Navneet
> >>>>>>
> >>>>>>
> >>>>>> On Mon, Sep 30, 2024 at 3:08 AM Uwe Schindler <u...@thetaphi.de> wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> please also note: in Lucene 10 the checksum IndexInput will always be opened with IOContext.READ_ONCE.
> >>>>>>>
> >>>>>>> If you want to sequentially read a whole index file for reasons other than checksumming, please pass the correct IOContext. In addition, in Lucene 9.12 (latest 9.x) version released today there are some changes to ensure that checksumming is always done with IOContext.READ_ONCE (which uses READ behind the scenes).
> >>>>>>>
> >>>>>>> Uwe
> >>>>>>>
> >>>>>>> Am 29.09.2024 um 17:09 schrieb Michael McCandless:
> >>>>>>>> Hi Navneet,
> >>>>>>>>
> >>>>>>>> With the RANDOM IOContext, on modern OSs / Java versions, Lucene will hint to the memory-mapped segment that the IO will be random, using the madvise POSIX API with the MADV_RANDOM flag.
> >>>>>>>>
> >>>>>>>> For the READ IOContext, Lucene maybe hints with MADV_SEQUENTIAL, I'm not sure. Or maybe it doesn't hint anything?
> >>>>>>>>
> >>>>>>>> It's up to the OS to then take these hints and do something "interesting" to try to optimize IO and page caching based on these hints. I think modern Linux OSs will readahead (and pre-warm the page cache) for MADV_SEQUENTIAL? And maybe skip page cache and readahead for MADV_RANDOM? Not certain...
> >>>>>>>>
> >>>>>>>> For computing a checksum, which is always a sequential operation, if we use MADV_RANDOM (which is stupid), that is indeed expected to perform worse since there is no readahead pre-caching. 50% worse (what you are seeing) is indeed quite an impact ...
> >>>>>>>>
> >>>>>>>> Maybe open an issue? At least for checksumming we should open even .vec files for sequential read? But, then, if it's the same IndexInput which will then be used "normally" (e.g. for merging), we would want THAT one to be open for random access ... might be tricky to fix.
> >>>>>>>>
> >>>>>>>> One simple workaround an application can do is to ask MMapDirectory to pre-touch all bytes/pages in .vec/.veq files -- this asks the OS to cache all of those bytes into page cache (if there is enough free RAM). We do this at Amazon (product search) for our production searching processes. Otherwise paging in all .vec/.veq pages via random access provoked through HNSW graph searching is crazy slow...
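A minimal sketch of the pre-touch workaround Mike describes above, assuming the BiPredicate-based MMapDirectory.setPreload available in recent Lucene 9.x releases; the index path is a placeholder and the .vec/.veq filter is only an illustration, not a recommendation from the thread.

import java.io.IOException;
import java.nio.file.Path;

import org.apache.lucene.store.MMapDirectory;

public class PreloadVectorsSketch {
  public static void main(String[] args) throws IOException {
    MMapDirectory dir = new MMapDirectory(Path.of("/path/to/index"));

    // Ask MMapDirectory to pre-touch (page in) the vector data files when they are mapped,
    // so later random access from HNSW graph search does not stall on page faults.
    // Only .vec/.veq files are preloaded here; all other files keep the default behavior.
    dir.setPreload((fileName, context) ->
        fileName.endsWith(".vec") || fileName.endsWith(".veq"));

    // ... open the IndexReader / IndexSearcher on top of "dir" as usual ...

    dir.close();
  }
}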
> >>>>>>>>
> >>>>>>>> Mike McCandless
> >>>>>>>>
> >>>>>>>> http://blog.mikemccandless.com
> >>>>>>>>
> >>>>>>>> On Sun, Sep 29, 2024 at 4:06 AM Navneet Verma <vermanavneet...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Lucene Experts,
> >>>>>>>>> I wanted to understand the performance difference between opening and reading a whole file using an IndexInput with the IOContext as RANDOM vs READ.
> >>>>>>>>>
> >>>>>>>>> I can see that .vec files (storing the flat vectors) are opened with RANDOM, whereas .dvd files are opened with READ. As per my testing with files close to 5GB in size (~1.6M docs, each doc 3072 bytes), I can see that when full-file checksum validation happens, a file opened via the READ context is faster than one opened via RANDOM. The time difference I am seeing is close to 50%. Hence this performance question is coming up: is this understanding correct?
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>> Navneet
> >>>>>>>>>
> >>>>>>> --
> >>>>>>> Uwe Schindler
> >>>>>>> Achterdiek 19, D-28357 Bremen
> >>>>>>> https://www.thetaphi.de
> >>>>>>> eMail: u...@thetaphi.de
> >>>>>>>
> >>>>>>>
> >>>>>>> ---------------------------------------------------------------------
> >>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>>>>>>
> >>>>> --
> >>>>> Uwe Schindler
> >>>>> Achterdiek 19, D-28357 Bremen
> >>>>> https://www.thetaphi.de
> >>>>> eMail: u...@thetaphi.de
> >>>>>
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>>>>
> >>>>>
> >> --
> >> Uwe Schindler
> >> Achterdiek 19, D-28357 Bremen
> >> https://www.thetaphi.de
> >> eMail: u...@thetaphi.de
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>
> >>
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
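For reference, a rough sketch (not a rigorous benchmark) of the comparison described at the start of the thread: checksumming the same file through inputs opened with the RANDOM vs. READ IOContexts. It assumes the IOContext constants as they exist in Lucene 9.11/9.12; the index path, the "_0.vec" file name, and the timeChecksumMillis helper are placeholders for illustration only.

import java.io.IOException;
import java.nio.file.Path;

import org.apache.lucene.codecs.CodecUtil;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.MMapDirectory;

public class ChecksumContextComparison {

  // Hypothetical helper: time a full-file checksum pass through an input opened with the given context.
  static long timeChecksumMillis(Directory dir, String file, IOContext context) throws IOException {
    long start = System.nanoTime();
    try (IndexInput in = dir.openInput(file, context)) {
      CodecUtil.checksumEntireFile(in); // sequential read of the whole file, verifies the footer
    }
    return (System.nanoTime() - start) / 1_000_000;
  }

  public static void main(String[] args) throws IOException {
    try (Directory dir = new MMapDirectory(Path.of("/path/to/index"))) {
      String vecFile = "_0.vec"; // placeholder segment file name
      // Page-cache state heavily influences the result; drop the OS caches between runs
      // for a fair comparison of the madvise hints.
      System.out.println("RANDOM: " + timeChecksumMillis(dir, vecFile, IOContext.RANDOM) + " ms");
      System.out.println("READ:   " + timeChecksumMillis(dir, vecFile, IOContext.READ) + " ms");
    }
  }
}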