mccullocht commented on PR #15722: URL: https://github.com/apache/lucene/pull/15722#issuecomment-3930695689
>> The current implementation does a synchronous madvise() syscall which I would expect to harm performance when everything is in memory and may be worse than doing a read when the data has spilled to storage.

> @mccullocht it is async. We make the call and then immediately return. Is your comment mainly that the call is in sync path rather than in a separate thread?

Syscalls are relatively expensive. If the data is already in memory, adding a syscall for every vector is going to be noticeable. I believe this is what Ben means when he says it will harm the "preferred path". I'm not sure whether running the syscall in the background would make a difference. In any case, I'm guessing you'll see this in benchmarks when the data set is small enough to fit in RAM.

> I think we can leave that to the application, and if we provide an implementation like PrefetchableScorer of RandomScorer, then during search an application can choose to pick a scorer based as per their needs. Is that something we can consider?

Indicating this preference has to fit through the interface in `KnnVectorsReader.search()` to override or wrap the vector scorer. I don't have any ideas that are really clean or obvious. Maybe the collector could contain a function that wraps the scorer?
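
As a rough illustration of that last idea only, here is a minimal sketch of a collector that carries a scorer-wrapping function. Every name in it (`VectorScorer`, `WrappingCollector`, `PrefetchingScorer`, `search`) is a hypothetical stand-in invented for the example, not Lucene's actual API, and the "prefetch" is just a counter rather than a real madvise() call.

```java
import java.util.function.UnaryOperator;

public class ScorerWrapSketch {
  /** Hypothetical stand-in for a per-vector scorer; not Lucene's interface. */
  interface VectorScorer {
    float score(int ord);
  }

  /** Hypothetical collector that carries an optional scorer-wrapping hook. */
  static class WrappingCollector {
    private final UnaryOperator<VectorScorer> scorerWrapper;

    WrappingCollector(UnaryOperator<VectorScorer> scorerWrapper) {
      this.scorerWrapper = scorerWrapper;
    }

    /** A reader could call this once, before the search loop starts. */
    VectorScorer wrap(VectorScorer base) {
      return scorerWrapper.apply(base);
    }
  }

  /** Hypothetical wrapper; counts prefetch hints instead of issuing a syscall. */
  static class PrefetchingScorer implements VectorScorer {
    private final VectorScorer delegate;
    int prefetchCalls = 0;

    PrefetchingScorer(VectorScorer delegate) {
      this.delegate = delegate;
    }

    @Override
    public float score(int ord) {
      prefetchCalls++; // placeholder for a real prefetch hint
      return delegate.score(ord);
    }
  }

  /** Stand-in for the reader's search path: wrap the default scorer, then score. */
  static float search(WrappingCollector collector, VectorScorer defaultScorer, int ord) {
    VectorScorer scorer = collector.wrap(defaultScorer);
    return scorer.score(ord);
  }

  public static void main(String[] args) {
    VectorScorer base = ord -> ord * 0.5f;
    // Application that wants the plain in-memory path: identity wrapper.
    WrappingCollector plain = new WrappingCollector(UnaryOperator.identity());
    // Application that opts in to prefetching.
    WrappingCollector prefetching = new WrappingCollector(PrefetchingScorer::new);
    System.out.println(search(plain, base, 4));
    System.out.println(search(prefetching, base, 4));
  }
}
```

With an identity wrapper as the default, the in-memory path would pay nothing extra, and only applications that opt in would take the per-vector cost.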
