mccullocht commented on PR #15722:
URL: https://github.com/apache/lucene/pull/15722#issuecomment-3930695689

   >> The current implementation does a synchronous madvise() syscall which I 
would expect to harm performance when everything is in memory and may be worse 
than doing a read when the data has spilled to storage.
   
   > @mccullocht it is async. We make the call and then immediately return. Is 
your comment mainly that the call is in sync path rather than in a separate 
thread?
   
   Syscalls are relatively expensive. If the data is already in memory, adding 
a syscall for every vector is going to be noticeable. I believe this is what 
Ben means when he says it will harm the "preferred path". I'm not sure if it 
would make a difference to run the syscall in the background. In any case I'm 
guessing you'll see this in benchmarks when the data set is small enough to fit 
in RAM.
   
   > I think we can leave that to the application, and if we provide an 
implementation like PrefetchableScorer of RandomScorer, then during search an 
application can choose to pick a scorer based on their needs. Is that 
something we can consider?
   
   Indicating this preference has to fit through the interface in 
`KnnVectorsReader.search()` to override or wrap the vector scorer. I don't have 
any ideas that are really clean or obvious. Maybe the collector could contain a 
function that wraps the scorer?
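
A rough sketch of that last idea, to make the shape concrete. Every type here (`VectorScorer`, `WrappingCollector`, `PrefetchingScorer`, `searchBest`) is a hypothetical stand-in invented for illustration, not an existing Lucene API, and the "prefetch" is only a counter rather than a real madvise() call:

```java
import java.util.function.UnaryOperator;

public class ScorerWrapSketch {
  /** Hypothetical stand-in for a per-vector scorer; not Lucene's actual interface. */
  public interface VectorScorer {
    float score(int ord);
  }

  /** Hypothetical collector that carries a hook for wrapping the scorer. */
  public static final class WrappingCollector {
    private final UnaryOperator<VectorScorer> scorerWrapper;

    public WrappingCollector(UnaryOperator<VectorScorer> scorerWrapper) {
      this.scorerWrapper = scorerWrapper;
    }

    public VectorScorer wrap(VectorScorer base) {
      return scorerWrapper.apply(base);
    }
  }

  /** Example wrapper: counts prefetch hints instead of issuing a real syscall. */
  public static final class PrefetchingScorer implements VectorScorer {
    private final VectorScorer delegate;
    public int prefetchHints = 0;

    public PrefetchingScorer(VectorScorer delegate) {
      this.delegate = delegate;
    }

    @Override
    public float score(int ord) {
      prefetchHints++; // a real implementation would hint the OS here
      return delegate.score(ord);
    }
  }

  /** Stand-in for the reader's search path: applies whatever wrapper the collector carries. */
  public static float searchBest(VectorScorer base, WrappingCollector collector, int numVectors) {
    VectorScorer scorer = collector.wrap(base);
    float best = Float.NEGATIVE_INFINITY;
    for (int ord = 0; ord < numVectors; ord++) {
      best = Math.max(best, scorer.score(ord));
    }
    return best;
  }
}
```

An application whose data fits in RAM could pass `UnaryOperator.identity()` and pay nothing extra, while one that expects to hit storage could pass `PrefetchingScorer::new`. Whether a hook like this belongs on the collector at all is the open question.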


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

