[GitHub] [lucene] jtibshirani commented on pull request #728: LUCENE-10194 Buffer KNN vectors on disk

GitBox Mon, 07 Mar 2022 10:44:47 -0800


jtibshirani commented on pull request #728:
URL: https://github.com/apache/lucene/pull/728#issuecomment-1061011001



   @rmuir's perspective makes total sense to me too, that we should stream to 
the format instead of buffering on disk within `IndexingChain`.
   
   One related thought: in a scenario with near-real-time (NRT) searches, this 
change could mean `reopen` is sometimes really slow. Say you are continuously 
indexing and there is a pretty long pause in NRT searches; then a search 
arrives and you call `reopen` before executing it. This triggers a flush, 
meaning we build a very large graph, which can take several minutes! This is 
already a bit of a problem, but this change could make it worse: the vectors 
no longer fill the indexing RAM buffer, so the intermediate flushes that a full 
buffer would trigger don't happen. Do we need to limit the number of vectors 
that will be buffered on disk to make sure flush isn't too slow? Or am I 
overthinking this, and it is not really Lucene's responsibility to prevent it?
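
   To make the question concrete, here is a minimal, self-contained sketch of 
the idea of capping buffered vectors. It is a toy model, not Lucene code: the 
class `VectorBufferSketch`, the `maxBufferedVectors` parameter, and the 
`addVector`/`reopen` methods are all hypothetical names invented for 
illustration, and "flush" here only records how many vectors would go into one 
graph build.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical, not Lucene code) of flush sizes with and without a vector cap. */
public class VectorBufferSketch {
    // Hypothetical cap on vectors buffered before a forced flush; <= 0 means "no cap".
    private final int maxBufferedVectors;
    private int buffered = 0;
    // Number of vectors in each flushed segment, i.e. the size of each graph build.
    private final List<Integer> segmentSizes = new ArrayList<>();

    public VectorBufferSketch(int maxBufferedVectors) {
        this.maxBufferedVectors = maxBufferedVectors;
    }

    /** Buffer one vector; flush early if the hypothetical cap is reached. */
    public void addVector() {
        buffered++;
        if (maxBufferedVectors > 0 && buffered >= maxBufferedVectors) {
            flush();
        }
    }

    /** Models an NRT reopen: whatever is still buffered must be flushed first. */
    public void reopen() {
        flush();
    }

    private void flush() {
        if (buffered > 0) {
            segmentSizes.add(buffered);
            buffered = 0;
        }
    }

    public List<Integer> segmentSizes() {
        return segmentSizes;
    }

    public static void main(String[] args) {
        VectorBufferSketch uncapped = new VectorBufferSketch(0);
        VectorBufferSketch capped = new VectorBufferSketch(100_000);
        // Continuous indexing with no searches in between.
        for (int i = 0; i < 250_000; i++) {
            uncapped.addVector();
            capped.addVector();
        }
        // A search finally arrives and triggers a reopen.
        uncapped.reopen();
        capped.reopen();
        System.out.println(uncapped.segmentSizes()); // [250000]
        System.out.println(capped.segmentSizes());   // [100000, 100000, 50000]
    }
}
```

   In this model the uncapped writer pays for one 250,000-vector graph build at 
`reopen` time, while the capped writer has already flushed twice and only 
builds a 50,000-vector graph when the search arrives. Whether such a cap 
belongs in Lucene or in the application is exactly the open question above.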


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



