jtibshirani commented on pull request #728: URL: https://github.com/apache/lucene/pull/728#issuecomment-1061011001
@rmuir's perspective makes total sense to me too: we should stream to the format instead of buffering on disk within `IndexingChain`.

One related thought: in a scenario with near-real-time (NRT) searches, this change could mean `reopen` is sometimes really slow. Say you are continuously indexing and there is a fairly long pause in NRT searches, then a search arrives and you call `reopen` before executing it. This triggers a flush, which means building one very large graph over everything buffered since the last flush, and that can take several minutes! This is already a bit of a problem, but this change could make it worse, since we would no longer fill the indexing RAM buffer, which is what triggers intermediate flushes.

Do we need to limit the number of vectors that will be buffered on disk to make sure flush isn't too slow? Or am I overthinking this, and it is not really Lucene's responsibility to prevent it?
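
To make the concern concrete, here is a toy model of it. It is not Lucene code: `ToyVectorBuffer`, `max_buffered_vectors`, and `reopen` are hypothetical names for illustration only, and the only assumption is that the cost of a flush grows with the number of vectors buffered since the previous flush. It compares how many vectors a single `reopen` has to flush with and without a cap on buffered vectors.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyVectorBuffer:
    """Toy model of vectors buffered between flushes (hypothetical, not a Lucene API)."""

    # Hypothetical cap on buffered vectors; None means no intermediate flushes.
    max_buffered_vectors: Optional[int] = None
    buffered: int = 0
    # Size of every flush that has happened so far, in order.
    flush_sizes: List[int] = field(default_factory=list)

    def add_vector(self) -> None:
        """Buffer one vector and flush early if the cap is reached."""
        self.buffered += 1
        if (
            self.max_buffered_vectors is not None
            and self.buffered >= self.max_buffered_vectors
        ):
            self.flush()

    def flush(self) -> None:
        """Record a flush of everything currently buffered."""
        if self.buffered > 0:
            self.flush_sizes.append(self.buffered)
            self.buffered = 0

    def reopen(self) -> int:
        """Simulate an NRT reopen; return how many vectors it had to flush itself."""
        pending = self.buffered
        self.flush()
        return pending


# A long indexing run with no searches, followed by a single reopen.
uncapped = ToyVectorBuffer()
capped = ToyVectorBuffer(max_buffered_vectors=1000)
for _ in range(10_500):
    uncapped.add_vector()
    capped.add_vector()

print("uncapped reopen flushes:", uncapped.reopen())  # 10500 vectors at once
print("capped reopen flushes:", capped.reopen())      # 500 vectors
```

In this model the uncapped writer pays for the whole graph at `reopen` time, while the capped writer has already spread most of the work across intermediate flushes. Whether such a cap belongs in Lucene or in the application is exactly the open question above.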