shubhamvishu commented on PR #15982: URL: https://github.com/apache/lucene/pull/15982#issuecomment-4495676673
> so indexing/merging performance shouldn’t really change

@iprithv This isn't true. Flushing does impact indexing behavior. Since this change addresses undercounting for byte vectors, fixing it will lead to more frequent flushes because the buffer fills faster. That means more disk writes (indexing blocked on I/O) and smaller segments (so different merging behavior, though maybe not worse), which could potentially hurt indexing here, I think. Making this change is the right thing to do, since the current accounting is not correct, but that is a separate matter: it's not a simple "RAM accounting fix", because it has eventual ripple effects.

> index time
> main → 5.31 sec
> this PR → 5.09 sec

This is specifically for byte vectors, right? I think there is a ~4-5% drop in indexing rate in your run (possibly not noise, and in line with what is expected from more frequent flushes). Could you try it with a larger corpus to be sure (500K or 1M)?
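To make the flush-frequency argument concrete, here is a rough back-of-the-envelope sketch. It is not Lucene code: the function names and the per-document byte counts (400 undercounted vs. 800 corrected) are hypothetical and chosen only for illustration, and the model assumes a flush is triggered purely by the accounted RAM reaching the buffer limit. The 16 MB buffer matches Lucene's default RAM buffer size.

```python
import math


def docs_per_flush(ram_buffer_bytes: int, accounted_bytes_per_doc: int) -> int:
    """Number of docs that fit in the buffer before the accounted RAM hits the limit."""
    return ram_buffer_bytes // accounted_bytes_per_doc


def flush_count(num_docs: int, ram_buffer_bytes: int, accounted_bytes_per_doc: int) -> int:
    """Number of flushes (i.e. initial segments) needed to index num_docs."""
    return math.ceil(num_docs / docs_per_flush(ram_buffer_bytes, accounted_bytes_per_doc))


RAM_BUFFER_BYTES = 16 * 1024 * 1024  # 16 MB buffer
NUM_DOCS = 1_000_000                 # corpus size suggested in the comment

# Hypothetical accounting figures, NOT measured from the PR:
UNDERCOUNTED_BYTES_PER_DOC = 400     # what the old accounting might report
CORRECTED_BYTES_PER_DOC = 800        # what a corrected accounting might report

flushes_before = flush_count(NUM_DOCS, RAM_BUFFER_BYTES, UNDERCOUNTED_BYTES_PER_DOC)
flushes_after = flush_count(NUM_DOCS, RAM_BUFFER_BYTES, CORRECTED_BYTES_PER_DOC)
print(f"flushes with undercounting: {flushes_before}")
print(f"flushes with corrected accounting: {flushes_after}")
```

Under these assumed numbers, doubling the accounted bytes per document roughly doubles the number of flushes and halves the size of each flushed segment, which is why the effect may only become visible on a larger corpus.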
