shubhamvishu commented on PR #15982:
URL: https://github.com/apache/lucene/pull/15982#issuecomment-4495676673

   > so indexing/merging performance shouldn’t really change
   
   @iprithv This isn't true. Flushing does affect indexing behavior. Since 
this change addresses undercounting for byte vectors, fixing it means the RAM 
buffer fills faster, which leads to more frequent flushes -> more disk 
writes (indexing blocked on IO) + smaller segments (so different merging 
behavior, though maybe not worse), which could potentially hurt indexing 
here, I think. 
   Making this change is the right thing to do (the current accounting is 
not correct), but that is a separate matter: it's not a simple "RAM 
accounting fix", since it has eventual ripple effects.
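
   To illustrate the flush-frequency argument, here is a toy model (not
Lucene's actual accounting code). It assumes a flush fires whenever the
accounted bytes reach the RAM buffer limit. The vector dimension, corpus
size, and the 4x undercount factor are hypothetical values picked only for
illustration; the 16 MB buffer matches Lucene's default `ramBufferSizeMB`.

```python
def count_flushes(num_docs, accounted_bytes_per_doc, ram_buffer_bytes):
    """Count flushes in a toy model: flush once accounted RAM reaches the limit."""
    flushes = 0
    accounted = 0
    for _ in range(num_docs):
        accounted += accounted_bytes_per_doc
        if accounted >= ram_buffer_bytes:
            flushes += 1
            accounted = 0  # the buffer is emptied by the flush
    return flushes


# Hypothetical workload, chosen only for illustration.
DIM = 768                       # byte vector dimension (1 byte per dimension)
NUM_DOCS = 500_000              # corpus size
RAM_BUFFER = 16 * 1024 * 1024   # 16 MB RAM buffer

correct_bytes = DIM             # what the fixed accounting would report
undercounted_bytes = DIM // 4   # assumed undercount factor, not taken from the PR

flushes_before = count_flushes(NUM_DOCS, undercounted_bytes, RAM_BUFFER)
flushes_after = count_flushes(NUM_DOCS, correct_bytes, RAM_BUFFER)

print(f"undercounted accounting: {flushes_before} flushes")
print(f"corrected accounting:    {flushes_after} flushes")
```

   In this sketch the corrected accounting triggers several times more
flushes for the same corpus, each producing a smaller segment. Whether that
shows up as a measurable indexing slowdown depends on IO and merge behavior,
which the model does not capture.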
   
   > index time
   > main → 5.31 sec
   > this PR → 5.09 sec
   
   This is specifically for the byte vectors, right? I think there is a ~4-5% 
drop in indexing rate in your run (possibly not noise, and in line with what 
is expected from more frequent flushes). Could you try it with a larger 
corpus (500K or 1M) to be sure? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

