I think the usual usage pattern is to *refresh* frequently and commit less frequently. Is there a reason you need to commit often?
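To illustrate the refresh-often/commit-rarely pattern: a minimal sketch against the Lucene 6.x API, using SearcherManager for near-real-time visibility. The index path and the refresh/commit cadence here are placeholders, not recommendations for your workload (this requires the Lucene core jars on the classpath):

```java
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.store.FSDirectory;

public class NrtSketch {
  public static void main(String[] args) throws Exception {
    IndexWriter writer = new IndexWriter(
        FSDirectory.open(Paths.get("/tmp/index")),   // placeholder path
        new IndexWriterConfig(new StandardAnalyzer()));

    // applyAllDeletes=true so refreshed searchers also see deletions
    SearcherManager manager = new SearcherManager(writer, true, null);

    // Call this frequently (e.g. every few hundred ms) to make newly
    // indexed documents visible to searches -- cheap relative to commit():
    manager.maybeRefresh();

    IndexSearcher searcher = manager.acquire();
    try {
      // run queries against the near-real-time view here
    } finally {
      manager.release(searcher);
    }

    // Commit only occasionally (e.g. every few minutes), purely for
    // durability; it fsyncs and tends to create new small segments:
    writer.commit();
    writer.close();
  }
}
```

The key point is that maybeRefresh() gives searchers visibility of new documents without the fsync and segment churn of commit(), so commits can be driven by your durability requirements alone.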
You may also have overlooked this newish method: MergePolicy.findFullFlushMerges. If you implement it, you can tell IndexWriter to (for example) merge multiple small segments on commit - segments which may be piling up given your frequent commits, especially if you are indexing across multiple threads. We found this can help reduce both the number of segments and the variability in that number. I don't know whether that is truly a root cause of your performance problems here, though.

Regarding scoring costs - I don't think creating a dummy Weight and Scorer will do what you think: Scorers do matching as well as scoring, so you won't get any results without a real Scorer. I *think* that setting needsScores() to false should disable the work done to compute relevance scores - you can confirm by looking at the scores that come back with your hits: are they all zero? Also, we did something similar in our system and later re-enabled scoring, and it did not add significant cost for us. YMMV, but are you sure the costs you are seeing come from computing scores rather than from the matching that is required anyway?

-Mike

On Fri, Aug 20, 2021 at 2:02 PM Varun Sharma <varun.sha...@airbnb.com.invalid> wrote:

> Hi,
>
> We have a large index that we divide into X Lucene indices - we use Lucene
> 6.5.0. Each of our serving machines serves 8 Lucene indices in parallel,
> and we apply realtime updates to each of these 8 indices. We are seeing a
> couple of things:
>
> a) When we turn off realtime updates, performance is significantly better.
> When we turn on realtime updates, due to accumulation of segments, CPU
> utilization by Lucene goes up by at least *3X* [based on profiling].
>
> b) A profile shows that the vast majority of time is being spent in
> scoring methods even though we are setting *needsScores() to false* in our
> collectors.
>
> We do commit our index frequently and we are roughly at ~25 segments per
> index - so a total of 8 * 25 ~ 200 segments across all the 8 indices.
>
> Changing the number of 8 indices per machine to reduce the number of
> segments would be a significant effort. So, we would like to know if there
> are ways to improve performance w.r.t. a) & b):
>
> i) We have tried some parameters with the merge policy &
> NRTCachingDirectory, and they did not help significantly.
> ii) Since we don't care about Lucene-level scores, is there a way to
> completely disable scoring? Should setting needsScores() to false in our
> collectors do the trick? Or should we create our own dummy weight/scorer
> and inject it into the Query classes?
>
> Thanks
> Varun

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
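On the needsScores() question in (ii): in Lucene 6.x, opting out of scoring is done by overriding needsScores() in the Collector, as Mike describes; the Scorer still performs matching, but weights that honor the flag can skip score computation. A minimal sketch (the hit-counting logic is just an illustrative placeholder, and this needs the Lucene core jars to compile):

```java
import java.io.IOException;
import org.apache.lucene.search.SimpleCollector;

/** Collects matching docs without asking for relevance scores. */
public class CountingNoScoreCollector extends SimpleCollector {
  private int count;

  @Override
  public boolean needsScores() {
    return false;  // signal that scores are not needed for collection
  }

  @Override
  public void collect(int doc) throws IOException {
    count++;       // doc is a segment-local docID; no score is computed
  }

  public int getCount() {
    return count;
  }
}
```

Used as `searcher.search(query, new CountingNoScoreCollector())`; per Mike's suggestion, if your profile still shows heavy scoring time with this in place, the cost you are measuring is likely matching work rather than score computation.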