Hi Lucene users,

The short version of my question: why am I seeing higher performance for a multi-segment index than for a single-segment index?
I have a static index that's generated before serving begins, so realtime updates and merges aren't an issue. I've been experimenting with ways to increase system performance and discovered, surprisingly, that I get higher average QPS with more than one segment.

I've also found that the best-performing segment count changes with the size of the index. For example, on a 6.5G index the optimal number was 4, for a 65G index it was 8, and for a 109G index it was about 16. The difference in average QPS has been 15-35%, so it's significant.

We also use an EarlyTerminatingSortingCollector with a sorted index, and I've verified that we are terminating early when appropriate. Given that Lucene searches for the requested number of hits in each segment in sequence, shouldn't the search cost grow roughly linearly with the segment count, making more segments slower rather than faster?

The indexes were warmed before the tests began. I also set the heap size large enough that it isn't an issue, and still left plenty of memory for the file system cache to hold the index.

If you have any insights, they would be appreciated.

Thanks,
Brandon
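
For reference, the expectation in the question (more segments should mean more work) can be sketched with a toy cost model. This is not Lucene code and says nothing about what Lucene actually does internally: the class and method names are hypothetical, and it assumes that each segment stops collecting once it has gathered the requested number of hits and that matches are spread evenly across segments.

```java
import java.util.Arrays;

// Hypothetical toy model (not a Lucene API): counts how many documents are
// collected in total when every segment terminates early after numHits hits.
final class SegmentCostModel {

    // Each segment collects at most numHits documents, or fewer if it has
    // fewer matches than that.
    static long docsCollected(long[] matchesPerSegment, int numHits) {
        long total = 0;
        for (long matches : matchesPerSegment) {
            total += Math.min(matches, numHits);
        }
        return total;
    }

    // Assumption: matches are split evenly; any remainder goes to segment 0.
    static long[] evenSplit(long totalMatches, int segments) {
        long[] out = new long[segments];
        Arrays.fill(out, totalMatches / segments);
        out[0] += totalMatches % segments;
        return out;
    }

    public static void main(String[] args) {
        long totalMatches = 1_000_000L; // illustrative value, not measured data
        int numHits = 100;              // illustrative value, not measured data
        for (int segments : new int[] {1, 4, 8, 16}) {
            long collected = docsCollected(evenSplit(totalMatches, segments), numHits);
            System.out.println(segments + " segment(s): " + collected + " docs collected");
        }
    }
}
```

Under these assumptions the number of collected documents grows in proportion to the segment count whenever every segment has at least numHits matches, which is why the measured QPS improvement with more segments is surprising.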