[ https://issues.apache.org/jira/browse/SOLR-7332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Timothy Potter updated SOLR-7332:
---------------------------------
    Attachment: SOLR-7332.patch

First pass at a patch that seeds the version buckets with the max value from the index as early as possible. This patch also includes the fix for SOLR-6820, as I found that the two solutions combined give the best performance.

Specifically, with this patch I was able to index 9,992,262 docs into a 3x2 collection in 439 seconds (on branch5x), which is roughly 22,761 docs/sec. By comparison, branch5x without this patch takes 758 seconds (13,182 docs/sec), so throughput improves by roughly 1.7x. What's more, the CPU load on leaders and replicas is now very similar; previously, the replicas' CPUs were nearly maxed out while the leader was only about half utilized. So with this fix, you can push Solr harder as well.

All tests pass with this patch, and I added a basic unit test, but I would welcome suggestions on how to test this better, especially edge cases I may not be aware of, as I'm not too familiar with the version code.

The basic approach is to set highest for all version buckets based on the max version value from the index. As coded, this doesn't happen until the first soft or hard commit is triggered, so there's a short window where the version buckets will be set to 0. The issue I found is that there isn't a searcher available yet when VersionInfo is first initialized. I also re-fetch the max after a core reload and after replaying tlogs for the core. There's also a little more complexity in dealing with differences in how {{_version_}} could be configured in the schema (with or without docValues).

I'd love to get this in for 5.1 since it gives such a big improvement in performance, but it's more important to get this right and not introduce any regression in this all-important code path.
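To illustrate why seeding matters, here is a minimal sketch of the idea, not the code in the patch. It assumes a simplified model in which an add whose version is above a bucket's non-zero highest is accepted without a lookup, while anything else has to look up the document's last version first. The class VersionBucketsSketch, its methods seed and add, and the indexLookups counter are all hypothetical names made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of version buckets; NOT Solr's actual classes.
final class VersionBucketsSketch {
    private final long[] highest;                              // highest version per bucket; 0 = unseeded
    private final Map<String, Long> index = new HashMap<>();   // stands in for the index/tlog
    int indexLookups = 0;                                      // counts simulated per-document lookups

    VersionBucketsSketch(int numBuckets) {
        highest = new long[numBuckets];
    }

    // Seed every bucket with the max version found in the index
    // (assumed to be called after a commit, core reload, or tlog replay).
    void seed(long maxVersionInIndex) {
        for (int i = 0; i < highest.length; i++) {
            highest[i] = Math.max(highest[i], maxVersionInIndex);
        }
    }

    // Apply an add carrying a version; returns false if it is dropped as stale.
    boolean add(String id, long version) {
        int b = Math.floorMod(id.hashCode(), highest.length);
        if (highest[b] != 0 && version > highest[b]) {
            // Fast path: newer than anything seen in this bucket, no lookup needed.
            highest[b] = version;
        } else {
            // Slow path: must check the stored version for this id.
            indexLookups++;
            Long last = index.get(id);
            if (last != null && last >= version) {
                return false; // repeat or reordered update, drop it
            }
        }
        index.put(id, version);
        return true;
    }

    public static void main(String[] args) {
        VersionBucketsSketch unseeded = new VersionBucketsSketch(4);
        VersionBucketsSketch seeded = new VersionBucketsSketch(4);
        seeded.seed(100L); // pretend the max version in the index is 100

        String[] ids = {"a", "b", "c"};
        long version = 101L;
        for (String id : ids) {
            unseeded.add(id, version);
            seeded.add(id, version);
            version++;
        }
        System.out.println("lookups without seeding: " + unseeded.indexLookups); // 3
        System.out.println("lookups with seeding:    " + seeded.indexLookups);   // 0
    }
}
```

In this model, buckets left at 0 force a lookup on every add of a brand-new document, which is the wasted work a bulk load would pay for. Once seeded, only updates at or below a bucket's highest take the slow path.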
> Seed version buckets with max version from index
> ------------------------------------------------
>
>                 Key: SOLR-7332
>                 URL: https://issues.apache.org/jira/browse/SOLR-7332
>             Project: Solr
>          Issue Type: Sub-task
>          Components: SolrCloud
>            Reporter: Timothy Potter
>            Assignee: Timothy Potter
>         Attachments: SOLR-7332.patch
>
>
> See full discussion between Yonik and me in SOLR-6816.
> The TL;DR of that discussion is that we should initialize highest for each
> version bucket to the MAX value of the {{_version_}} field in the index as
> early as possible, such as after the first soft or hard commit. This will
> ensure that bulk adds where the docs don't exist avoid an unnecessary lookup
> for a non-existent document in the index.