[ 
https://issues.apache.org/jira/browse/SOLR-7332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Potter updated SOLR-7332:
---------------------------------
    Attachment: SOLR-7332.patch

First pass at a patch that seeds the version buckets with the max value from 
the index as early as possible. This patch also includes the fix for SOLR-6820, 
as I found that the two solutions combined give the best performance. 
Specifically, with this patch I was able to index 9,992,262 docs into a 3x2 
collection in 439 seconds (on branch5x), which is roughly 22,761 docs/sec. For 
comparison, branch5x without this patch takes 758 seconds (13,182 docs/sec), so 
the patch gives roughly a 1.7x speedup. What's more, the CPU load on leaders 
and replicas is now very similar; previously, the replicas' CPUs were nearly 
maxed out while the leader was only about half utilized. So with this fix, you 
can push Solr harder as well.
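
For anyone who wants to check the arithmetic behind those figures, here is a 
small Java sketch. The class and method names are made up for illustration; 
only the doc count and the two timings come from the runs described above.

```java
// Hypothetical helper for re-deriving the throughput numbers quoted above.
final class IndexingThroughput {

    // Average indexing rate over a run, in docs per second.
    static double docsPerSecond(long docs, long seconds) {
        return (double) docs / seconds;
    }

    public static void main(String[] args) {
        long docs = 9_992_262L;                      // docs indexed in each run
        double patched = docsPerSecond(docs, 439);   // branch5x with the patch
        double baseline = docsPerSecond(docs, 758);  // branch5x without the patch

        System.out.println("patched  docs/sec: " + Math.round(patched));
        System.out.println("baseline docs/sec: " + Math.round(baseline));
        // The doc count cancels out, so the speedup is simply 758 / 439.
        System.out.println("speedup factor:    " + (patched / baseline));
    }
}
```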

All tests pass with this patch, and I added a basic unit test, but I would 
welcome more suggestions on how to test this better, especially edge cases I 
may not be aware of, as I'm not too familiar with the version code.

The basic approach is to set {{highest}} for all version buckets based on the 
max version value from the index. As coded, this doesn't happen until the first 
soft or hard commit is triggered, so there's a short window where {{highest}} 
for each version bucket is still 0. The reason is that there isn't a searcher 
available yet when VersionInfo is first initialized. I also re-fetch the max 
after a core reload and after replaying tlogs for the core.
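
To make the idea concrete, here is a minimal, simplified Java model of seeded 
version buckets. It is not the code from the patch: the class name, the method 
names, and the rule that a bucket value of 0 means "unknown, so look the 
document up" are assumptions made for illustration, and the sketch ignores 
details such as deletes and real locking granularity.

```java
// Simplified, hypothetical model of version buckets; not Solr's implementation.
final class VersionBucketsSketch {

    // highest[i] is the largest version seen for bucket i; 0 means "not known yet".
    private final long[] highest;

    VersionBucketsSketch(int numBuckets) {
        this.highest = new long[numBuckets];
    }

    private int bucketFor(int idHash) {
        return Math.floorMod(idHash, highest.length);
    }

    // Seed every bucket with the max version found in the index. In this sketch
    // that would be called after the first commit, after a core reload, and after
    // tlog replay. Math.max keeps a bucket from moving backwards (an assumption).
    synchronized void seed(long maxVersionInIndex) {
        for (int i = 0; i < highest.length; i++) {
            highest[i] = Math.max(highest[i], maxVersionInIndex);
        }
    }

    // Returns true if the caller would still have to look up the existing
    // document's version in the index before applying the update.
    synchronized boolean needsLookup(int idHash, long incomingVersion) {
        int b = bucketFor(idHash);
        if (highest[b] != 0 && incomingVersion > highest[b]) {
            // Newer than anything this bucket has seen, so no lookup is needed.
            highest[b] = incomingVersion;
            return false;
        }
        return true; // unseeded bucket, or a version that is not strictly newer
    }

    public static void main(String[] args) {
        VersionBucketsSketch buckets = new VersionBucketsSketch(4);
        System.out.println(buckets.needsLookup(7, 100L)); // unseeded bucket
        buckets.seed(100L);                               // max version from index
        System.out.println(buckets.needsLookup(7, 101L)); // newer than the seed
        System.out.println(buckets.needsLookup(7, 101L)); // same version again
    }
}
```

In this model, every update that arrives before the buckets are seeded falls 
through to the lookup path, which is the window described above; once seeded, 
bulk adds carrying newer versions skip the lookup.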

There's also a little more complexity in dealing with differences in how 
{{_version_}} could be configured in the schema (with or without docValues).

I'd love to get this in for 5.1, since it gives such a big improvement in 
performance, but it's more important to get this right and not introduce any 
regression in this all-important code path.

> Seed version buckets with max version from index
> ------------------------------------------------
>
>                 Key: SOLR-7332
>                 URL: https://issues.apache.org/jira/browse/SOLR-7332
>             Project: Solr
>          Issue Type: Sub-task
>          Components: SolrCloud
>            Reporter: Timothy Potter
>            Assignee: Timothy Potter
>         Attachments: SOLR-7332.patch
>
>
> See the full discussion between Yonik and me in SOLR-6816.
> The TL;DR of that discussion is that we should initialize highest for each 
> version bucket to the MAX value of the {{_version_}} field in the index as 
> early as possible, such as after the first soft or hard commit. This will 
> ensure that bulk adds where the docs don't exist avoid an unnecessary lookup 
> for a non-existent document in the index.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

