[ https://issues.apache.org/jira/browse/SOLR-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729551#comment-17729551 ]
Rahul Goswami commented on SOLR-16838:
--------------------------------------
I ran the test to index 5 million docs (batches of 1000 docs in 15 parallel
threads). To eliminate the network overhead and get as accurate a benchmark as
possible, I used an AtomicLong to accumulate the time spent in the RTG call in
DistributedUpdateProcessor across all calls
([https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.7.2/solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java#L1416]).
I did this for both Solr 7.7.2 and Solr 8.11.1, building an instrumented
solr-core.jar for each version and replacing the stock jar in the Solr webapp's
lib directory.
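For anyone reproducing the measurement, the pattern described above (one shared AtomicLong accumulating the time around every RTG call, across all indexing threads) can be sketched as follows. This is a minimal, hypothetical sketch and not the actual patch: `RtgTimer`, `timed` and `totalMillis` are illustrative names, and the wrapped `Supplier` stands in for whichever real-time get call is instrumented.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not Solr code): accumulates the time spent inside a
 * wrapped call, summed over every thread that calls it.
 */
final class RtgTimer {
    // One counter shared by all indexing threads; AtomicLong makes the
    // concurrent additions safe without locking.
    private static final AtomicLong TOTAL_NANOS = new AtomicLong();

    private RtgTimer() {}

    /** Runs the call, adds its elapsed time to the shared total and returns its result. */
    static <T> T timed(Supplier<T> call) {
        long start = System.nanoTime(); // monotonic clock, suitable for durations
        try {
            return call.get();
        } finally {
            // Counted even if the call throws.
            TOTAL_NANOS.addAndGet(System.nanoTime() - start);
        }
    }

    /** Cumulative time in milliseconds across all calls and all threads. */
    static long totalMillis() {
        return TimeUnit.NANOSECONDS.toMillis(TOTAL_NANOS.get());
    }
}
```

At the instrumented call site the lookup would be wrapped as `RtgTimer.timed(() -> ...)`, with `totalMillis()` logged once indexing finishes. The sketch accumulates nanosecond deltas and converts only when reporting, so very short calls are not rounded down to zero. Because 15 threads add to the same counter concurrently, the total is cumulative thread time and can exceed the wall-clock duration of the run.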
RTG in Solr 8.11.1 is ~10x slower. Here are the numbers (times are in
milliseconds):
*+Solr 7.7.2+*: 2023-06-01 15:39:48.272 WARN (qtp1034094674-24) [ x:techproducts] o.a.s.u.p.LogUpdateProcessorFactory *+Total rtg time:7293486+*
*+Solr 8.11.1+*: 2023-06-01 04:46:24.758 WARN (qtp391506011-71) [ x:techproducts] o.a.s.u.p.LogUpdateProcessorFactory *+Total rtg time:72029877+*
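The ~10x figure follows directly from the two totals: 72,029,877 / 7,293,486 is about 9.88. Assuming each call's elapsed time was added to one shared counter, both totals are summed over all 15 indexing threads, so in hours they amount to roughly 2 hours versus 20 hours of cumulative thread time, not elapsed wall-clock time. A small arithmetic check (class and method names are illustrative):

```java
/** Worked check of the two "Total rtg time" figures (illustrative names). */
final class RtgComparison {
    // Cumulative RTG time in milliseconds, from the two log lines.
    static final long SOLR_7_7_2_MS = 7_293_486L;
    static final long SOLR_8_11_1_MS = 72_029_877L;

    private static final double MILLIS_PER_HOUR = 3_600_000.0;

    private RtgComparison() {}

    /** How many times longer Solr 8.11.1 spent in RTG than Solr 7.7.2. */
    static double slowdown() {
        return (double) SOLR_8_11_1_MS / SOLR_7_7_2_MS; // ~9.88
    }

    /** Converts a cumulative millisecond total into hours of thread time. */
    static double hours(long millis) {
        return millis / MILLIS_PER_HOUR;
    }
}
```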
> Atomic updates too slow in Solr 8 vs Solr 7
> -------------------------------------------
>
> Key: SOLR-16838
> URL: https://issues.apache.org/jira/browse/SOLR-16838
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SearchComponents - other
> Affects Versions: 8.11.1
> Reporter: Rahul Goswami
> Priority: Major
>
> Started experiencing slowness with updates in production after upgrading from
> Solr 7.7.2 to 8.11.1. A side-by-side comparison shows that indexing 20 million
> docs via atomic updates through the same client program (running 15 parallel
> threads indexing in batches of 1000) takes:
>
> Solr 7.7.2: 78 mins
> Solr 8.11.1: 370 mins
>
> Environment details:
> - Java 11 on Windows Server
> - JVM heap: -Xms1536m -Xmx3072m
> - Indexing client code running 15 parallel threads indexing in batches of 1000
> - Using SimpleFSDirectoryFactory (since mmap doesn't work well on Windows for
> our index sizes, which commonly run north of 1 TB)
>
> Looking at the thread dumps, the bottleneck seems to be RealTimeGet, and I can
> see that Solr 7 takes a different code path than Solr 8. Note that the
> performance of regular (non-atomic) updates is still good on Solr 8, which
> completes in under 1 hour for the same 20-million-document data set.
>
> Sharing the indexing code, solrconfig, schema and thread dumps in the link
> below:
> [https://drive.google.com/drive/folders/1q2DPNTYQEU6fi3NeXIKJhaoq3KPnms0h?usp=sharing]
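For scale, the two end-to-end atomic-update timings in the issue description work out to roughly 4,270 docs/s on Solr 7.7.2 versus roughly 900 docs/s on Solr 8.11.1, about a 4.7x slowdown. The arithmetic as a small check (class and method names are illustrative, and each rate is an average over the whole run):

```java
/** Worked check of the end-to-end atomic-update timings (illustrative names). */
final class IndexingThroughput {
    // Size of the data set indexed in both runs.
    static final long DOCS = 20_000_000L;

    private IndexingThroughput() {}

    /** Average documents per second for a run of the given length in minutes. */
    static double docsPerSecond(long minutes) {
        return DOCS / (minutes * 60.0);
    }

    /** How many times longer the slower run took than the faster one. */
    static double slowdown(long slowMinutes, long fastMinutes) {
        return (double) slowMinutes / fastMinutes;
    }
}
```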
--
This message was sent by Atlassian Jira
(v8.20.10#820010)