[ 
https://issues.apache.org/jira/browse/SOLR-17756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950518#comment-17950518
 ] 

Matthew Biscocho edited comment on SOLR-17756 at 5/9/25 1:34 PM:
-----------------------------------------------------------------

I tested the PR above and got some numbers. For reference, my machine has 12 
cores. I created a single core in Solr and indexed ~118 million docs containing 
only an ID, which produced 58 segments. 2 segments had 31 million documents. I 
also invalidated the fingerprint cache for my tests.

Sequentially (Original) - ~631 ms
{code:java}
2025-05-08 22:23:32.798 INFO (qtp436094532-32-localhost-7) [c:gettingstarted 
s:shard1 r:core_node2 x:gettingstarted_shard1_replica_n1 t:localhost-7] 
o.a.s.u.IndexFingerprint IndexFingerprint millis:631.0 
result:{maxVersionSpecified=9223372036854775807, 
maxVersionEncountered=1831592552090828800, maxInHash=1831592552090828800, 
versionsHash=6472754633150858610, numVersions=118657846, numDocs=118657846, 
maxDoc=31554300}
2025-05-08 22:23:34.515 INFO (qtp436094532-38-localhost-8) [c:gettingstarted 
s:shard1 r:core_node2 x:gettingstarted_shard1_replica_n1 t:localhost-8] 
o.a.s.u.IndexFingerprint IndexFingerprint millis:665.0 
result:{maxVersionSpecified=9223372036854775807, 
maxVersionEncountered=1831592552090828800, maxInHash=1831592552090828800, 
versionsHash=6472754633150858610, numVersions=118657846, numDocs=118657846, 
maxDoc=31554300}{code}
Parallel (12 Cores) - ~249 ms
{code:java}
2025-05-08 22:19:51.563 INFO (qtp436094532-204-localhost-13345662) 
[c:gettingstarted s:shard1 r:core_node2 x:gettingstarted_shard1_replica_n1 
t:localhost-13345662] o.a.s.u.IndexFingerprint IndexFingerprint millis:249.0 
result:{maxVersionSpecified=9223372036854775807, 
maxVersionEncountered=1831592552090828800, maxInHash=1831592552090828800, 
versionsHash=6472754633150858610, numVersions=118657846, numDocs=118657846, 
maxDoc=31554300}
2025-05-08 22:19:52.304 INFO (qtp436094532-260-localhost-13345663) 
[c:gettingstarted s:shard1 r:core_node2 x:gettingstarted_shard1_replica_n1 
t:localhost-13345663] o.a.s.u.IndexFingerprint IndexFingerprint millis:249.0 
result:{maxVersionSpecified=9223372036854775807, 
maxVersionEncountered=1831592552090828800, maxInHash=1831592552090828800, 
versionsHash=6472754633150858610, numVersions=118657846, numDocs=118657846, 
maxDoc=31554300}{code}

 
So there is definitely some improvement here (roughly 2.5x, from ~631 ms down 
to ~249 ms), but I'd be curious to see how much of an improvement there is with 
many more documents and more segments. In a real-life scenario, with a 
fingerprint cache covering some of the older untouched segments, it might only 
be going over the new, smaller segments, but this should still help.
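
To illustrate the caching point, here is a minimal sketch, assuming a hypothetical per-segment cache keyed by segment name. The class, the `mix` placeholder hash, and the segment names are made up for illustration and are not Solr's actual API. It only shows that once older segments are cached, a later fingerprint pass scans just the new ones.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class SegmentFingerprintCacheSketch {
    // Hypothetical cache: segment name -> previously computed hash sum.
    private final Map<String, Long> cache = new HashMap<>();
    // Counts how many segments actually had to be scanned.
    int segmentsScanned = 0;

    // Placeholder per-version hash; an assumption for illustration only.
    static long mix(long v) {
        return v * 0x9E3779B97F4A7C15L;
    }

    // Returns the cached sum for a segment, scanning it only on a cache miss.
    long segmentHash(String name, long[] versions) {
        Long cached = cache.get(name);
        if (cached != null) {
            return cached;
        }
        segmentsScanned++;
        long sum = 0;
        for (long v : versions) {
            sum += mix(v);
        }
        cache.put(name, sum);
        return sum;
    }

    // Index-level value: the sum of all per-segment sums.
    long indexHash(Map<String, long[]> segments) {
        long total = 0;
        for (Map.Entry<String, long[]> e : segments.entrySet()) {
            total += segmentHash(e.getKey(), e.getValue());
        }
        return total;
    }

    public static void main(String[] args) {
        SegmentFingerprintCacheSketch fp = new SegmentFingerprintCacheSketch();
        Map<String, long[]> segments = new LinkedHashMap<>();
        segments.put("_0", new long[] {1L, 2L, 3L});
        segments.put("_1", new long[] {4L, 5L});
        fp.indexHash(segments);              // scans _0 and _1
        segments.put("_2", new long[] {6L}); // a new, small segment appears
        fp.indexHash(segments);              // scans only _2
        System.out.println("segments scanned: " + fp.segmentsScanned);
    }
}
```

In this toy setup the parallel speedup only applies to the uncached segments, which is why the benefit in production could be smaller than in the cache-invalidated test above.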


> Parallelize calculation of index fingerprint across segments
> ------------------------------------------------------------
>
>                 Key: SOLR-17756
>                 URL: https://issues.apache.org/jira/browse/SOLR-17756
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: main (10.0), 8.11.4, 9.8.1
>            Reporter: Matthew Biscocho
>            Assignee: Matthew Biscocho
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> The index fingerprint is currently calculated on each segment sequentially. 
> While this works fine, the calculation was noticed to be very slow, and it 
> is blocking on leader election.
> This proposes parallelizing the calculation across segments instead. Since 
> the fingerprint is just a cumulative sum of a hash on versions, the order in 
> which each hash is added to the running sum should not matter.
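
A minimal sketch of the order-independence argument above, assuming each segment is just an array of version numbers. The class name, the `mix` function (a MurmurHash3-style finalizer used as a stand-in), and the use of a parallel stream are assumptions for illustration, not the code in the PR. Because `long` addition wraps modulo 2^64 and is both associative and commutative, summing per-segment results in any order gives the same total.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ParallelFingerprintSketch {

    // Stand-in 64-bit mixer (MurmurHash3 finalizer); an assumption,
    // not necessarily the hash Solr applies to versions.
    static long mix(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    // Hash sum over the versions of a single segment.
    static long segmentHash(long[] versions) {
        long sum = 0;
        for (long v : versions) {
            sum += mix(v);
        }
        return sum;
    }

    // Baseline: visit segments one after another.
    static long sequential(List<long[]> segments) {
        long total = 0;
        for (long[] segment : segments) {
            total += segmentHash(segment);
        }
        return total;
    }

    // Parallel variant: each segment is hashed independently, then the
    // partial sums are combined; overflow wraps the same way in both paths.
    static long parallel(List<long[]> segments) {
        return segments.parallelStream()
                .mapToLong(ParallelFingerprintSketch::segmentHash)
                .sum();
    }

    public static void main(String[] args) {
        List<long[]> segments = new ArrayList<>();
        segments.add(new long[] {1L, 2L, 3L});
        segments.add(new long[] {Long.MAX_VALUE, 17L});
        segments.add(new long[] {1831592552090828800L});

        List<long[]> reversed = new ArrayList<>(segments);
        Collections.reverse(reversed);

        System.out.println(sequential(segments) == parallel(segments)); // true
        System.out.println(sequential(segments) == sequential(reversed)); // true
    }
}
```

The sketch only covers the hash sum; fields such as a maximum version or document counts would need their own order-independent merge (for example `Math.max` and plain addition).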



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
