[ https://issues.apache.org/jira/browse/SOLR-17756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950518#comment-17950518 ]
Matthew Biscocho edited comment on SOLR-17756 at 5/9/25 1:34 PM:
-----------------------------------------------------------------

Doing some testing of the PR above, I got some numbers. FYI my machine has 12 cores.

I created a single core in Solr and indexed ~118 million docs containing only an ID, which created 58 segments. 2 segments had 31 million documents. I invalidated the fingerprint cache for my tests as well.

Sequential (original) - ~631 ms
{code:java}
2025-05-08 22:23:32.798 INFO (qtp436094532-32-localhost-7) [c:gettingstarted s:shard1 r:core_node2 x:gettingstarted_shard1_replica_n1 t:localhost-7] o.a.s.u.IndexFingerprint IndexFingerprint millis:631.0 result:{maxVersionSpecified=9223372036854775807, maxVersionEncountered=1831592552090828800, maxInHash=1831592552090828800, versionsHash=6472754633150858610, numVersions=118657846, numDocs=118657846, maxDoc=31554300}
2025-05-08 22:23:34.515 INFO (qtp436094532-38-localhost-8) [c:gettingstarted s:shard1 r:core_node2 x:gettingstarted_shard1_replica_n1 t:localhost-8] o.a.s.u.IndexFingerprint IndexFingerprint millis:665.0 result:{maxVersionSpecified=9223372036854775807, maxVersionEncountered=1831592552090828800, maxInHash=1831592552090828800, versionsHash=6472754633150858610, numVersions=118657846, numDocs=118657846, maxDoc=31554300}{code}

Parallel (12 cores) - ~249 ms
{code:java}
2025-05-08 22:19:51.563 INFO (qtp436094532-204-localhost-13345662) [c:gettingstarted s:shard1 r:core_node2 x:gettingstarted_shard1_replica_n1 t:localhost-13345662] o.a.s.u.IndexFingerprint IndexFingerprint millis:249.0 result:{maxVersionSpecified=9223372036854775807, maxVersionEncountered=1831592552090828800, maxInHash=1831592552090828800, versionsHash=6472754633150858610, numVersions=118657846, numDocs=118657846, maxDoc=31554300}
2025-05-08 22:19:52.304 INFO (qtp436094532-260-localhost-13345663) [c:gettingstarted s:shard1 r:core_node2 x:gettingstarted_shard1_replica_n1 t:localhost-13345663] o.a.s.u.IndexFingerprint IndexFingerprint millis:249.0 result:{maxVersionSpecified=9223372036854775807, maxVersionEncountered=1831592552090828800, maxInHash=1831592552090828800, versionsHash=6472754633150858610, numVersions=118657846, numDocs=118657846, maxDoc=31554300}{code}

So there is definitely some improvement here (roughly 2.5x in this test, from ~631 ms to ~249 ms), but I'd be curious to see how much of an improvement there is with a much larger number of documents and more segments. In a real-life scenario, with a fingerprint cache on some of the older untouched segments, it might only be going over the new, smaller segments. This should help.

> Parallelize calculation of index fingerprint across segments
> ------------------------------------------------------------
>
>                 Key: SOLR-17756
>                 URL: https://issues.apache.org/jira/browse/SOLR-17756
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: main (10.0), 8.11.4, 9.8.1
>            Reporter: Matthew Biscocho
>            Assignee: Matthew Biscocho
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> The index fingerprint is currently calculated on each segment
> sequentially. While this works fine, the calculation was noticed to be
> very slow, and it is blocking during leader election.
> This issue proposes parallelizing the calculation across segments instead.
> Since the fingerprint is just a cumulative sum of a hash on versions, the
> order in which each segment's result is added to the running sum should
> not matter.
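
The order-independence argument in the description can be illustrated in plain Java. This is only a minimal sketch under stated assumptions, not Solr's implementation: the class name {{FingerprintSketch}}, the {{hash}} mixing function, and the modelling of each segment as a {{long[]}} of versions are all hypothetical stand-ins. It shows that a wrapping {{long}} sum of per-version hashes gives the same total whether segments are visited in order, in reverse, or through a parallel stream.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class FingerprintSketch {

    // Hypothetical per-version hash; Solr's real hash function is not reproduced here.
    static long hash(long version) {
        long h = version * 0x9E3779B97F4A7C15L;
        return h ^ (h >>> 32);
    }

    // Partial sum for one segment, modelled as an array of version numbers.
    static long segmentHash(long[] versions) {
        long sum = 0L;
        for (long v : versions) {
            sum += hash(v); // long addition wraps on overflow, which keeps it associative
        }
        return sum;
    }

    // Baseline: visit the segments one after another.
    static long sequential(List<long[]> segments) {
        long total = 0L;
        for (long[] segment : segments) {
            total += segmentHash(segment);
        }
        return total;
    }

    // Parallel variant: each segment is hashed independently, then the partial sums are added.
    static long parallel(List<long[]> segments) {
        return segments.parallelStream()
                .mapToLong(FingerprintSketch::segmentHash)
                .sum();
    }

    public static void main(String[] args) {
        List<long[]> segments = new ArrayList<>();
        segments.add(new long[] {101L, 102L, 103L});
        segments.add(new long[] {204L, 205L});
        segments.add(new long[] {306L});

        long inOrder = sequential(segments);
        Collections.reverse(segments);
        long reversed = sequential(segments);
        long inParallel = parallel(segments);

        // Addition modulo 2^64 is commutative and associative, so all three agree.
        System.out.println(inOrder == reversed && reversed == inParallel); // prints true
    }
}
```

The parallel stream here runs on the common fork-join pool purely for brevity; how the actual PR schedules per-segment work is not shown in this message.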