[
https://issues.apache.org/jira/browse/SOLR-17198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Smiley resolved SOLR-17198.
---------------------------------
Fix Version/s: 9.6.0
Resolution: Fixed
Thanks for contributing!
> Affinity Placement Plugin can fail when getting metrics, if multiple replicas claim shard leadership
> -----------------------------------------------------------------------------------------------------
>
> Key: SOLR-17198
> URL: https://issues.apache.org/jira/browse/SOLR-17198
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCloud
> Affects Versions: 9.4
> Reporter: Paul McArthur
> Priority: Minor
> Fix For: 9.6.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> On a 16-node Solr 9.4 cluster, I observe that about 25% of our Split Shard
> requests fail. The error is a RuntimeException raised by the
> AttributeFetcher as it compiles the metrics that the plugin will use.
>
> The AttributeFetcher makes /admin/metrics requests to each node, and it
> currently expects the responses to yield a consistent view of shard
> leadership across the cluster.
> However, we see this exception:
>
> {code:java}
> Caused by: java.lang.RuntimeException: two replicas claim to be the shard leader! existing=org.apache.solr.cluster.placement.impl.CollectionMetricsBuilder$ReplicaMetricsBuilder@56e219b9 and current org.apache.solr.cluster.placement.impl.CollectionMetricsBuilder$ReplicaMetricsBuilder@406bcfd8
> at org.apache.solr.cluster.placement.impl.CollectionMetricsBuilder$ShardMetricsBuilder.lambda$build$0(CollectionMetricsBuilder.java:84)
> at java.base/java.util.HashMap.forEach(HashMap.java:1429)
> at org.apache.solr.cluster.placement.impl.CollectionMetricsBuilder$ShardMetricsBuilder.build(CollectionMetricsBuilder.java:76)
> at org.apache.solr.cluster.placement.impl.CollectionMetricsBuilder.lambda$build$0(CollectionMetricsBuilder.java:39)
> at java.base/java.util.HashMap.forEach(HashMap.java:1429)
> at org.apache.solr.cluster.placement.impl.CollectionMetricsBuilder.build(CollectionMetricsBuilder.java:39)
> at org.apache.solr.cluster.placement.impl.AttributeFetcherImpl.lambda$fetchAttributes$17(AttributeFetcherImpl.java:213)
> at java.base/java.util.HashMap.forEach(HashMap.java:1429)
> at org.apache.solr.cluster.placement.impl.AttributeFetcherImpl.fetchAttributes(AttributeFetcherImpl.java:212)
> at org.apache.solr.cluster.placement.plugins.AffinityPlacementFactory$AffinityPlacementPlugin.getBaseWeightedNodes(AffinityPlacementFactory.java:284)
> at org.apache.solr.cluster.placement.plugins.OrderedNodePlacementPlugin.getWeightedNodes(OrderedNodePlacementPlugin.java:311)
> at org.apache.solr.cluster.placement.plugins.OrderedNodePlacementPlugin.computePlacements(OrderedNodePlacementPlugin.java:85)
> at org.apache.solr.cluster.placement.impl.PlacementPluginAssignStrategy.assign(PlacementPluginAssignStrategy.java:84)
> at org.apache.solr.cloud.api.collections.Assign$AssignStrategy.assign(Assign.java:446)
> at org.apache.solr.cloud.api.collections.SplitShardCmd.split(SplitShardCmd.java:689)
> {code}
>
>
> This indicates that more than one replica for a given shard has responded
> with leader=true in the replica metrics.
> I think there are legitimate reasons this can occur:
> 1. It may be fundamentally impossible to always build a consistent view of
> shard leadership by querying a set of distributed nodes.
> 2. /admin/metrics requests are sent sequentially to each node in turn, so
> shard leadership may change between the requests to the different nodes
> that host replicas of a shard.
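
The second scenario in the description can be sketched in a few lines of Java. This is only an illustration of the failure mode, not the change that was committed for SOLR-17198: the class `LeaderViewSketch`, the type `ReplicaReport`, and both methods are hypothetical names that do not exist in Solr. It assumes each node's metrics response has already been reduced to one self-reported leader flag per replica, and contrasts an aggregation that throws on a second claim with one that treats conflicting claims as "leader unknown".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical illustration only; none of these names come from Solr. */
public class LeaderViewSketch {

    /** One replica's self-reported state, as read from a single node's response. */
    public static final class ReplicaReport {
        final String replica;
        final boolean claimsLeader;

        public ReplicaReport(String replica, boolean claimsLeader) {
            this.replica = replica;
            this.claimsLeader = claimsLeader;
        }
    }

    /** Strict aggregation: throws as soon as a second replica claims leadership. */
    public static String strictLeader(List<ReplicaReport> reports) {
        String leader = null;
        for (ReplicaReport r : reports) {
            if (!r.claimsLeader) {
                continue;
            }
            if (leader != null) {
                throw new RuntimeException(
                    "two replicas claim to be the shard leader: " + leader + " and " + r.replica);
            }
            leader = r.replica;
        }
        return leader; // null when no replica claimed leadership
    }

    /** Tolerant aggregation: reports a leader only when exactly one replica claims it. */
    public static Optional<String> tolerantLeader(List<ReplicaReport> reports) {
        List<String> claimants = new ArrayList<>();
        for (ReplicaReport r : reports) {
            if (r.claimsLeader) {
                claimants.add(r.replica);
            }
        }
        return claimants.size() == 1 ? Optional.of(claimants.get(0)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Node A is polled while r1 is leader; leadership moves to r2 before node B is polled.
        List<ReplicaReport> snapshots = List.of(
            new ReplicaReport("r1", true),
            new ReplicaReport("r2", true));

        System.out.println("tolerant: " + tolerantLeader(snapshots));
        try {
            strictLeader(snapshots);
        } catch (RuntimeException e) {
            System.out.println("strict: " + e.getMessage());
        }
    }
}
```

With the strict variant, a leadership change between two sequential requests is enough to fail the whole placement computation. The tolerant variant trades that failure for a possibly missing leader, which callers would then have to handle.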