[ https://issues.apache.org/jira/browse/SOLR-13445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834405#comment-16834405 ]
Shalin Shekhar Mangar commented on SOLR-13445:
----------------------------------------------

Thanks Dat. A few comments:
# Minor nit: Rename HttpShardHandlerFactory#sameMetric to hasSameMetric.
# Can you do an exponential backoff in NodesSysPropsCacher#fetchRemoteProps?
# The RoutingToNodesWithPropertiesTest needs a better check than comparing shardAddress. The reason is that shardAddress is set by the GET_TOP_IDS phase but not by other phases such as GET_FIELDS. Use TrackingShardHandlerFactory instead.
# I agree with Andrzej that the fix to SolrClientNodeStateProvider should go to a separate issue so that it can be backported to 7_7 if needed.

> Preferred replicas on nodes with same system properties as the query master
> ---------------------------------------------------------------------------
>
>                 Key: SOLR-13445
>                 URL: https://issues.apache.org/jira/browse/SOLR-13445
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Cao Manh Dat
>            Assignee: Cao Manh Dat
>            Priority: Major
>         Attachments: SOLR-13445.patch
>
>
> Currently, Solr chooses a random replica for each shard to fan out the query
> request. However, this presents a problem when running Solr in multiple
> availability zones.
> If one availability zone fails, it affects all Solr nodes, because they
> will keep trying to connect to Solr nodes in the failed availability zone until
> the request times out. This can lead to a buildup of threads on each Solr node
> until the node runs out of memory, resulting in a cascading failure.
> This issue tries to solve this problem by adding:
> * another shardPreference param named {{node.sysprop}}, so the query will be
> routed to nodes with the same defined system properties as the current one.
> * default shardPreferences for the whole cluster, which will be stored in
> {{/clusterprops.json}}.
> * a cacher that fetches other nodes' system properties whenever /live_nodes
> changes.
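The exponential backoff requested for NodesSysPropsCacher#fetchRemoteProps could look roughly like the minimal sketch below. It is not the code in SOLR-13445.patch. The class ExponentialBackoff, its methods, and the parameters maxAttempts, initialDelayMs and maxDelayMs are hypothetical names chosen for illustration, and the remote fetch itself is abstracted as a java.util.concurrent.Callable.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical helper (not Solr's code) showing exponential backoff around a
 * remote fetch such as NodesSysPropsCacher#fetchRemoteProps.
 */
final class ExponentialBackoff {

  private ExponentialBackoff() {}

  /** Delay after failed attempt {@code attempt} (0-based): initialDelayMs * 2^attempt, capped at maxDelayMs. */
  static long delayMillis(int attempt, long initialDelayMs, long maxDelayMs) {
    if (attempt < 0) {
      throw new IllegalArgumentException("attempt must be >= 0");
    }
    long factor = 1L << Math.min(attempt, 30); // bound the shift so it cannot overflow
    if (initialDelayMs > maxDelayMs / factor) {
      return maxDelayMs; // the uncapped product would exceed the cap (or overflow)
    }
    return initialDelayMs * factor; // guaranteed <= maxDelayMs here
  }

  /** Calls {@code task}, sleeping with exponential backoff between failed attempts. */
  static <T> T callWithBackoff(Callable<T> task, int maxAttempts,
                               long initialDelayMs, long maxDelayMs) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    Exception last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        return task.call();
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt(); // preserve interrupt status and stop retrying
        throw ie;
      } catch (Exception e) {
        last = e;
        if (attempt < maxAttempts - 1) { // no sleep after the final failure
          Thread.sleep(delayMillis(attempt, initialDelayMs, maxDelayMs));
        }
      }
    }
    throw last; // all attempts failed; surface the most recent error
  }
}
```

Capping the delay at maxDelayMs bounds each individual wait. Adding random jitter to each delay is a common refinement when many nodes retry against the same target at once, for example right after a /live_nodes change.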
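The {{node.sysprop}} preference described in the issue amounts to ordering a shard's replicas so that those on nodes sharing the local node's value for a given system property come first. The following sketch assumes the cacher has already produced a map from node name to that node's system properties. SysPropPreference, preferSameSysProp and the property name "zone" used for testing are hypothetical and are not Solr's API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Solr's API) of a node.sysprop-style preference:
 * replicas on nodes whose value for a system property matches the local
 * node's value are ordered first.
 */
final class SysPropPreference {

  private SysPropPreference() {}

  /**
   * @param replicaNodes node names hosting replicas of one shard
   * @param localNode    the node coordinating the query
   * @param sysPropName  system property to compare (assumed, e.g. an availability-zone property)
   * @param nodeSysProps cached node name -> (system property name -> value)
   */
  static List<String> preferSameSysProp(List<String> replicaNodes, String localNode,
                                        String sysPropName,
                                        Map<String, Map<String, String>> nodeSysProps) {
    String localValue = nodeSysProps.getOrDefault(localNode, Map.of()).get(sysPropName);
    List<String> ordered = new ArrayList<>(replicaNodes);
    if (localValue == null) {
      return ordered; // nothing to match against: keep the incoming order
    }
    // List.sort is stable: matching nodes move to the front, and relative order
    // within the matching and non-matching groups is preserved.
    ordered.sort(Comparator.comparingInt((String node) -> {
      String value = nodeSysProps.getOrDefault(node, Map.of()).get(sysPropName);
      return localValue.equals(value) ? 0 : 1;
    }));
    return ordered;
  }
}
```

Because the sort is stable, replicas within each group keep their incoming (for example, randomized) relative order, so load is still spread among equally preferred replicas, and nodes in other zones remain available as fallbacks.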
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)