[
https://issues.apache.org/jira/browse/SOLR-9320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412998#comment-15412998
]
Varun Thacker commented on SOLR-9320:
-------------------------------------
Patch looks good! Here are a few comments on the patch:
Any reason why we needed to modify AsyncCollectionAdminRequest#setAsyncId?
CollectionAdminRequest#DeleteNode and ReplaceNode - Can we add a bit of
Javadoc explaining what the parameters are?
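Something along these lines for ReplaceNode, for example (just a sketch; the parameter names are placeholders for whatever the patch actually uses):
{code}
/**
 * Moves every replica hosted on the source node onto the target node and then
 * removes the originals from the source node, effectively decommissioning it.
 *
 * @param source node name (e.g. "127.0.0.1:8983_solr") whose replicas are moved
 * @param target node name that receives the new copies of those replicas
 */
{code}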
In CollectionParams, should REPLACENODE have lock level as none? What if we are
replacing a node and someone fires a split shard request? To be correct we
need to lock, right?
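For illustration, roughly this in CollectionParams.CollectionAction (a sketch; LockLevel.COLLECTION is my assumption of the right level, and the constructor arguments should mirror however the other actions are declared):
{code}
// sketch: take a collection-level lock so that a concurrent SPLITSHARD on the
// same collection can't interleave with the node replacement
REPLACENODE(true, LockLevel.COLLECTION),
{code}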
In OverseerCollectionMessageHandler#addReplica, can't we make node and coreName
final to start with?
{code}
final String fnode = node;
final String fcoreName = coreName;
{code}
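Instead, roughly this (a sketch, assuming those values are read off the ZkNodeProps message like the other parameters and are not reassigned later in the method):
{code}
// declare them final where they are first computed, instead of re-aliasing later
final String node = message.getStr(CoreAdminParams.NODE);
final String coreName = message.getStr(CoreAdminParams.NAME);
{code}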
Can the tests please extend SolrCloudTestCase? On my machine it takes one minute
for ReplaceNodeTest to run. I'd guess it will be a lot faster when it's using
SolrCloudTestCase.
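For reference, the usual SolrCloudTestCase skeleton looks like this (a sketch; the node count and configset name here are placeholders):
{code}
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.cloud.SolrCloudTestCase;
import org.junit.BeforeClass;
import org.junit.Test;

public class ReplaceNodeTest extends SolrCloudTestCase {

  @BeforeClass
  public static void setupCluster() throws Exception {
    // starts one shared MiniSolrCloudCluster for all test methods in the class
    configureCluster(6)
        .addConfig("conf1", TEST_PATH().resolve("configsets").resolve("cloud-dynamic").resolve("conf"))
        .configure();
  }

  @Test
  public void test() throws Exception {
    CloudSolrClient client = cluster.getSolrClient();
    // ... create the collection and issue the REPLACENODE call against `client`
  }
}
{code}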
The logging in this patch needs to be improved. It's too verbose and redundant.
For example, this log line is extremely verbose and adds little value -
{code}
42134 INFO (TEST-ReplaceNodeTest.test-seed#[F960E2C81499AC90]) [ ]
o.a.s.c.ReplaceNodeTest excluded_node : 127.0.0.1:51079_x%2Ftc coll_state : {
"replicationFactor":"2",
"shards":{
"shard1":{
"range":"80000000-b332ffff",
"state":"active",
"replicas":{
"core_node1":{
"core":"replacenodetest_coll_shard1_replica2",
"base_url":"http://127.0.0.1:51075/x/tc",
"node_name":"127.0.0.1:51075_x%2Ftc",
"state":"active",
"leader":"true"},
"core_node5":{
"core":"replacenodetest_coll_shard1_replica1",
"base_url":"http://127.0.0.1:51089/x/tc",
"node_name":"127.0.0.1:51089_x%2Ftc",
"state":"active"}}},
"shard2":{
"range":"b3330000-e665ffff",
"state":"active",
"replicas":{
"core_node2":{
"core":"replacenodetest_coll_shard2_replica1",
"base_url":"http://127.0.0.1:51084/x/tc",
"node_name":"127.0.0.1:51084_x%2Ftc",
"state":"active"},
"core_node9":{
"core":"replacenodetest_coll_shard2_replica2",
"base_url":"http://127.0.0.1:51094/x/tc",
"node_name":"127.0.0.1:51094_x%2Ftc",
"state":"active",
"leader":"true"}}},
"shard3":{
"range":"e6660000-1998ffff",
"state":"active",
"replicas":{
"core_node6":{
"core":"replacenodetest_coll_shard3_replica2",
"base_url":"http://127.0.0.1:51071/x/tc",
"node_name":"127.0.0.1:51071_x%2Ftc",
"state":"active"},
"core_node8":{
"core":"replacenodetest_coll_shard3_replica1",
"base_url":"http://127.0.0.1:51065/x/tc",
"node_name":"127.0.0.1:51065_x%2Ftc",
"state":"active",
"leader":"true"}}},
"shard4":{
"range":"19990000-4ccbffff",
"state":"active",
"replicas":{
"core_node3":{
"core":"replacenodetest_coll_shard4_replica2",
"base_url":"http://127.0.0.1:51075/x/tc",
"node_name":"127.0.0.1:51075_x%2Ftc",
"state":"active",
"leader":"true"},
"core_node7":{
"core":"replacenodetest_coll_shard4_replica1",
"base_url":"http://127.0.0.1:51089/x/tc",
"node_name":"127.0.0.1:51089_x%2Ftc",
"state":"active"}}},
"shard5":{
"range":"4ccc0000-7fffffff",
"state":"active",
"replicas":{
"core_node4":{
"core":"replacenodetest_coll_shard5_replica1",
"base_url":"http://127.0.0.1:51084/x/tc",
"node_name":"127.0.0.1:51084_x%2Ftc",
"state":"active",
"leader":"true"},
"core_node10":{
"core":"replacenodetest_coll_shard5_replica2",
"base_url":"http://127.0.0.1:51094/x/tc",
"node_name":"127.0.0.1:51094_x%2Ftc",
"state":"active"}}}},
"router":{"name":"compositeId"},
"maxShardsPerNode":"3",
"autoAddReplicas":"false"}
{code}
On the redundant part:
In this patch I guess the purpose of adding logging to
OverseerCollectionMessageHandler#deleteReplica and
OverseerCollectionMessageHandler#addReplica was to be able to see when
replacenode/deletenode calls addReplica and deleteReplica.
That log line belongs in ReplaceNodeCmd and should read something like "calling addReplica for
collection=[] shards=[] on node=[]". We shouldn't add a generic line to
OverseerCollectionMessageHandler#addReplica: that would make anyone using
AddReplica directly see the same entry logged twice (the overseer
logs it automatically, and the second entry would be the new line added in this
patch). Also, in this patch it's logged as {{log.info("going to create replica
{}", Utils.toJSONString(sourceReplica));}}, which is tougher to read than the
approach mentioned above.
The same goes for DeleteReplaceNodeCmd.
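Concretely, something like this in ReplaceNodeCmd, and analogously in the delete command (a sketch; it assumes sourceReplica is the ZkNodeProps being iterated and target is the target node name):
{code}
// log the intent once, with the identifying fields spelled out, rather than
// dumping the whole source-replica JSON
log.info("Going to create replica for collection={} shard={} on node={}",
    sourceReplica.getStr(COLLECTION_PROP), sourceReplica.getStr(SHARD_ID_PROP), target);
{code}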
The tests will fail on our slower Jenkins machines. We should replace the hardcoded
200-iteration for loop with an infinite loop; if the command never completes, the test
suite will fail anyway, which is fine.
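In other words, roughly this in the test (a sketch; asyncId stands for whatever request id the test passes to the REPLACENODE call):
{code}
// poll the async request status until it completes instead of giving up after a
// fixed 200 iterations; if the command hangs, the suite's own timeout kicks in
boolean success = false;
while (!success) {
  CollectionAdminRequest.RequestStatusResponse rsp =
      CollectionAdminRequest.requestStatus(asyncId).process(cluster.getSolrClient());
  RequestStatusState state = rsp.getRequestStatus();
  if (state == RequestStatusState.COMPLETED) {
    success = true;
    break;
  }
  assertFalse("REPLACENODE request failed", state == RequestStatusState.FAILED);
  Thread.sleep(500);
}
{code}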
> A REPLACENODE command to decommission an existing node with another new node
> ----------------------------------------------------------------------------
>
> Key: SOLR-9320
> URL: https://issues.apache.org/jira/browse/SOLR-9320
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud
> Reporter: Noble Paul
> Attachments: DELETENODE.jpeg, REPLACENODE_After.jpeg,
> REPLACENODE_Before.jpeg, REPLACENODE_call_response.jpeg, SOLR-9320.patch,
> SOLR-9320.patch, SOLR-9320.patch, SOLR-9320.patch, SOLR-9320.patch,
> SOLR-9320.patch, SOLR-9320.patch
>
>
> The command should accept a source node and a target node, recreate the
> replicas of the source node on the target node, and then do a DELETENODE of the source node.