[
https://issues.apache.org/jira/browse/SOLR-9320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412998#comment-15412998
]
Varun Thacker commented on SOLR-9320:
-------------------------------------
Patch looks good! Here are a few comments on the patch:
Any reason why we needed to modify AsyncCollectionAdminRequest#setAsyncId?
CollectionAdminRequest#DeleteNode and ReplaceNode - Can we add a bit of
Javadoc explaining what the parameters are?
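Something along these lines for ReplaceNode, for example (just a sketch; the parameter names are placeholders for whatever the patch actually uses):
{code}
/**
 * Moves every replica hosted on the source node onto the target node and then
 * removes the originals from the source node, effectively decommissioning it.
 *
 * @param source node name (e.g. "127.0.0.1:8983_solr") whose replicas are moved
 * @param target node name that receives the new copies of those replicas
 */
{code}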
In CollectionParams, should REPLACENODE have lock level as none? What if we are
replacing a node and someone fires a split shard request? To be correct we
need to lock, right?
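For illustration, roughly this in CollectionParams.CollectionAction (a sketch; LockLevel.COLLECTION is my assumption of the right level, and the constructor arguments should mirror however the other actions are declared):
{code}
// sketch: take a collection-level lock so that a concurrent SPLITSHARD on the
// same collection can't interleave with the node replacement
REPLACENODE(true, LockLevel.COLLECTION),
{code}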
In OverseerCollectionMessageHandler#addReplica, can't we make node and coreName
final to start with?
{code}
final String fnode = node;
final String fcoreName = coreName;
{code}
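Instead, roughly this (a sketch, assuming those values are read off the ZkNodeProps message like the other parameters and are not reassigned later in the method):
{code}
// declare them final where they are first computed, instead of re-aliasing later
final String node = message.getStr(CoreAdminParams.NODE);
final String coreName = message.getStr(CoreAdminParams.NAME);
{code}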
Can the tests please extend SolrCloudTestCase? On my machine it takes one minute
for ReplaceNodeTest to run. I'd guess it will be a lot faster when it's using
SolrCloudTestCase.
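For reference, the usual SolrCloudTestCase skeleton looks like this (a sketch; the node count and configset name here are placeholders):
{code}
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.cloud.SolrCloudTestCase;
import org.junit.BeforeClass;
import org.junit.Test;

public class ReplaceNodeTest extends SolrCloudTestCase {

  @BeforeClass
  public static void setupCluster() throws Exception {
    // starts one shared MiniSolrCloudCluster for all test methods in the class
    configureCluster(6)
        .addConfig("conf1", TEST_PATH().resolve("configsets").resolve("cloud-dynamic").resolve("conf"))
        .configure();
  }

  @Test
  public void test() throws Exception {
    CloudSolrClient client = cluster.getSolrClient();
    // ... create the collection and issue the REPLACENODE call against `client`
  }
}
{code}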
The logging in this patch needs to be improved. It's too verbose and redundant.
For example, this log line is extremely verbose and adds little value -
{code}
42134 INFO (TEST-ReplaceNodeTest.test-seed#[F960E2C81499AC90]) [ ]
o.a.s.c.ReplaceNodeTest excluded_node : 127.0.0.1:51079_x%2Ftc coll_state : {
"replicationFactor":"2",
"shards":{
"shard1":{
"range":"80000000-b332ffff",
"state":"active",
"replicas":{
"core_node1":{
"core":"replacenodetest_coll_shard1_replica2",
"base_url":"http://127.0.0.1:51075/x/tc",
"node_name":"127.0.0.1:51075_x%2Ftc",
"state":"active",
"leader":"true"},
"core_node5":{
"core":"replacenodetest_coll_shard1_replica1",
"base_url":"http://127.0.0.1:51089/x/tc",
"node_name":"127.0.0.1:51089_x%2Ftc",
"state":"active"}}},
"shard2":{
"range":"b3330000-e665ffff",
"state":"active",
"replicas":{
"core_node2":{
"core":"replacenodetest_coll_shard2_replica1",
"base_url":"http://127.0.0.1:51084/x/tc",
"node_name":"127.0.0.1:51084_x%2Ftc",
"state":"active"},
"core_node9":{
"core":"replacenodetest_coll_shard2_replica2",
"base_url":"http://127.0.0.1:51094/x/tc",
"node_name":"127.0.0.1:51094_x%2Ftc",
"state":"active",
"leader":"true"}}},
"shard3":{
"range":"e6660000-1998ffff",
"state":"active",
"replicas":{
"core_node6":{
"core":"replacenodetest_coll_shard3_replica2",
"base_url":"http://127.0.0.1:51071/x/tc",
"node_name":"127.0.0.1:51071_x%2Ftc",
"state":"active"},
"core_node8":{
"core":"replacenodetest_coll_shard3_replica1",
"base_url":"http://127.0.0.1:51065/x/tc",
"node_name":"127.0.0.1:51065_x%2Ftc",
"state":"active",
"leader":"true"}}},
"shard4":{
"range":"19990000-4ccbffff",
"state":"active",
"replicas":{
"core_node3":{
"core":"replacenodetest_coll_shard4_replica2",
"base_url":"http://127.0.0.1:51075/x/tc",
"node_name":"127.0.0.1:51075_x%2Ftc",
"state":"active",
"leader":"true"},
"core_node7":{
"core":"replacenodetest_coll_shard4_replica1",
"base_url":"http://127.0.0.1:51089/x/tc",
"node_name":"127.0.0.1:51089_x%2Ftc",
"state":"active"}}},
"shard5":{
"range":"4ccc0000-7fffffff",
"state":"active",
"replicas":{
"core_node4":{
"core":"replacenodetest_coll_shard5_replica1",
"base_url":"http://127.0.0.1:51084/x/tc",
"node_name":"127.0.0.1:51084_x%2Ftc",
"state":"active",
"leader":"true"},
"core_node10":{
"core":"replacenodetest_coll_shard5_replica2",
"base_url":"http://127.0.0.1:51094/x/tc",
"node_name":"127.0.0.1:51094_x%2Ftc",
"state":"active"}}}},
"router":{"name":"compositeId"},
"maxShardsPerNode":"3",
"autoAddReplicas":"false"}
{code}
On the redundant part:
In this patch I guess the purpose of adding logging to
OverseerCollectionMessageHandler#deleteReplica and
OverseerCollectionMessageHandler#addReplica was to be able to see when
replacenode/deletenode calls addReplica and deleteReplica.
That log line belongs in ReplaceNodeCmd and should read something like "calling addReplica for
collection=[] shards=[] on node=[]". We shouldn't add a generic line to
OverseerCollectionMessageHandler#addReplica: that would make anyone using
AddReplica directly see the same entry logged twice (the overseer
logs it automatically, and the second entry would be the new line added in this
patch). Also, in this patch it's logged as {{log.info("going to create replica
{}", Utils.toJSONString(sourceReplica));}}, which is tougher to read than the
approach mentioned above.
The same goes for DeleteReplaceNodeCmd.
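Concretely, something like this in ReplaceNodeCmd, and analogously in the delete command (a sketch; it assumes sourceReplica is the ZkNodeProps being iterated and target is the target node name):
{code}
// log the intent once, with the identifying fields spelled out, rather than
// dumping the whole source-replica JSON
log.info("Going to create replica for collection={} shard={} on node={}",
    sourceReplica.getStr(COLLECTION_PROP), sourceReplica.getStr(SHARD_ID_PROP), target);
{code}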
The tests will fail on our slower Jenkins machines. We should replace the hardcoded
200-iteration for loop with an infinite loop; if the command never completes, the test
suite will fail anyway, which is fine.
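In other words, roughly this in the test (a sketch; asyncId stands for whatever request id the test passes to the REPLACENODE call):
{code}
// poll the async request status until it completes instead of giving up after a
// fixed 200 iterations; if the command hangs, the suite's own timeout kicks in
boolean success = false;
while (!success) {
  CollectionAdminRequest.RequestStatusResponse rsp =
      CollectionAdminRequest.requestStatus(asyncId).process(cluster.getSolrClient());
  RequestStatusState state = rsp.getRequestStatus();
  if (state == RequestStatusState.COMPLETED) {
    success = true;
    break;
  }
  assertFalse("REPLACENODE request failed", state == RequestStatusState.FAILED);
  Thread.sleep(500);
}
{code}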
> A REPLACENODE command to decommission an existing node with another new node
> ----------------------------------------------------------------------------
>
> Key: SOLR-9320
> URL: https://issues.apache.org/jira/browse/SOLR-9320
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud
> Reporter: Noble Paul
> Attachments: DELETENODE.jpeg, REPLACENODE_After.jpeg,
> REPLACENODE_Before.jpeg, REPLACENODE_call_response.jpeg, SOLR-9320.patch,
> SOLR-9320.patch, SOLR-9320.patch, SOLR-9320.patch, SOLR-9320.patch,
> SOLR-9320.patch, SOLR-9320.patch
>
>
> The command should accept a source node and a target node, recreate the
> replicas of the source node on the target node, and then do a DELETENODE of the source node.