[jira] [Comment Edited] (HDFS-11412) Maintenance minimum replication config value allowable range should be {0 - DefaultReplication}

Manoj Govindassamy (JIRA) Wed, 22 Feb 2017 11:18:06 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-11412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877094#comment-15877094
 ]


Manoj Govindassamy edited comment on HDFS-11412 at 2/22/17 7:17 PM:
--------------------------------------------------------------------

Thanks for the review and detailed comments [~mingma].

(1)
bq. Regarding whether to use default replication factor or max replication 
factor,..Block A have large replication factor 30
I am assuming this is because the file was created with an explicit 
replication factor of 30.

bq.  and would like to keep at least 20 live replicas around during 
maintenance...the system need to honor minReplicationToBeInMaintenance == 20.
True. This means one can argue that the allowable range for 
minReplicationToBeInMaintenance is {{0 - dfs.replication.max}}. Though 
this is a valid case, that range can have adverse effects: to honor 
minReplicationToBeInMaintenance, it can force re-replication of a larger 
number of blocks, even for files that were not created with a higher 
replication factor. Under normal operation, the only way a file can have 
many more replicas than the default replication is by explicitly setting 
the replication parameter at file creation. Also, maintenance operations 
are supposed to be of short duration only. So, to avoid re-replication of 
blocks, we can argue that the value range should be {{0 - 
dfs.replication}}. Let me know your thoughts.
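
To make the proposed range concrete, here is a minimal sketch of the check with the upper bound taken from dfs.replication instead of dfs.namenode.replication.min. The class and method names are hypothetical, the values are passed in as plain ints rather than read from the configuration, and it throws IllegalArgumentException instead of the IOException used by the existing check, so treat it as an illustration only and not as the patch itself.

```java
/** Hypothetical standalone sketch; not the actual BlockManager code. */
final class MaintenanceMinReplicationCheck {

  private MaintenanceMinReplicationCheck() {
  }

  /**
   * Checks the maintenance minimum replication against the proposed
   * range [0, defaultReplication].
   *
   * @param minMaintenanceR value of dfs.namenode.maintenance.replication.min
   * @param defaultReplication value of dfs.replication
   * @return minMaintenanceR when it lies within the range
   */
  static int validate(int minMaintenanceR, int defaultReplication) {
    if (minMaintenanceR < 0) {
      throw new IllegalArgumentException(
          "dfs.namenode.maintenance.replication.min = "
              + minMaintenanceR + " < 0");
    }
    // Assumption: the upper bound is dfs.replication, as proposed above.
    if (minMaintenanceR > defaultReplication) {
      throw new IllegalArgumentException(
          "dfs.namenode.maintenance.replication.min = "
              + minMaintenanceR + " > dfs.replication = "
              + defaultReplication);
    }
    return minMaintenanceR;
  }
}
```

With a default replication of 3, values 0 through 3 would be accepted and 4 would be rejected.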

(2)
bq. Impact on getExpectedLiveRedundancyNum calculation. 

{noformat}
  public short getExpectedLiveRedundancyNum(BlockInfo block,
      NumberReplicas numberReplicas) {
    final short expectedRedundancy = getExpectedRedundancyNum(block);
    return (short)Math.max(expectedRedundancy -
        numberReplicas.maintenanceReplicas(),
        getMinMaintenanceStorageNum(block));
  }
{noformat} 

{{getExpectedLiveRedundancyNum()}} is used in a variety of places other than 
the maintenance state monitor. Simply replacing this with 
*Math.min*(expectedRedundancy - numberReplicas.maintenanceReplicas(), 
getMinMaintenanceStorageNum(block)) might not be right: 
minReplicationToBeInMaintenance = 0 is also a valid config, in which case 
the expression would return the min value, 0, and that can break 
non-maintenance callers.
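
As a rough illustration of the concern, the sketch below evaluates both variants on plain ints. The class and method names are hypothetical, and the three parameters merely stand in for getExpectedRedundancyNum(block), numberReplicas.maintenanceReplicas() and getMinMaintenanceStorageNum(block).

```java
/** Hypothetical sketch comparing the two candidate expressions. */
final class ExpectedLiveRedundancySketch {

  private ExpectedLiveRedundancySketch() {
  }

  /** Current behavior: the larger of the two candidate values. */
  static int withMax(int expectedRedundancy, int maintenanceReplicas,
      int minMaintenance) {
    return Math.max(expectedRedundancy - maintenanceReplicas,
        minMaintenance);
  }

  /** Variant under discussion: the smaller of the two candidate values. */
  static int withMin(int expectedRedundancy, int maintenanceReplicas,
      int minMaintenance) {
    return Math.min(expectedRedundancy - maintenanceReplicas,
        minMaintenance);
  }
}
```

With an expected redundancy of 3, no replicas in maintenance and a maintenance minimum of 0, withMax gives 3 while withMin gives 0, which is the case that would affect non-maintenance callers.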

So, during normal BlockManager block validation/reconstruction, we want 
the above expression to honor {{expectedRedundancy - 
numberReplicas.maintenanceReplicas()}}. Maybe we need to return the max or 
the min value depending on how the block replication compares to the default 
replication. Your thoughts, please?







> Maintenance minimum replication config value allowable range should be {0 - 
> DefaultReplication}
> -----------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11412
>                 URL: https://issues.apache.org/jira/browse/HDFS-11412
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>         Attachments: HDFS-11412.01.patch
>
>
> Currently the allowed value range for Maintenance Min Replication 
> {{dfs.namenode.maintenance.replication.min}} is 0 to 
> {{dfs.namenode.replication.min}} (default=1). Users who do not want cluster 
> performance to suffer may wish to set the Maintenance Min Replication to a 
> number greater than 1, say 2. In the current design this is possible, but 
> only after raising the NameNode-level Block Min Replication to 2, which 
> could increase the overall latency of client writes.
> Technically speaking, we should allow Maintenance Min Replication to be 
> in the range 0 to dfs.replication.max.  
> * A config value of 0 is always available for users who do not want any 
> availability/performance during maintenance. 
> * Performance-centric workloads can still get maintenance done without 
> major disruptions by using a bigger Maintenance Min Replication. Setting the 
> upper limit to dfs.replication.max could be overkill, as it could trigger 
> the re-replication that Maintenance State is trying to avoid. So, we could 
> allow {{dfs.namenode.maintenance.replication.min}} in the range {{0 to 
> dfs.replication}}.
> {noformat}
>     if (minMaintenanceR < 0) {
>       throw new IOException("Unexpected configuration parameters: "
>           + DFSConfigKeys.DFS_NAMENODE_MAINTENANCE_REPLICATION_MIN_KEY
>           + " = " + minMaintenanceR + " < 0");
>     }
>     if (minMaintenanceR > minR) {
>       throw new IOException("Unexpected configuration parameters: "
>           + DFSConfigKeys.DFS_NAMENODE_MAINTENANCE_REPLICATION_MIN_KEY
>           + " = " + minMaintenanceR + " > "
>           + DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY
>           + " = " + minR);
>     }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

