[ https://issues.apache.org/jira/browse/HDFS-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated HDFS-2936:
--------------------------
    Description: 
If an admin wishes to enforce a minimum replication factor today for all users 
of their cluster, they may set {{dfs.namenode.replication.min}}. This property 
prevents users from creating files with a replication factor lower than the 
configured minimum.
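
For illustration, a minimal sketch of the creation-time check this property 
implies (the class name and error message are made up; only the 
{{dfs.namenode.replication.min}} key and its default of 1 come from HDFS):

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

public class MinReplicationCheck {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Read the admin-configured floor; HDFS defaults this key to 1.
    final int minReplication = conf.getInt("dfs.namenode.replication.min", 1);

    final short requested = 2; // replication factor a client passed to create()
    if (requested < minReplication) {
      // The NameNode rejects such a create() up front.
      throw new IOException("Requested replication " + requested
          + " is less than the required minimum " + minReplication);
    }
  }
}
{code}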

However, the minimum replication set through this property is also checked at 
several other points, especially during completeFile (close) operations. If a 
write's pipeline ends up with fewer than the minimum number of nodes, the 
completeFile operation does not successfully close the file, and the client 
hangs waiting for the NN to replicate the last under-replicated block in the 
background. This hard guarantee can, for example, bring down HBase clusters 
during high xceiver load on the DNs, or when disks fill up on many of them.
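
To make the hang concrete, here is a simplified, self-contained sketch of the 
polling loop behind completeFile (the stub interface, sleep interval, and poll 
count are made up; the real client loops on the NameNode's complete() RPC in 
much this fashion):

{code:java}
public class CompleteFileLoop {
  /** Hypothetical stand-in for the NameNode's complete() RPC. */
  interface NameNodeStub {
    boolean complete(String src, String clientName);
  }

  static void completeFile(NameNodeStub nn, String src, String client)
      throws InterruptedException {
    // The NN answers false until the last block has at least the minimum
    // number of replicas. With a pipeline that never reaches the minimum,
    // this loop polls forever -- the client-side hang described above.
    while (!nn.complete(src, client)) {
      Thread.sleep(400); // back off and poll again; there is no retry bound
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Simulate the NN reporting enough replicas on the third poll.
    final int[] polls = {0};
    completeFile((src, client) -> ++polls[0] >= 3, "/demo/file", "demo-client");
    System.out.println("closed after " + polls[0] + " polls");
  }
}
{code}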

I propose we split the property into two parts (a configuration sketch follows 
this list):
* dfs.namenode.replication.min
** Keeps the same name, but is checked only against the replication factor at 
file creation time and during adjustments made via setrep, etc.
* dfs.namenode.replication.min.for.write
** A new property that takes over the remaining checks, such as those done 
during block commit, file complete/close, and safemode checks for block 
availability.
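
In configuration terms, the NameNode side might read the two values roughly as 
below; note that {{dfs.namenode.replication.min.for.write}} is the name 
proposed in this issue, not an existing key, and defaulting it to the old key 
would preserve today's behavior:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class SplitMinReplication {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Existing key: would now be enforced only at create time and via setrep.
    int minForCreate = conf.getInt("dfs.namenode.replication.min", 1);
    // Proposed key: would govern block commit, complete/close, and safemode
    // availability checks instead.
    int minForWrite = conf.getInt("dfs.namenode.replication.min.for.write",
        minForCreate);
    System.out.println(minForCreate + " / " + minForWrite);
  }
}
{code}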

Alternatively, we could remove the client-side hang by bounding 
completeFile/close calls to a set number of retries. This would require 
further discussion of how a file that fails to close ought to be handled.
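
For instance, a bounded-retry variant might look like the following sketch 
(the retry count, sleep interval, and exception message are illustrative only):

{code:java}
import java.io.IOException;

public class BoundedCompleteFile {
  interface NameNodeStub {
    boolean complete(String src, String clientName);
  }

  static void completeFile(NameNodeStub nn, String src, String client,
      int maxRetries) throws IOException, InterruptedException {
    for (int i = 0; i < maxRetries; i++) {
      if (nn.complete(src, client)) {
        return; // file closed successfully
      }
      Thread.sleep(400);
    }
    // The open question from above: what should happen to the file here?
    throw new IOException("Unable to close file after " + maxRetries
        + " attempts");
  }

  public static void main(String[] args) throws Exception {
    // A NN that never reports enough replicas now fails after five polls
    // instead of hanging forever.
    completeFile((src, client) -> false, "/demo/file", "demo-client", 5);
  }
}
{code}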



> Provide a way to apply a minimum replication factor apart from the strict 
> minimum live replicas feature
> -------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-2936
>                 URL: https://issues.apache.org/jira/browse/HDFS-2936
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 0.23.0
>            Reporter: Harsh J
>         Attachments: HDFS-2936.patch
>
>


