[ https://issues.apache.org/jira/browse/HDFS-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277496#comment-13277496 ]

Harsh J commented on HDFS-2936:
-------------------------------

Colin:

The reasons for a DN getting excluded from a client are DN-side errors (network 
goes down, DN goes down), disk fill-up, or xceiver load fill-up causing a DN to 
remain unchosen, thereby lowering the total number of choosable DNs in the 
cluster, etc. The simplest condition to think of is: all DNs are very busy 
except two, while my minimum replication requirement for writes is 3. I could 
technically be allowed to write two replicas and leave the rest to the 
under-replication handler for later, but there's no way that allows me this 
today.
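
To make that concrete, here is a minimal sketch of a write that hangs today 
(the path, payload, and cluster state are illustrative, not from the issue):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MinReplicationHang {
  public static void main(String[] args) throws Exception {
    // Assume the NameNode is configured with
    // dfs.namenode.replication.min = 3, and that only two DNs in the
    // cluster are currently able to accept writes.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    FSDataOutputStream out = fs.create(new Path("/tmp/demo"), (short) 3);
    out.write(new byte[] { 1, 2, 3 });

    // Two replicas can be written, but close() blocks indefinitely:
    // the NN refuses to complete the file until the last block has
    // three live replicas.
    out.close();
  }
}
{code}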

Eli:

I honestly think min-replication is too hard on users. People already write 
3-replica files with 1-min-replica today (i.e. write/close passes if only one 
replica was successfully written), and an admin should have a way to simply, 
without side effects, enforce a minimum replication factor that just works.

But yes, the problems I've observed so far were all with 
FSDataOutputStream#close(). (Sorry, not DFSClient.close(); that was a quick 
ref.)
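
For reference, the hang lives in the client's file-complete loop; roughly 
(a paraphrase of the DFSClient output-stream logic, not the exact source):

{code:java}
// Rough paraphrase of the client-side loop that runs during
// FSDataOutputStream#close(); not the exact Hadoop source.
boolean fileComplete = false;
while (!fileComplete) {
  // The NN returns false while the last block has fewer live
  // replicas than dfs.namenode.replication.min, so this loop
  // never exits and the client hangs.
  fileComplete = namenode.complete(src, clientName);
  if (!fileComplete) {
    try {
      Thread.sleep(400);
    } catch (InterruptedException ie) {
      // ignore and retry
    }
  }
}
{code}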

Nicholas:

Done. Please let me know if the current title and description are satisfactory.
                
> File close()-ing hangs indefinitely if the number of live blocks does not 
> match the minimum replication
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-2936
>                 URL: https://issues.apache.org/jira/browse/HDFS-2936
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Harsh J
>            Assignee: Harsh J
>         Attachments: HDFS-2936.patch
>
>
> If an admin wishes to enforce replication today for all the users of their 
> cluster, they may set {{dfs.namenode.replication.min}}. This property prevents 
> users from creating files with a replication factor lower than the configured 
> minimum.
> However, the value of minimum replication set by the above property is also 
> checked at several other points, especially during completeFile (close) 
> operations. If a condition arises wherein a write pipeline ends up with fewer 
> than the minimum number of nodes, the completeFile operation does not 
> successfully close the file, and the client hangs, waiting for the NN to 
> replicate the last under-replicated block in the background. This form of hard 
> guarantee can, for example, bring down HBase clusters during high xceiver load 
> on DNs, disk fill-ups on many of them, etc.
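> For clarity, a rough paraphrase of the NN-side gate (simplified; the method 
> and helper names here are illustrative, not exact Hadoop source):
> {code:java}
> // completeFile only succeeds once every block of the file has at
> // least minReplication live replicas; otherwise the client retries.
> boolean checkFileProgress(INodeFile file) {
>   for (BlockInfo block : file.getBlocks()) {
>     if (countLiveReplicas(block) < minReplication) {
>       return false; // client's complete() call keeps returning false
>     }
>   }
>   return true;
> }
> {code}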
> I propose we split the property into two parts (see the sketch after this 
> list):
> * dfs.namenode.replication.min
> ** Keeps the same name, but is only checked against the replication factor at 
> file creation time and during adjustments made via setrep, etc.
> * dfs.namenode.replication.min.for.write
> ** New property that takes over the rest of the checks from the above 
> property, such as those done during block commit, file complete/close, 
> safemode checks for block availability, etc.
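> Under this proposal, the two knobs could be set independently, e.g. (the 
> second key is the proposed new property, not an existing one):
> {code:java}
> Configuration conf = new Configuration();
> // Existing key: would only gate create-time replication factors
> // and setrep adjustments.
> conf.setInt("dfs.namenode.replication.min", 2);
> // Proposed key: would gate block commit, file complete/close, and
> // safemode block-availability checks instead.
> conf.setInt("dfs.namenode.replication.min.for.write", 1);
> {code}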
> Alternatively, we may choose to remove the client-side hang by bounding 
> completeFile/close calls to a set number of retries (a sketch follows below). 
> This would require further discussion about how a failed file close ought to 
> be handled.
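> A sketch of what bounded retries might look like (the cap and exception here 
> are illustrative, not part of the attached patch):
> {code:java}
> int retriesLeft = 10; // hypothetical cap; could be a new client conf key
> boolean fileComplete = false;
> while (!fileComplete) {
>   fileComplete = namenode.complete(src, clientName);
>   if (!fileComplete) {
>     if (--retriesLeft <= 0) {
>       throw new IOException("Could not complete file " + src
>           + ": last block is below minimum replication");
>     }
>     try {
>       Thread.sleep(400);
>     } catch (InterruptedException ie) {
>       throw new IOException("Interrupted while completing " + src);
>     }
>   }
> }
> {code}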

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira