[ 
https://issues.apache.org/jira/browse/HDFS-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681832#comment-14681832
 ] 

Kihwal Lee commented on HDFS-8863:
----------------------------------

bq.  it should just return current storage remaining space instead of get the 
maximum remaining space of all storages
Datanodes only care about the storage type, so checking a particular 
storage won't do any good. It will just cause block placement to re-pick 
targets more often.

bq. Another issue, getBlocksScheduled is for storage type, not for per storage.
Tracking scheduled writes per storage is not going to solve the problem, since 
datanodes are free to choose any storage as long as the type matches. Trying to 
achieve precise accounting will have diminishing returns, as there are 
uncertainties around the actual storage being used, blocks being abandoned, 
control loop delays (heartbeats), etc.

What if we let it check against the storage-type-level sum and also make sure 
there is at least one storage with enough space?  I actually had a version of 
the patch that does just that.  I will remove the unused method and post the 
patch.
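
For illustration only, here is a minimal sketch of the two checks being 
compared. It is not the patch: StorageSketch, RemainingSpaceCheck, and the 
parameter names are hypothetical stand-ins, and how the required total is 
derived (e.g. from scheduled blocks) is assumed to be decided by the caller.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for one storage (volume) on a datanode; not an HDFS class. */
final class StorageSketch {
    final String type;     // storage type name, e.g. "DISK" or "SSD"
    final long remaining;  // free bytes on this storage

    StorageSketch(String type, long remaining) {
        this.type = type;
        this.remaining = remaining;
    }
}

/** Hypothetical helper contrasting the sum-only check with the proposed one. */
final class RemainingSpaceCheck {

    /** Sum-only check: adds up remaining space over all storages of the type. */
    static boolean sumOnly(List<StorageSketch> storages, String type, long requiredTotal) {
        long sum = 0;
        for (StorageSketch s : storages) {
            if (s.type.equals(type)) {
                sum += s.remaining;
            }
        }
        return sum >= requiredTotal;
    }

    /**
     * Proposed check (as an assumption about its shape): the type-level sum
     * must cover requiredTotal, and at least one storage of that type must
     * be able to hold a whole block by itself.
     */
    static boolean sumAndSingleFit(List<StorageSketch> storages, String type,
                                   long requiredTotal, long blockSize) {
        long sum = 0;
        long max = 0;
        for (StorageSketch s : storages) {
            if (s.type.equals(type)) {
                sum += s.remaining;
                max = Math.max(max, s.remaining);
            }
        }
        return sum >= requiredTotal && max >= blockSize;
    }
}
```

With two DISK storages that each have 60 bytes left and a 100-byte block, 
sumOnly accepts the node because 120 >= 100, while sumAndSingleFit rejects it 
because no single storage can hold the block.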

> The remaining space check in BlockPlacementPolicyDefault is flawed
> ------------------------------------------------------------------
>
>                 Key: HDFS-8863
>                 URL: https://issues.apache.org/jira/browse/HDFS-8863
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>              Labels: 2.6.1-candidate
>         Attachments: HDFS-8863.patch
>
>
> The block placement policy calls 
> {{DatanodeDescriptor#getRemaining(StorageType)}} to check whether the block 
> is going to fit. Since the method adds up the remaining space of all 
> storages, the namenode can allocate a new block on a full node. This causes 
> pipeline construction failure and {{abandonBlock}}. If the cluster is nearly 
> full, the client might hit this multiple times and the write can fail 
> permanently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
