[ http://issues.apache.org/jira/browse/HADOOP-296?page=comments#action_12416109 ]

Konstantin Shvachko commented on HADOOP-296:
--------------------------------------------

Further down in FSNamesystem.chooseTarget() there is code that selects only
nodes with space for at least MIN_BLOCKS_FOR_WRITE (5 by default) blocks.
In addition, when data nodes calculate their remaining disk space (see
FSDataset.getRemaining()), they use USABLE_DISK_PCT (98%) and the member
FSDataset.reserved, which is initially set to 0 and afterwards reflects the
amount of space allocated for ongoing block creates.
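
To make the two existing checks concrete, here is a rough Java sketch. It is
not the actual Hadoop source: the class name, the method names, and the exact
arithmetic are assumptions made for illustration. Only MIN_BLOCKS_FOR_WRITE,
USABLE_DISK_PCT, and their values come from the description above.

```java
// Illustrative sketch only; not copied from FSNamesystem or FSDataset.
final class SpaceCheckSketch {
    static final int MIN_BLOCKS_FOR_WRITE = 5;   // default mentioned above
    static final double USABLE_DISK_PCT = 0.98;  // 98%, as mentioned above

    /** Data node side: space the node reports as remaining (assumed formula). */
    static long remaining(long availableBytes, long reservedBytes) {
        return Math.round(USABLE_DISK_PCT * availableBytes) - reservedBytes;
    }

    /** Name node side: does the node have room for the minimum number of blocks? */
    static boolean hasRoomForWrite(long remainingBytes, long blockSizeBytes) {
        return remainingBytes >= MIN_BLOCKS_FOR_WRITE * blockSizeBytes;
    }
}
```

With a hypothetical 32 MB block size, a node would need to report at least
160 MB remaining to be chosen as a target under this sketch.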

I think we should let individual data nodes control the amount of space they
need or want to preserve, rather than enforcing it uniformly on the name node
for all data nodes. This would solve your problem of configuring a cluster of
machines with very different disk capacities.

So I propose adding two new configuration parameters for data nodes:
1) dfs.datanode.du.pct, which is just a configurable variant of
   USABLE_DISK_PCT.
2) dfs.datanode.du.reserved, which specifies the amount of space that should
   always remain free on the node.
Then, at startup, FSDataset.reserved can be set to dfs.datanode.du.reserved
rather than 0, and USABLE_DISK_PCT should be replaced by dfs.datanode.du.pct.
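
A minimal sketch of how a data node might pick up the two proposed parameters
follows. It uses java.util.Properties as a stand-in for the Hadoop
configuration object, and the class name, the fractional format of
dfs.datanode.du.pct, the byte unit of dfs.datanode.du.reserved, and the
defaults are all assumptions rather than part of the proposal.

```java
import java.util.Properties;

// Hypothetical helper; stands in for the relevant part of FSDataset.
final class DataNodeSpaceConfigSketch {
    final double usablePct;  // replaces the USABLE_DISK_PCT constant
    long reserved;           // starts at the configured value instead of 0

    DataNodeSpaceConfigSketch(Properties conf) {
        // Assumed defaults keep today's behaviour: 98% usable, nothing reserved.
        usablePct = Double.parseDouble(conf.getProperty("dfs.datanode.du.pct", "0.98"));
        reserved = Long.parseLong(conf.getProperty("dfs.datanode.du.reserved", "0"));
    }

    /** Remaining space reported to the name node (assumed formula). */
    long remaining(long availableBytes) {
        return Math.round(usablePct * availableBytes) - reserved;
    }
}
```

Under these assumptions, a machine with a small disk could set a large
dfs.datanode.du.reserved and stop receiving blocks early, while larger
machines keep the defaults.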


> Do not assign blocks to a datanode with < x mb free
> ---------------------------------------------------
>
>          Key: HADOOP-296
>          URL: http://issues.apache.org/jira/browse/HADOOP-296
>      Project: Hadoop
>         Type: New Feature
>   Components: dfs
>     Versions: 0.3.2
>     Reporter: Johan Oskarson
>  Attachments: minspace.patch
>
> We're running a smallish cluster with very different machines, some with only
> 60 GB hard drives.
> This creates a problem when inserting files into the DFS: these machines run
> out of space quickly and then cannot run any map reduce operations.
> A solution would be to not assign any new blocks once the free space is below
> a certain user-configurable threshold.
> This free space could then be used by the map reduce operations instead (if
> that's on the same disk).
