[ http://issues.apache.org/jira/browse/HADOOP-297?page=comments#action_12416114 ]
Konstantin Shvachko commented on HADOOP-297:
--------------------------------------------

This is an important problem. But sorting for each block allocation seems too expensive, even if done in O(n log n) time. Should we instead permanently store a TreeMap or a PriorityQueue of nodes "prioritized" by their remaining space? This would require updating the map whenever DF values change, but it would make allocations cheaper.

> When selecting node to put new block on, give priority to those with more
> free space/less blocks
> ------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-297
>          URL: http://issues.apache.org/jira/browse/HADOOP-297
>      Project: Hadoop
>         Type: Improvement
>   Components: dfs
>     Versions: 0.3.2
>     Reporter: Johan Oskarson
>     Priority: Minor
>  Attachments: priorityshuffle_v1.patch
>
> As mentioned in a previous bug report:
> We're running a smallish cluster with very different machines, some with
> only 60 GB hard drives.
> This creates a problem when inserting files into the DFS: these machines
> run out of space quickly while others have plenty of space free.
> So instead of just shuffling the nodes, I've created a quick patch that
> first sorts the target nodes by (free space / blocks).
> It then randomizes the position of the first third of the nodes (so we
> don't put all the blocks in the file on the same machine).
> I'll let you guys figure out how to improve this.
> /Johan

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira
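A minimal Java sketch of the strategy the patch describes: sort candidate nodes descending by (free space / hosted blocks), then shuffle only the first third so successive blocks of one file don't all land on the same machine. All class, field, and method names here are invented for illustration; the actual patch operates on the DFS datanode descriptors inside the namenode.

```java
import java.util.*;

class NodePicker {
    // Hypothetical stand-in for a datanode descriptor (names invented).
    static class Node {
        final String name;
        final long freeSpace;   // bytes remaining on the node
        final long blocks;      // number of blocks already hosted
        Node(String name, long freeSpace, long blocks) {
            this.name = name;
            this.freeSpace = freeSpace;
            this.blocks = blocks;
        }
        // Higher score = more free space per hosted block = better target.
        double score() { return (double) freeSpace / Math.max(1, blocks); }
    }

    // Sort descending by score, then randomize only the top third of the
    // list so we still spread a file's blocks across several good nodes.
    static List<Node> prioritizedShuffle(List<Node> nodes, Random rnd) {
        List<Node> sorted = new ArrayList<>(nodes);
        sorted.sort(Comparator.comparingDouble(Node::score).reversed());
        int third = Math.max(1, sorted.size() / 3);
        Collections.shuffle(sorted.subList(0, third), rnd);
        return sorted;
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
            new Node("small", 10L << 30, 500),   // nearly-full 60 GB-class node
            new Node("big",  500L << 30, 100),
            new Node("mid",  200L << 30, 200));
        for (Node n : prioritizedShuffle(nodes, new Random())) {
            System.out.println(n.name + " score=" + n.score());
        }
    }
}
```

Shvachko's counterproposal would replace the per-allocation sort with a persistent ordered structure (a TreeMap or PriorityQueue keyed on remaining space) that is updated whenever a datanode reports a new DF value, trading update cost on heartbeats for cheaper allocations.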
