[ https://issues.apache.org/jira/browse/HADOOP-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hairong Kuang updated HADOOP-3810:
----------------------------------
Resolution: Fixed
Fix Version/s: 0.20.0 (was: 0.21.0)
Status: Resolved (was: Patch Available)
I've committed this!
> NameNode seems unstable on a cluster with little space left
> -----------------------------------------------------------
>
> Key: HADOOP-3810
> URL: https://issues.apache.org/jira/browse/HADOOP-3810
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.17.1
> Reporter: Raghu Angadi
> Assignee: Hairong Kuang
> Fix For: 0.20.0
>
> Attachments: globalLock.patch, globalLock1.patch, simon-namenode.PNG
>
>
> NameNode seems unresponsive and unstable when the cluster has very little
> space left, and clients time out. The main problem is that it is not clear
> to the user what is going on. Once I have more details about a NameNode that
> was in this state, I will add them here.
> If there is not enough space left on a cluster, it is OK for clients to
> receive an exception such as "DiskOutOfSpace".
> Right now it looks like NameNode tries too hard to find a node with any space
> left and ends up being slow to respond to clients. If the CPU taken by
> chooseTarget() is the main cause, there are two possible fixes:
> # chooseTarget() iterates and takes quite a bit of CPU for allocating
> datanodes. Usually this is not much of a problem, but it takes even more CPU
> when it needs to search multiple racks for a datanode. We could probably
> reduce the CPU cost of these searches, and the benefit should be measurable.
> # Once NameNode cannot find any datanode with space on a rack, it could mark
> the rack as "full" and skip searching that rack for the next minute or so.
> The flag is cleared after a minute, or earlier if a new node is added to the
> rack.
> #* Of course, this might not be optimal w.r.t. disk space usage, but only for
> a short duration. Once a cluster is mostly full, the user does expect errors.
> #* On the flip side, this fix does not require an extremely CPU-optimized
> version of chooseTarget().
> #* I think it is reasonable for NameNode to throw a DiskOutOfSpace exception,
> even though it could have found space by searching much more extensively.
> ---
> edit: minor changes
>
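The second proposal (mark a rack "full" for about a minute, clear the flag early when a node joins that rack, and throw DiskOutOfSpace instead of searching exhaustively) can be sketched roughly as below. This is a hypothetical illustration, not HDFS code: the classes RackFullCache, TargetChooser and DiskOutOfSpaceException, the methods markFull/isFull/nodeAdded, and the simplified data model (rack name mapped to datanode name mapped to free bytes) are all assumptions made for the sketch. Only the one-minute expiry and the clear-on-node-added rule come from the issue text.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cache of racks recently found to have no datanode with space. */
class RackFullCache {
    private final long expiryMillis;
    // rack name -> time (ms) at which the rack was marked full
    private final Map<String, Long> fullSince = new HashMap<>();

    RackFullCache(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    void markFull(String rack, long nowMillis) {
        fullSince.put(rack, nowMillis);
    }

    /** A rack counts as full until expiryMillis have elapsed since it was marked. */
    boolean isFull(String rack, long nowMillis) {
        Long since = fullSince.get(rack);
        if (since == null) {
            return false;
        }
        if (nowMillis - since >= expiryMillis) {
            fullSince.remove(rack); // flag expired, so search this rack again
            return false;
        }
        return true;
    }

    /** Adding a datanode to a rack clears its "full" flag immediately. */
    void nodeAdded(String rack) {
        fullSince.remove(rack);
    }
}

/** Hypothetical stand-in for the exception clients would receive. */
class DiskOutOfSpaceException extends Exception {
    DiskOutOfSpaceException(String msg) {
        super(msg);
    }
}

/** Hypothetical, much simplified chooser: returns the first node with enough space. */
class TargetChooser {
    private final RackFullCache cache;

    TargetChooser(RackFullCache cache) {
        this.cache = cache;
    }

    String chooseTarget(Map<String, Map<String, Long>> racks, long blockSize, long nowMillis)
            throws DiskOutOfSpaceException {
        for (Map.Entry<String, Map<String, Long>> rack : racks.entrySet()) {
            if (cache.isFull(rack.getKey(), nowMillis)) {
                continue; // skip the rack without scanning its datanodes
            }
            for (Map.Entry<String, Long> node : rack.getValue().entrySet()) {
                if (node.getValue() >= blockSize) {
                    return node.getKey();
                }
            }
            cache.markFull(rack.getKey(), nowMillis); // no datanode on this rack had room
        }
        // Fail fast instead of searching more extensively.
        throw new DiskOutOfSpaceException("no datanode with " + blockSize + " bytes free");
    }
}
```

The trade-off matches the issue: a rack marked full may regain space before its flag expires, so placement can be briefly suboptimal, but repeated requests on a nearly full cluster stop paying for a scan of every datanode on every rack.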
--
This message is automatically generated by JIRA.