[ https://issues.apache.org/jira/browse/HADOOP-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12553392 ]
Allen Wittenauer commented on HADOOP-2447:
------------------------------------------

Some thoughts:

Tuning based upon heap size is tempting, but I can see some potential problems:

a) Dependency upon a particular implementation/tuning of GC and memory handling

Given the presence of multiple JREs, I'd be concerned that performance on one JRE would be radically different on another JRE simply due to how memory management is done.

b) Name node heap may be swapped, but that information AFAIK isn't shared w/ the Java process itself

In my experience, swap == death. We can go several minutes in GC mode while the kernel is swapping in large portions of the name node. Since the physical placement of the heap is ultimately up to the OS, I don't think the JVM has any real control over where that heap lies.

The worst-case scenario is that the heap was set too high to begin with, and in order to recover, one has to stop both the name node and the data nodes, bring up just the name node, force it out of safe mode, delete files, adjust the heap size, restart the name node, then bring up the data nodes. To me, that's *way* too complex for anyone but wizard hadoop'ers. If they aren't wizards, then the admin will likely have to suffer through the swapping mess and delete files as they can. Not pretty.

The flip side of this is that having a tunable for both inodes and blocks is a bit complex. One really needs to understand the usage pattern of the application in place. Ironically, an "even" distribution of files and blocks might still cause pain, since the two tunables only put safety gates on either side of the equation. But if tuned more conservatively, this could be used to prevent the most heinous of abusive users without being forced into panic mode.

Ideally, any such tunable could be re-read WITHOUT restarting the name node. That way, if the admin does tune too conservatively, the admin can re-tune without downtime.

It might be useful to still have some sort of tool that says "based upon your current heap size, we estimate you could support up to xx files or xxx blocks". That sort of guidance would be valuable. (A rough sketch of such an estimate follows the issue summary below.)

> HDFS should be capable of limiting the total number of inodes in the system
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-2447
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2447
>             Project: Hadoop
>          Issue Type: New Feature
>            Reporter: Sameer Paranjpye
>            Assignee: dhruba borthakur
>             Fix For: 0.16.0
>
>         Attachments: fileLimit.patch
>
>
> The HDFS Namenode should be capable of limiting the total number of Inodes (files + directories). This can be done through a config variable, settable in hadoop-site.xml. The default should be no limit.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
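
To make the estimation-tool suggestion concrete, here is a minimal, illustrative sketch of a "how many files/blocks will this heap support" calculation. The per-object byte costs, the usable-heap fraction, and the class name are assumptions for illustration only; they are not measurements from any particular JRE or figures taken from the attached patch.

{code:java}
/**
 * Rough, illustrative estimate of how many namespace objects
 * (files + blocks) a name node heap of a given size might support.
 * The per-object byte costs below are ballpark assumptions, not
 * measured values.
 */
public class NamespaceCapacityEstimate {

  // Assumed average heap cost per inode and per block (hypothetical figures).
  private static final long BYTES_PER_INODE = 150;
  private static final long BYTES_PER_BLOCK = 150;

  // Fraction of the heap assumed to be usable for namespace objects,
  // leaving headroom for GC, RPC buffers, etc.
  private static final double USABLE_HEAP_FRACTION = 0.6;

  public static long estimateMaxObjects(long maxHeapBytes) {
    long usable = (long) (maxHeapBytes * USABLE_HEAP_FRACTION);
    return usable / (BYTES_PER_INODE + BYTES_PER_BLOCK);
  }

  public static void main(String[] args) {
    long maxHeap = Runtime.getRuntime().maxMemory();
    System.out.println("Based upon a max heap of " + maxHeap + " bytes, "
        + "we estimate you could support up to "
        + estimateMaxObjects(maxHeap) + " files + blocks.");
  }
}
{code}

Running this inside a JVM started with the name node's -Xmx setting would give the kind of "up to xx files or xxx blocks" guidance described above.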
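As for the tunable itself: fileLimit.patch is not reproduced here, so purely as a hypothetical sketch, a re-readable inode/block cap might look something like the following. The class, method, and counter names are made up for this example and may not match the actual patch; the default of 0 ("no limit") mirrors the proposal in the issue description.

{code:java}
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch of a namespace cap that can be re-tuned without
 * restarting the name node.  Names and the refresh mechanism are
 * illustrative only.
 */
public class NamespaceLimits {

  // Current caps; 0 means "no limit", matching the proposed default.
  private volatile long maxInodes = 0;
  private volatile long maxBlocks = 0;

  private final AtomicLong inodeCount = new AtomicLong();
  private final AtomicLong blockCount = new AtomicLong();

  /** Re-read the caps, e.g. after the admin refreshes the configuration. */
  public void refresh(long newMaxInodes, long newMaxBlocks) {
    this.maxInodes = newMaxInodes;
    this.maxBlocks = newMaxBlocks;
  }

  /** Called before creating a new file or directory. */
  public void checkInodeLimit() throws IOException {
    if (maxInodes > 0 && inodeCount.get() >= maxInodes) {
      throw new IOException("Cannot create file/directory: inode limit of "
          + maxInodes + " reached");
    }
  }

  /** Called before allocating a new block. */
  public void checkBlockLimit() throws IOException {
    if (maxBlocks > 0 && blockCount.get() >= maxBlocks) {
      throw new IOException("Cannot allocate block: block limit of "
          + maxBlocks + " reached");
    }
  }

  public void inodeAdded() { inodeCount.incrementAndGet(); }
  public void blockAdded() { blockCount.incrementAndGet(); }
}
{code}

Because the caps are plain fields refreshed at runtime rather than values read once at startup, an admin who tuned too conservatively could loosen them without a name node restart, which is the "no downtime" property argued for above.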