Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The "TestFaqPage" page has been changed by SomeOtherAccount. http://wiki.apache.org/hadoop/TestFaqPage?action=diff&rev1=3&rev2=4 -------------------------------------------------- = HDFS = - <<BR>> <<Anchor(3.1)>> '''1. [[#A3.1|If I add new data-nodes to the cluster will HDFS move the blocks to the newly added nodes in order to balance disk space utilization between the nodes?]]''' + == If I add new DataNodes to the cluster will HDFS move the blocks to the newly added nodes in order to balance disk space utilization between the nodes? == No, HDFS will not move blocks to new nodes automatically. However, newly created files will likely have their blocks placed on the new nodes. @@ -193, +193 @@ * [[http://developer.yahoo.com/hadoop/tutorial/module2.html#rebalancing|HDFS Tutorial: Rebalancing]]; * [[http://hadoop.apache.org/core/docs/current/commands_manual.html#balancer|HDFS Commands Guide: balancer]]. - <<BR>> <<Anchor(3.2)>> '''2. [[#A3.2|What is the purpose of the secondary name-node?]]''' + == What is the purpose of the secondary name-node? == The term "secondary name-node" is somewhat misleading. It is not a name-node in the sense that data-nodes cannot connect to the secondary name-node, and in no event it can replace the primary name-node in case of its failure. @@ -201, +201 @@ So if the name-node fails and you can restart it on the same physical node then there is no need to shutdown data-nodes, just the name-node need to be restarted. If you cannot use the old node anymore you will need to copy the latest image somewhere else. The latest image can be found either on the node that used to be the primary before failure if available; or on the secondary name-node. The latter will be the latest checkpoint without subsequent edits logs, that is the most recent name space modifications may be missing there. You will also need to restart the whole cluster in this case. - <<BR>> <<Anchor(3.3)>> '''3. [[#A3.3|Does the name-node stay in safe mode till all under-replicated files are fully replicated?]]''' + == Does the name-node stay in safe mode till all under-replicated files are fully replicated? == No. During safe mode replication of blocks is prohibited. The name-node awaits when all or majority of data-nodes report their blocks. @@ -211, +211 @@ Learn more about safe mode [[http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Safemode|in the HDFS Users' Guide]]. - <<BR>> <<Anchor(3.4)>> '''4. [[#A3.4|How do I set up a hadoop node to use multiple volumes?]]''' + == How do I set up a hadoop node to use multiple volumes? == ''Data-nodes'' can store blocks in multiple directories typically allocated on different local disk drives. In order to setup multiple directories one needs to specify a comma separated list of pathnames as a value of the configuration parameter [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#dfs.data.dir|dfs.data.dir]]. Data-nodes will attempt to place equal amount of data in each of the directories. The ''name-node'' also supports multiple directories, which in the case store the name space image and the edits log. The directories are specified via the [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#dfs.name.dir|dfs.name.dir]] configuration parameter. The name-node directories are used for the name space data replication so that the image and the log could be restored from the remaining volumes if one of them fails. - <<BR>> <<Anchor(3.5)>> '''5. 
- <<BR>> <<Anchor(3.5)>> '''5. [[#A3.5|What happens if one Hadoop client renames a file or a directory containing this file while another client is still writing into it?]]'''
+ == What happens if one Hadoop client renames a file or a directory containing this file while another client is still writing into it? ==

Starting with release hadoop-0.15, a file will appear in the name space as soon as it is created. If a writer is writing to a file and another client renames either the file itself or any of its path components, then the original writer will get an IOException either when it finishes writing to the current block or when it closes the file.

- <<BR>> <<Anchor(3.6)>> '''6. [[#A3.6|I want to make a large cluster smaller by taking out a bunch of nodes simultaneously. How can this be done?]]'''
+ == I want to make a large cluster smaller by taking out a bunch of nodes simultaneously. How can this be done? ==

On a large cluster, removing one or two data-nodes will not lead to any data loss, because the name-node will replicate their blocks as soon as it detects that the nodes are dead. With a large number of nodes getting removed or dying, the probability of losing data is higher.

@@ -236, +236 @@

The decommission process can be terminated at any time by editing the configuration or the exclude files and repeating the {{{-refreshNodes}}} command.

- <<BR>> <<Anchor(3.7)>> '''7. [[#A3.7|Wildcard characters doesn't work correctly in FsShell.]]'''
+ == Wildcard characters don't work correctly in FsShell. ==

When you issue a command in !FsShell, you may want to apply that command to more than one file. !FsShell provides a wildcard character to help you do so. The * (asterisk) character can be used to take the place of any set of characters. For example, if you would like to list all the files in your account which begin with the letter '''x''', you could use the ls command with the * wildcard:
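A sketch of such a listing (the file names are hypothetical; {{{hadoop fs}}} is the newer spelling of the older {{{hadoop dfs}}}):

{{{
# list files in the HDFS working directory whose names start with x;
# the quotes keep the local shell from expanding the * before FsShell sees it
hadoop fs -ls 'x*'
}}}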
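For the decommissioning question above, a sketch of the usual sequence, assuming {{{dfs.hosts.exclude}}} already points at an exclude file (host names and paths are made up):

{{{
# add the data-nodes to be retired to the exclude file named by dfs.hosts.exclude
echo "node101.example.com" >> /etc/hadoop/conf/dfs.exclude
echo "node102.example.com" >> /etc/hadoop/conf/dfs.exclude

# make the name-node re-read the include/exclude files and begin decommissioning
hadoop dfsadmin -refreshNodes
}}}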
