[Hadoop Wiki] Update of "FAQ" by SomeOtherAccount

Apache Wiki Wed, 06 Oct 2010 15:18:40 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.


The "FAQ" page has been changed by SomeOtherAccount.
http://wiki.apache.org/hadoop/FAQ?action=diff&rev1=75&rev2=76

--------------------------------------------------

- = Hadoop FAQ =
- <<BR>> <<Anchor(1)>> '''1. [[#A1|What is Hadoop?]]'''
+ #pragma section-numbers on
+ 
+ '''Hadoop FAQ'''
+ 
+ <<TableOfContents(3)>>
+ 
+ = General =
+ 
+ == What is Hadoop? ==
  
  [[http://hadoop.apache.org/core/|Hadoop]] is a distributed computing platform 
written in Java.  It incorporates features similar to those of the 
[[http://en.wikipedia.org/wiki/Google_File_System|Google File System]] and of 
[[http://en.wikipedia.org/wiki/MapReduce|MapReduce]].  For some details, see 
HadoopMapReduce.
  
- <<BR>> <<Anchor(2)>> '''2. [[#A2|What platform does Hadoop run on?]]'''
+ == What platform does Hadoop run on? ==
  
   1. Java 1.6.x or higher, preferably from Sun -see HadoopJavaVersions
   1. Linux and Windows are the supported operating systems, but BSD, Mac OS/X, 
and OpenSolaris are known to work. (Windows requires the installation of 
[[http://www.cygwin.com/|Cygwin]]).
  
+ == How well does Hadoop scale? ==
- <<Anchor(2.1)>> ''2.1 [[#A2.1|Building / Testing Hadoop on Windows]]''
- 
- The Hadoop build on Windows can be run from inside a Windows (not cygwin) 
command prompt window.
- 
- Whether you set environment variables in a batch file or in 
System->Properties->Advanced->Environment Variables, the following environment 
variables need to be set:
- 
- {{{
- set ANT_HOME=c:\apache-ant-1.7.1
- set JAVA_HOME=c:\jdk1.6.0.4
- set PATH=%PATH%;%ANT_HOME%\bin
- }}}
- then open a command prompt window, cd to your workspace directory (in my case 
it is c:\workspace\hadoop) and run ant. Since I am interested in running the 
contrib test cases I do the following:
- 
- {{{
- ant -l build.log -Dtest.output=yes test-contrib
- }}}
- other targets work similarly. I just wanted to document this because I spent 
some time trying to figure out why the ant build would not run from a cygwin 
command prompt window. If you are building/testing on Windows, and haven't 
figured it out yet, this should get you started.
- 
- <<BR>> <<Anchor(3)>> '''3. [[#A3|How well does Hadoop scale?]]'''
  
  Hadoop has been demonstrated on clusters of up to 4000 nodes.  Sort 
performance on 900 nodes is good (sorting 9TB of data on 900 nodes takes around 
1.8 hours) and [[attachment:sort900-20080115.png|improving]] using these 
non-default configuration values:
  
@@ -48, +37 @@

   * `tasktracker.http.threads = 50`
   * `mapred.child.java.opts = -Xmx1024m`
  
- <<BR>> <<Anchor(4)>> '''4. [[#A4|Do I have to write my application in 
Java?]]'''
+ == What kind of hardware scales best for Hadoop? ==
+ 
+ The short answer is dual processor/dual core machines with 4-8GB of RAM using 
ECC memory. Machines should be moderately high-end commodity machines to be 
most cost-effective and typically cost 1/2 - 2/3 the cost of normal production 
application servers but are not desktop-class machines. This cost tends to be 
$2-5K. For a more detailed discussion, see MachineScaling page.
+ 
+ == How does GridGain compare to Hadoop? ==
+ 
+ !GridGain does not support data intensive jobs. For more details, see 
HadoopVsGridGain.
+ 
+ == I have a new node I want to add to a running Hadoop cluster; how do I 
start services on just one node? ==
+ 
+ This also applies to the case where a machine has crashed and rebooted, etc, 
and you need to get it to rejoin the cluster. You do not need to shutdown 
and/or restart the entire cluster in this case.
+ 
+ First, add the new node's DNS name to the conf/slaves file on the master node.
+ 
+ Then log in to the new slave node and execute:
+ 
+ {{{
+ $ cd path/to/hadoop
+ $ bin/hadoop-daemon.sh start datanode
+ $ bin/hadoop-daemon.sh start tasktracker
+ }}}
+ 
+ == Is there an easy way to see the status and health of my cluster? ==
+ 
+ There are web-based interfaces to both the JobTracker (MapReduce master) and 
NameNode (HDFS master) which display status pages about the state of the entire 
system. By default, these are located at http://job.tracker.addr:50030/ and 
http://name.node.addr:50070/.
+ 
+ The JobTracker status page will display the state of all nodes, as well as 
the job queue and status about all currently running jobs and tasks. The 
NameNode status page will display the state of all nodes and the amount of free 
space, and provides the ability to browse the DFS via the web.
+ 
+ You can also see some basic HDFS cluster health data by running:
+ 
+ {{{
+ $ bin/hadoop dfsadmin -report
+ }}}
+ 
+ == How much network bandwidth might I need between racks in a medium size 
(40-80 node) Hadoop cluster? ==
+ 
+ The true answer depends on the types of jobs you're running. As a back of the 
envelope calculation one might figure something like this:
+ 
+ 60 nodes total on 2 racks = 30 nodes per rack Each node might process about 
100MB/sec of data In the case of a sort job where the intermediate data is the 
same size as the input data, that means each node needs to shuffle 100MB/sec of 
data In aggregate, each rack is then producing about 3GB/sec of data However, 
given even reducer spread across the racks, each rack will need to send 
1.5GB/sec to reducers running on the other rack. Since the connection is full 
duplex, that means you need 1.5GB/sec of bisection bandwidth for this 
theoretical job. So that's 12Gbps.
+ 
+ However, the above calculations are probably somewhat of an upper bound. A 
large number of jobs have significant data reduction during the map phase, 
either by some kind of filtering/selection going on in the Mapper itself, or by 
good usage of Combiners. Additionally, intermediate data compression can cut 
the intermediate data transfer by a significant factor. Lastly, although your 
disks can probably provide 100MB sustained throughput, it's rare to see a MR 
job which can sustain disk speed IO through the entire pipeline. So, I'd say my 
estimate is at least a factor of 2 too high.
+ 
+ So, the simple answer is that 4-6Gbps is most likely just fine for most 
practical jobs. If you want to be extra safe, many inexpensive switches can 
operate in a "stacked" configuration where the bandwidth between them is 
essentially backplane speed. That should scale you to 96 nodes with plenty of 
headroom. Many inexpensive gigabit switches also have one or two 10GigE ports 
which can be used effectively to connect to each other or to a 10GE core.
+ 
+ == How can I help to make Hadoop better? ==
+ 
+ If you have trouble figuring how to use Hadoop, then, once you've figured 
something out (perhaps with the help of the 
[[http://hadoop.apache.org/core/mailing_lists.html|mailing lists]]), pass that 
knowledge on to others by adding something to this wiki.
+ 
+ If you find something that you wish were done better, and know how to fix it, 
read HowToContribute, and contribute a patch.
+ 
+ = MapReduce =
+ 
+ == Do I have to write my application in Java? ==
  
  No.  There are several ways to incorporate non-Java code.
  
@@ -56, +97 @@

   * 
[[http://svn.apache.org/viewvc/hadoop/core/trunk/src/c++/libhdfs|libhdfs]], a 
JNI-based C API for talking to hdfs (only).
   * 
[[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/pipes/package-summary.html|Hadoop
 Pipes]], a [[http://www.swig.org/|SWIG]]-compatible  C++ API (non-JNI) to 
write map-reduce jobs.
  
- <<BR>> <<Anchor(5)>> '''5. [[#A5|How can I help to make Hadoop better?]]'''
+ == What is the Distributed Cache used for? ==
  
- If you have trouble figuring how to use Hadoop, then, once you've figured 
something out (perhaps with the help of the 
[[http://hadoop.apache.org/core/mailing_lists.html|mailing lists]]), pass that 
knowledge on to others by adding something to this wiki.
+ The distributed cache is used to distribute large read-only files that are 
needed by map/reduce jobs to the cluster. The framework will copy the necessary 
files from a url (either hdfs: or http:) on to the slave node before any tasks 
for the job are executed on that node. The files are only copied once per job 
and so should not be modified by the application.
  
- If you find something that you wish were done better, and know how to fix it, 
read HowToContribute, and contribute a patch.
+ == Can I write create/write-to hdfs files directly from my map/reduce tasks? 
==
  
+ Yes. (Clearly, you want this since you need to create/write-to files other 
than the output-file written out by 
[[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/OutputCollector.html|OutputCollector]].)
+ 
+ Caveats:
+ 
+ ${mapred.output.dir} is the eventual output directory for the job 
([[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/JobConf.html#setOutputPath(org.apache.hadoop.fs.Path)|JobConf.setOutputPath]]
 / 
[[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/JobConf.html#getOutputPath()|JobConf.getOutputPath]]).
+ 
+ ${taskid} is the actual id of the individual task-attempt (e.g. 
task_200709221812_0001_m_000000_0), a TIP is a bunch of ${taskid}s (e.g. 
task_200709221812_0001_m_000000).
+ 
+ With ''speculative-execution'' '''on''', one could face issues with 2 
instances of the same TIP (running simultaneously) trying to open/write-to the 
same file (path) on hdfs. Hence the app-writer will have to pick unique names 
(e.g. using the complete taskid i.e. task_200709221812_0001_m_000000_0) per 
task-attempt, not just per TIP. (Clearly, this needs to be done even if the 
user doesn't create/write-to files directly via reduce tasks.)
+ 
+ To get around this the framework helps the application-writer out by 
maintaining a special '''${mapred.output.dir}/_${taskid}''' sub-dir for each 
task-attempt on hdfs where the output of the reduce task-attempt goes. On 
successful completion of the task-attempt the files in the 
${mapred.output.dir}/_${taskid} (of the successful taskid only) are moved to 
${mapred.output.dir}. Of course, the framework discards the sub-directory of 
unsuccessful task-attempts. This is completely transparent to the application.
+ 
+ The application-writer can take advantage of this by creating any side-files 
required in ${mapred.output.dir} during execution of his reduce-task, and the 
framework will move them out similarly - thus you don't have to pick unique 
paths per task-attempt.
+ 
+ Fine-print: the value of ${mapred.output.dir} during execution of a 
particular task-attempt is actually ${mapred.output.dir}/_{$taskid}, not the 
value set by 
[[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/JobConf.html#setOutputPath(org.apache.hadoop.fs.Path)|JobConf.setOutputPath]].
 ''So, just create any hdfs files you want in ${mapred.output.dir} from your 
reduce task to take advantage of this feature.''
+ 
+ The entire discussion holds true for maps of jobs with reducer=NONE (i.e. 0 
reduces) since output of the map, in that case, goes directly to hdfs.
+ 
+ == How do I get each of my maps to work on one complete input-file and not 
allow the framework to split-up my files? ==
+ 
+ Essentially a job's input is represented by the 
[[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/InputFormat.html|InputFormat]](interface)/[[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/FileInputFormat.html|FileInputFormat]](base
 class).
+ 
+ For this purpose one would need a 'non-splittable' 
[[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/FileInputFormat.html|FileInputFormat]]
 i.e. an input-format which essentially tells the map-reduce framework that it 
cannot be split-up and processed. To do this you need your particular 
input-format to return '''false''' for the 
[[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/FileInputFormat.html#isSplitable(org.apache.hadoop.fs.FileSystem,%20org.apache.hadoop.fs.Path)|isSplittable]]
 call.
+ 
+ E.g. 
'''org.apache.hadoop.mapred.SortValidator.RecordStatsChecker.NonSplitableSequenceFileInputFormat'''
 in 
[[http://svn.apache.org/viewvc/hadoop/core/trunk/src/test/org/apache/hadoop/mapred/SortValidator.java|src/test/org/apache/hadoop/mapred/SortValidator.java]]
+ 
+ In addition to implementing the InputFormat interface and having 
isSplitable(...) returning false, it is also necessary to implement the 
RecordReader interface for returning the whole content of the input file. 
(default is LineRecordReader, which splits the file into separate lines)
+ 
+ The other, quick-fix option, is to set 
[[http://hadoop.apache.org/core/docs/current/hadoop-default.html#mapred.min.split.size|mapred.min.split.size]]
 to large enough value.
+ 
+ == Why I do see broken images in jobdetails.jsp page? ==
+ 
+ In hadoop-0.15, Map / Reduce task completion graphics are added. The graphs 
are produced as SVG(Scalable Vector Graphics) images, which are basically xml 
files, embedded in html content. The graphics are tested successfully in 
Firefox 2 on Ubuntu and MAC OS. However for other browsers, one should install 
an additional plugin to the browser to see the SVG images. Adobe's SVG Viewer 
can be found at http://www.adobe.com/svg/viewer/install/.
+ 
+ == I see a maximum of 2 maps/reduces spawned concurrently on each 
TaskTracker, how do I increase that? ==
+ 
+ Use the configuration knob: 
[[http://hadoop.apache.org/core/docs/current/hadoop-default.html#mapred.tasktracker.map.tasks.maximum|mapred.tasktracker.map.tasks.maximum]]
 and 
[[http://hadoop.apache.org/core/docs/current/hadoop-default.html#mapred.tasktracker.reduce.tasks.maximum|mapred.tasktracker.reduce.tasks.maximum]]
 to control the number of maps/reduces spawned simultaneously on a 
!TaskTracker. By default, it is set to ''2'', hence one sees a maximum of 2 
maps and 2 reduces at a given instance on a !TaskTracker.
+ 
+ You can set those on a per-tasktracker basis to accurately reflect your 
hardware (i.e. set those to higher nos. on a beefier tasktracker etc.).
+ 
+ == Submitting map/reduce jobs as a different user doesn't work. ==
+ 
+ The problem is that you haven't configured your map/reduce system   directory 
to a fixed value. The default works for single node systems, but not for   
"real" clusters. I like to use:
+ 
+ {{{
+ <property>
+    <name>mapred.system.dir</name>
+    <value>/hadoop/mapred/system</value>
+    <description>The shared directory where MapReduce stores control files.
+    </description>
+ </property>
+ }}}
+ Note that this directory is in your default file system and must be   
accessible from both the client and server machines and is typically in HDFS.
+ 
+ == How do Map/Reduce InputSplit's handle record boundaries correctly? ==
+ 
+ It is the responsibility of the InputSplit's RecordReader to start and end at 
a record boundary. For SequenceFile's every 2k bytes has a 20 bytes '''sync''' 
mark between the records. These sync marks allow the RecordReader to seek to 
the start of the InputSplit, which contains a file, offset and length and find 
the first sync mark after the start of the split. The RecordReader continues 
processing records until it reaches the first sync mark after the end of the 
split. The first split of each file naturally starts immediately and not after 
the first sync mark. In this way, it is guaranteed that each record will be 
processed by exactly one mapper.
+ 
+ Text files are handled similarly, using newlines instead of sync marks.
+ 
+ == How do I change final output file name with the desired name rather than 
in partitions like part-00000, part-00001? ==
+ 
+ You can subclass the 
[[http://svn.apache.org/viewvc/hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/OutputFormat.java?view=markup|OutputFormat.java]]
 class and write your own. You can look at the code of 
[[http://svn.apache.org/viewvc/hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/TextOutputFormat.java?view=markup|TextOutputFormat]]
 
[[http://svn.apache.org/viewvc/hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/MultipleOutputFormat.java?view=markup|MultipleOutputFormat.java]]
 etc. for reference. It might be the case that you only need to do minor 
changes to any of the existing Output Format classes. To do that you can just 
subclass that class and override the methods you need to change.
+ 
+ == When writing a New InputFormat, what is the format for the array of string 
returned by InputSplit\#getLocations()? ==
+ 
+ It appears that DatanodeID.getHost() is the standard place to retrieve this 
name, and the machineName variable, populated in DataNode.java\#startDataNode, 
is where the name is first set. The first method attempted is to get 
"slave.host.name" from the configuration; if that is not available, 
DNS.getDefaultHost is used instead.
+ 
+ == How do you gracefully stop a running job? ==
+ 
+ {{{
+ hadoop job -kill JOBID
+ }}}
+ 
+ = HDFS =
+ 
- <<BR>> <<Anchor(6)>> '''6. [[#A6|HDFS. If I add new data-nodes to the cluster 
will HDFS move the blocks to the newly added nodes in order to balance disk 
space utilization between the nodes?]]'''
+ == If I add new DataNodes to the cluster will HDFS move the blocks to the 
newly added nodes in order to balance disk space utilization between the nodes? 
==
  
  No, HDFS will not move blocks to new nodes automatically. However, newly 
created files will likely have their blocks placed on the new nodes.
  
@@ -76, +193 @@

    * 
[[http://developer.yahoo.com/hadoop/tutorial/module2.html#rebalancing|HDFS 
Tutorial: Rebalancing]];
    * 
[[http://hadoop.apache.org/core/docs/current/commands_manual.html#balancer|HDFS 
Commands Guide: balancer]].
  
- <<BR>> <<Anchor(7)>> '''7. [[#A7|HDFS. What is the purpose of the secondary 
name-node?]]'''
+ == What is the purpose of the secondary name-node? ==
  
  The term "secondary name-node" is somewhat misleading. It is not a name-node 
in the sense that data-nodes cannot connect to the secondary name-node, and in 
no event it can replace the primary name-node in case of its failure.
  
@@ -84, +201 @@

  
  So if the name-node fails and you can restart it on the same physical node 
then there is no need  to shutdown data-nodes, just the name-node need to be 
restarted. If you cannot use the old node anymore you will need to copy the 
latest image somewhere else. The latest image can be found either on the node 
that used to be the primary before failure if available; or on the secondary 
name-node. The latter will be the latest checkpoint without subsequent edits 
logs,  that is the most recent name space modifications may be missing there. 
You will also need to restart the whole cluster in this case.
  
- <<BR>> <<Anchor(8)>> '''8. [[#A8|MR. What is the Distributed Cache used 
for?]]'''
- 
- The distributed cache is used to distribute large read-only files that are 
needed by map/reduce jobs to the cluster. The framework will copy the necessary 
files from a url (either hdfs: or http:) on to the slave node before any tasks 
for the job are executed on that node. The files are only copied once per job 
and so should not be modified by the application.
- 
- <<BR>> <<Anchor(9)>> '''9. [[#A9|MR. Can I write create/write-to hdfs files 
directly from my map/reduce tasks?]]'''
- 
- Yes. (Clearly, you want this since you need to create/write-to files other 
than the output-file written out by 
[[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/OutputCollector.html|OutputCollector]].)
- 
- Caveats:
- 
- <glossary>
- 
- ${mapred.output.dir} is the eventual output directory for the job 
([[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/JobConf.html#setOutputPath(org.apache.hadoop.fs.Path)|JobConf.setOutputPath]]
 / 
[[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/JobConf.html#getOutputPath()|JobConf.getOutputPath]]).
- 
- ${taskid} is the actual id of the individual task-attempt (e.g. 
task_200709221812_0001_m_000000_0), a TIP is a bunch of ${taskid}s (e.g. 
task_200709221812_0001_m_000000).
- 
- </glossary>
- 
- With ''speculative-execution'' '''on''', one could face issues with 2 
instances of the same TIP (running simultaneously) trying to open/write-to the 
same file (path) on hdfs. Hence the app-writer will have to pick unique names 
(e.g. using the complete taskid i.e. task_200709221812_0001_m_000000_0) per 
task-attempt, not just per TIP. (Clearly, this needs to be done even if the 
user doesn't create/write-to files directly via reduce tasks.)
- 
- To get around this the framework helps the application-writer out by 
maintaining a special '''${mapred.output.dir}/_${taskid}''' sub-dir for each 
task-attempt on hdfs where the output of the reduce task-attempt goes. On 
successful completion of the task-attempt the files in the 
${mapred.output.dir}/_${taskid} (of the successful taskid only) are moved to 
${mapred.output.dir}. Of course, the framework discards the sub-directory of 
unsuccessful task-attempts. This is completely transparent to the application.
- 
- The application-writer can take advantage of this by creating any side-files 
required in ${mapred.output.dir} during execution of his reduce-task, and the 
framework will move them out similarly - thus you don't have to pick unique 
paths per task-attempt.
- 
- Fine-print: the value of ${mapred.output.dir} during execution of a 
particular task-attempt is actually ${mapred.output.dir}/_{$taskid}, not the 
value set by 
[[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/JobConf.html#setOutputPath(org.apache.hadoop.fs.Path)|JobConf.setOutputPath]].
 ''So, just create any hdfs files you want in ${mapred.output.dir} from your 
reduce task to take advantage of this feature.''
- 
- The entire discussion holds true for maps of jobs with reducer=NONE (i.e. 0 
reduces) since output of the map, in that case, goes directly to hdfs.
- 
- <<BR>> <<Anchor(10)>> '''10. [[#A10|MR. How do I get each of my maps to work 
on one complete input-file and not allow the framework to split-up my 
files?]]'''
- 
- Essentially a job's input is represented by the 
[[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/InputFormat.html|InputFormat]](interface)/[[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/FileInputFormat.html|FileInputFormat]](base
 class).
- 
- For this purpose one would need a 'non-splittable' 
[[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/FileInputFormat.html|FileInputFormat]]
 i.e. an input-format which essentially tells the map-reduce framework that it 
cannot be split-up and processed. To do this you need your particular 
input-format to return '''false''' for the 
[[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/FileInputFormat.html#isSplitable(org.apache.hadoop.fs.FileSystem,%20org.apache.hadoop.fs.Path)|isSplittable]]
 call.
- 
- E.g. 
'''org.apache.hadoop.mapred.SortValidator.RecordStatsChecker.NonSplitableSequenceFileInputFormat'''
 in 
[[http://svn.apache.org/viewvc/hadoop/core/trunk/src/test/org/apache/hadoop/mapred/SortValidator.java|src/test/org/apache/hadoop/mapred/SortValidator.java]]
- 
- In addition to implementing the InputFormat interface and having 
isSplitable(...) returning false, it is also necessary to implement the 
RecordReader interface for returning the whole content of the input file. 
(default is LineRecordReader, which splits the file into separate lines)
- 
- The other, quick-fix option, is to set 
[[http://hadoop.apache.org/core/docs/current/hadoop-default.html#mapred.min.split.size|mapred.min.split.size]]
 to large enough value.
- 
- <<BR>> <<Anchor(11)>> '''11. [[#A11|Why I do see broken images in 
jobdetails.jsp page?]]'''
- 
- In hadoop-0.15, Map / Reduce task completion graphics are added. The graphs 
are produced as SVG(Scalable Vector Graphics) images, which are basically xml 
files, embedded in html content. The graphics are tested successfully in 
Firefox 2 on Ubuntu and MAC OS. However for other browsers, one should install 
an additional plugin to the browser to see the SVG images. Adobe's SVG Viewer 
can be found at http://www.adobe.com/svg/viewer/install/.
- 
- <<BR>> <<Anchor(12)>> '''12. [[#A12|HDFS. Does the name-node stay in safe 
mode till all under-replicated files are fully replicated?]]'''
+ == Does the name-node stay in safe mode till all under-replicated files are 
fully replicated? ==
  
  No. During safe mode replication of blocks is prohibited.  The name-node 
awaits when all or majority of data-nodes report their blocks.
  
@@ -138, +211 @@

  
  Learn more about safe mode 
[[http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Safemode|in 
the HDFS Users' Guide]].
  
+ == How do I set up a hadoop node to use multiple volumes? ==
- <<BR>> <<Anchor(13)>> '''13. [[#A13|MR. I see a maximum of 2 maps/reduces 
spawned concurrently on each TaskTracker, how do I increase that?]]'''
- 
- Use the configuration knob: 
[[http://hadoop.apache.org/core/docs/current/hadoop-default.html#mapred.tasktracker.map.tasks.maximum|mapred.tasktracker.map.tasks.maximum]]
 and 
[[http://hadoop.apache.org/core/docs/current/hadoop-default.html#mapred.tasktracker.reduce.tasks.maximum|mapred.tasktracker.reduce.tasks.maximum]]
 to control the number of maps/reduces spawned simultaneously on a 
!TaskTracker. By default, it is set to ''2'', hence one sees a maximum of 2 
maps and 2 reduces at a given instance on a !TaskTracker.
- 
- You can set those on a per-tasktracker basis to accurately reflect your 
hardware (i.e. set those to higher nos. on a beefier tasktracker etc.).
- 
- <<BR>> <<Anchor(14)>> '''14. [[#A14|MR. Submitting map/reduce jobs as a 
different user doesn't work.]]'''
- 
- The problem is that you haven't configured your map/reduce system   directory 
to a fixed value. The default works for single node systems, but not for   
"real" clusters. I like to use:
- 
- {{{
- <property>
-    <name>mapred.system.dir</name>
-    <value>/hadoop/mapred/system</value>
-    <description>The shared directory where MapReduce stores control files.
-    </description>
- </property>
- }}}
- Note that this directory is in your default file system and must be   
accessible from both the client and server machines and is typically   in HDFS.
- 
- <<BR>> <<Anchor(15)>> '''15. [[#A15|HDFS. How do I set up a hadoop node to 
use multiple volumes?]]'''
  
  ''Data-nodes'' can store blocks in multiple directories typically allocated 
on different local disk drives. In order to setup multiple directories one 
needs to specify a comma separated list of pathnames as a value of the 
configuration parameter  
[[http://hadoop.apache.org/core/docs/current/hadoop-default.html#dfs.data.dir|dfs.data.dir]].
 Data-nodes will attempt to place equal amount of data in each of the 
directories.
  
  The ''name-node'' also supports multiple directories, which in the case store 
the name space image and the edits log. The directories are specified via the  
[[http://hadoop.apache.org/core/docs/current/hadoop-default.html#dfs.name.dir|dfs.name.dir]]
 configuration parameter. The name-node directories are used for the name space 
data replication so that the image and the  log could be restored from the 
remaining volumes if one of them fails.
  
- <<BR>> <<Anchor(16)>> '''16. [[#A16|HDFS. What happens if one Hadoop client 
renames a file or a directory containing this file while another client is 
still writing into it?]]'''
+ == What happens if one Hadoop client renames a file or a directory containing 
this file while another client is still writing into it? ==
  
  Starting with release hadoop-0.15, a file will appear in the name space as 
soon as it is created.  If a writer is writing to a file and another client 
renames either the file itself or any of its path  components, then the 
original writer will get an IOException either when it finishes writing to the 
current  block or when it closes the file.
  
- <<BR>> <<Anchor(17)>> '''17. [[#A17|HDFS. I want to make a large cluster 
smaller by taking out a bunch of nodes simultaneously. How can this be 
done?]]'''
+ == I want to make a large cluster smaller by taking out a bunch of nodes 
simultaneously. How can this be done? ==
  
  On a large cluster removing one or two data-nodes will not lead to any data 
loss, because  name-node will replicate their blocks as long as it will detect 
that the nodes are dead. With a large number of nodes getting removed or dying 
the probability of losing data is higher.
  
@@ -183, +236 @@

  
  The decommission process can be terminated at any time by editing the 
configuration or the exclude files  and repeating the {{{-refreshNodes}}} 
command.
  
+ == Wildcard characters doesn't work correctly in FsShell. ==
- <<BR>> <<Anchor(18)>> '''18. [[#A18|What kind of hardware scales best for 
Hadoop?]]'''
- 
- The short answer is dual processor/dual core machines with 4-8GB of RAM using 
ECC memory. Machines should be moderately high-end commodity machines to be 
most cost-effective and typically cost 1/2 - 2/3 the cost of normal production 
application servers but are not desktop-class machines. This cost tends to be 
$2-5K. For a more detailed discussion, see MachineScaling page.
- 
- <<BR>> <<Anchor(19)>> '''19. [[#A19|Wildcard characters doesn't work 
correctly in FsShell.]]'''
  
  When you issue a command in !FsShell, you may want to apply that command to 
more than one file. !FsShell provides a wildcard character to help you do so.  
The * (asterisk) character can be used to take the place of any set of 
characters. For example, if you would like to list all the files in your 
account which begin with the letter '''x''', you could use the ls command with 
the * wildcard:
  
@@ -199, +248 @@

  {{{
  bin/hadoop dfs -ls 'in*'
  }}}
- <<BR>> <<Anchor(20)>> '''20. [[#A20|How does GridGain compare to Hadoop?]]'''
  
+ == Can I have multiple files in HDFS use different block sizes? ==
- !GridGain does not support data intensive jobs. For more details, see 
HadoopVsGridGain.
- 
- <<BR>> <<Anchor(21)>> '''21. [[#A21|HDFS. Can I have multiple files in HDFS 
use different block sizes?]]'''
  
  Yes. HDFS provides api to specify block size when you create a file. <<BR>> 
See 
[[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/fs/FileSystem.html#create(org.apache.hadoop.fs.Path,%20boolean,%20int,%20short,%20long)|FileSystem.create(Path,
 overwrite, bufferSize, replication, blockSize, progress)]]
  
- <<BR>> <<Anchor(22)>> '''22. [[#A22|Does HDFS make block boundaries between 
records?]]'''
+ == Does HDFS make block boundaries between records? ==
  
  No, HDFS does not provide record-oriented API and therefore is not aware of 
records and boundaries between them.
  
+ == What happens when two clients try to write into the same HDFS file? ==
- <<BR>> <<Anchor(23)>> '''23. [[#A23|How do Map/Reduce InputSplit's handle 
record boundaries correctly?]]'''
- 
- It is the responsibility of the InputSplit's RecordReader to start and end at 
a record boundary. For SequenceFile's every 2k bytes has a 20 bytes '''sync''' 
mark between the records. These sync marks allow the RecordReader to seek to 
the start of the InputSplit, which contains a file, offset and length and find 
the first sync mark after the start of the split. The RecordReader continues 
processing records until it reaches the first sync mark after the end of the 
split. The first split of each file naturally starts immediately and not after 
the first sync mark. In this way, it is guaranteed that each record will be 
processed by exactly one mapper.
- 
- Text files are handled similarly, using newlines instead of sync marks.
- 
- <<BR>> <<Anchor(24)>> '''24. [[#A24|HDFS. What happens when two clients try 
to write into the same HDFS file?]]'''
  
  HDFS supports exclusive writes only. <<BR>> When the first client contacts 
the name-node to open the file for writing, the name-node grants a lease to the 
client to create this file.  When the second client tries to open the same file 
for writing, the name-node  will see that the lease for the file is already 
granted to another client, and will reject the open request for the second 
client.
  
- <<BR>> <<Anchor(25)>> '''25. [[#A25|I have a new node I want to add to a 
running Hadoop cluster; how do I start services on just one node?]]'''
- 
- This also applies to the case where a machine has crashed and rebooted, etc, 
and you need to get it to rejoin the cluster. You do not need to shutdown 
and/or restart the entire cluster in this case.
- 
- First, add the new node's DNS name to the conf/slaves file on the master node.
- 
- Then log in to the new slave node and execute:
- 
- {{{
- $ cd path/to/hadoop
- $ bin/hadoop-daemon.sh start datanode
- $ bin/hadoop-daemon.sh start tasktracker
- }}}
- <<BR>> <<Anchor(26)>> '''26. [[#A26|Is there an easy way to see the status 
and health of my cluster?]]'''
- 
- There are web-based interfaces to both the JobTracker (MapReduce master) and 
NameNode (HDFS master) which display status pages about the state of the entire 
system. By default, these are located at http://job.tracker.addr:50030/ and 
http://name.node.addr:50070/.
- 
- The JobTracker status page will display the state of all nodes, as well as 
the job queue and status about all currently running jobs and tasks. The 
NameNode status page will display the state of all nodes and the amount of free 
space, and provides the ability to browse the DFS via the web.
- 
- You can also see some basic HDFS cluster health data by running:
- 
- {{{
- $ bin/hadoop dfsadmin -report
- }}}
- <<BR>> <<Anchor(27)>> '''27. [[#A27|How do I change final output file name 
with the desired name rather than in partitions like part-00000, part-00001 
?]]'''
- 
- You can subclass the 
[[http://svn.apache.org/viewvc/hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/OutputFormat.java?view=markup|OutputFormat.java]]
 class and write your own. You can look at the code of 
[[http://svn.apache.org/viewvc/hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/TextOutputFormat.java?view=markup|TextOutputFormat]]
 
[[http://svn.apache.org/viewvc/hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/MultipleOutputFormat.java?view=markup|MultipleOutputFormat.java]]
 etc. for reference. It might be the case that you only need to do minor 
changes to any of the existing Output Format classes. To do that you can just 
subclass that class and override the methods you need to change.
- 
- <<BR>> <<Anchor(28)>> '''28. [[#A28|How much network bandwidth might I need 
between racks in a medium size (40-80 node) Hadoop cluster?]]'''
- 
- The true answer depends on the types of jobs you're running. As a back of the 
envelope calculation one might figure something like this:
- 
- 60 nodes total on 2 racks = 30 nodes per rack Each node might process about 
100MB/sec of data In the case of a sort job where the intermediate data is the 
same size as the input data, that means each node needs to shuffle 100MB/sec of 
data In aggregate, each rack is then producing about 3GB/sec of data However, 
given even reducer spread across the racks, each rack will need to send 
1.5GB/sec to reducers running on the other rack. Since the connection is full 
duplex, that means you need 1.5GB/sec of bisection bandwidth for this 
theoretical job. So that's 12Gbps.
- 
- However, the above calculations are probably somewhat of an upper bound. A 
large number of jobs have significant data reduction during the map phase, 
either by some kind of filtering/selection going on in the Mapper itself, or by 
good usage of Combiners. Additionally, intermediate data compression can cut 
the intermediate data transfer by a significant factor. Lastly, although your 
disks can probably provide 100MB sustained throughput, it's rare to see a MR 
job which can sustain disk speed IO through the entire pipeline. So, I'd say my 
estimate is at least a factor of 2 too high.
- 
- So, the simple answer is that 4-6Gbps is most likely just fine for most 
practical jobs. If you want to be extra safe, many inexpensive switches can 
operate in a "stacked" configuration where the bandwidth between them is 
essentially backplane speed. That should scale you to 96 nodes with plenty of 
headroom. Many inexpensive gigabit switches also have one or two 10GigE ports 
which can be used effectively to connect to each other or to a 10GE core.
- 
- '''29.[[#A29|How to limit Data node's disk usage?]]'''
+ == How to limit Data node's disk usage? ==
  
  Use dfs.datanode.du.reserved configuration value in 
$HADOOP_HOME/conf/hdfs-site.xml for limiting disk usage.
  
@@ -272, +274 @@

      </description>
    </property>
  }}}
- <<Anchor(30)>> ''30. [[#A30|When writing a New InputFormat, what is the 
format for the array of string returned by InputSplit\#getLocations()?]]''
  
- It appears that DatanodeID.getHost() is the standard place to retrieve this 
name, and the machineName variable, populated in DataNode.java\#startDataNode, 
is where the name is first set. The first method attempted is to get 
"slave.host.name" from the configuration; if that is not available, 
DNS.getDefaultHost is used instead.
- 
- <<BR>> <<Anchor(31)>> '''31. [[#A31|On an individual data node, how do you 
balance the blocks on the disk?]]'''
+ == On an individual data node, how do you balance the blocks on the disk? ==
  
  Hadoop currently does not have a method by which to do this automatically.  
To do this manually:
  
@@ -284, +283 @@

   2. Use the UNIX mv command to move the individual blocks and meta pairs from 
one directory to another on each host
   3. Restart the HDFS
  
- <<BR>> <<Anchor(32)>> '''32. [[#A32|How do you gracefully stop a running 
job?]]'''
- 
- hadoop job -kill JOBID
- 
- <<BR>> <<Anchor(33)>> '''33. [[#A33|What does "file could only be replicated 
to 0 nodes, instead of 1" mean?]]'''
+ == What does "file could only be replicated to 0 nodes, instead of 1" mean? ==
  
  The NameNode does not have any available DataNodes.  This can be caused by a 
wide variety of reasons.  Check the DataNode logs, the NameNode logs, network 
connectivity, ...
  
+ = Platform Specific =
+ == Windows ==
+ 
+ === Building / Testing Hadoop on Windows ===
+ 
+ The Hadoop build on Windows can be run from inside a Windows (not cygwin) 
command prompt window.
+ 
+ Whether you set environment variables in a batch file or in 
System->Properties->Advanced->Environment Variables, the following environment 
variables need to be set:
+ 
+ {{{
+ set ANT_HOME=c:\apache-ant-1.7.1
+ set JAVA_HOME=c:\jdk1.6.0.4
+ set PATH=%PATH%;%ANT_HOME%\bin
+ }}}
+ then open a command prompt window, cd to your workspace directory (in my case 
it is c:\workspace\hadoop) and run ant. Since I am interested in running the 
contrib test cases I do the following:
+ 
+ {{{
+ ant -l build.log -Dtest.output=yes test-contrib
+ }}}
+ other targets work similarly. I just wanted to document this because I spent 
some time trying to figure out why the ant build would not run from a cygwin 
command prompt window. If you are building/testing on Windows, and haven't 
figured it out yet, this should get you started.
+

[Hadoop Wiki] Update of "FAQ" by SomeOtherAccount

Reply via email to