I am using 0.15.2, and in my case, the CPUs on both nodes are idle. It looks
like the program is trapped in a synchronization deadlock or some waiting
state from which it will never be awakened.
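One way I can try to confirm that is a thread dump of the stuck reduce task's
child JVM. The commands below are just the standard JDK tools, and <pid> is a
placeholder for whatever ps reports for that java process:

  # on the node running the hung reduce task, find its child JVM
  ps ax | grep java
  # SIGQUIT makes a HotSpot JVM print a full thread dump to its standard output
  kill -QUIT <pid>
  # or, if the JDK ships jstack, print the same stacks to the terminal
  jstack <pid>

If it really is a deadlock, the dump should show two or more threads blocked
on monitors held by each other.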
Yunhong
On Fri, 18 Jan 2008, Jason Venner wrote:
When this was happening to us, there was a block replication error and one
node was in an endless loop trying to replicate a block to another node that
would not accept it. In our case most of the cluster was idle, but a CPU on
the machine trying to send the block was heavily used.
We were never able to isolate the cause, and it stopped happening for us when
we upgraded to 0.15.1.
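If you want to check whether you are hitting the same thing, a grep over the
daemon logs on both machines should show the replication retries; the file
names below just assume the default logs/ naming, so adjust them to your
install:

  # failed/retried block replication attempts on the datanodes
  grep -i replicat logs/hadoop-*-datanode-*.log
  # errors on the tasktrackers around the time the reduce stalls
  grep -i -e error -e exception logs/hadoop-*-tasktracker-*.log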
---
Attributor is hiring Hadoop Wranglers, contact if interested.
Yunhong Gu1 wrote:
Hi,
If someone knows how to fix the problem described below, please help me
out. Thanks!
I am testing Hadoop on a 2-node cluster and the "reduce" stage always hangs at
some point, even if I use different clusters. My OS is Debian Linux, kernel 2.6
(AMD Opteron w/ 4GB memory). The Hadoop version is 0.15.2 and the Java version
is 1.5.0_01-b08.
I simply tried "./bin/hadoop jar hadoop-0.15.2-test.jar mrbench", and when the
map stage finishes, the reduce stage hangs somewhere in the middle, sometimes
at 0%. I also tried every other MapReduce program I could find in the example
jar package, but they all hang.
The log file simply prints
2008-01-18 15:15:50,831 INFO org.apache.hadoop.mapred.TaskTracker:
task_200801181424_0004_r_000000_0 0.0% reduce > copy >
2008-01-18 15:15:56,841 INFO org.apache.hadoop.mapred.TaskTracker:
task_200801181424_0004_r_000000_0 0.0% reduce > copy >
2008-01-18 15:16:02,850 INFO org.apache.hadoop.mapred.TaskTracker:
task_200801181424_0004_r_000000_0 0.0% reduce > copy >
forever.
The program does work if I start Hadoop on a single node only.
Below is my hadoop-site.xml configuration:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>10.0.0.1:60000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>10.0.0.1:60001</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/raid/hadoop/data</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/raid/hadoop/mapred</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/raid/hadoop/tmp</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>
  <property>
    <name>mapred.tasktracker.tasks.maximum</name>
    <value>4</value>
  </property>
  <!--
  <property>
    <name>mapred.map.tasks</name>
    <value>7</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>3</value>
  </property>
  -->
  <property>
    <name>fs.inmemory.size.mb</name>
    <value>200</value>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>
  </property>
  <property>
    <name>io.sort.factor</name>
    <value>100</value>
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>200</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>
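In case it helps, these are the sanity checks I can run from the master node
to see whether both slaves actually joined the cluster; they are just the
stock commands (assuming this release has dfsadmin), nothing specific to my
setup:

  # conf/slaves on the master should list both nodes
  cat conf/slaves
  # ask the namenode how many datanodes have registered
  ./bin/hadoop dfsadmin -report

The jobtracker web UI (default port 50030) should likewise show both
tasktrackers and the state of the stuck reduce task.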