chu,

Which version of HBase is this? How much data are you writing? What's
your cluster like (#nodes, hardware, etc)?

It would also be useful to see your master log around where it began
to fail, you should see exceptions showing up.

Thx,

J-D

On Sat, Aug 8, 2009 at 11:25 PM, stchu<[email protected]> wrote:
> Hi,
>
> I tried to write the large amount of data into HBase. The map processes
> could be completed, but while processing the reduce tasks, the job is
> failed.
> The error message is shown as follow:
> ------------------------------------------------------------------------
> ...
> 09/08/07 19:45:47 INFO mapred.JobClient:  map 100% reduce 5%
> 09/08/07 19:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> 09/08/07 19:46:27 INFO mapred.JobClient:  map 100% reduce 5%
> java.io.IOException: Job failed!
>    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>    at icl.atc.ites.hbase.PIndexCreator.run(PIndexCreator.java:571)
>    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>    at icl.atc.ites.hbase.PIndexCreator.main(PIndexCreator.java:598)
>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>    at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>    at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>    at java.lang.reflect.Method.invoke(Method.java:597)
>    at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
>    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>
> And the error message on web ui:
>
> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
> trying to locate root region
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:786)
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:465)
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:447)
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:575)
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:474)
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:447)
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:575)
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:478)
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:440)
>        at org.apache.hadoop.hbase.client.HTable.(HTable.java:115)
>        at org.apache.hadoop.hbase.client.HTable.(HTable.java:98)
>        at 
> org.apache.hadoop.hbase.mapred.TableOutputFormat.getRecordWriter(TableOutputFormat.java:90)
>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:404)
>        at org.apache.hadoop.mapred.Child.main(Child.java:158)
>
>
> -------------------------------------------------------------------
> And I checked both the userlog and the tasktracker log in one of the machine
> (these two files are located at the machine #4 in my cluster):
> Userlog:
> -------------------------------------------------------------------------------------------------
> ...
>
> 2009-08-06 18:23:25,712 INFO org.apache.hadoop.mapred.Merger: Down to
> the last merge-pass, with 16 segments left of total size: 163329244
> bytes
> 2009-08-06 18:23:26,867 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_200908051649_0004_r_000000_1 Scheduled 1 outputs (0 slow hosts
> and0 dup hosts)
> 2009-08-06 18:23:27,531 INFO org.apache.hadoop.mapred.ReduceTask:
> Shuffling 15322761 bytes (15322765 raw bytes) into RAM from
> attempt_200908051649_0004_m_000155_0
> 2009-08-06 18:23:31,360 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_200908051649_0004_r_000000_1 Merge of the 16 files in-memory
> complete. Local file is
> /home/chu/hadoop19/mapred/local/taskTracker/jobcache/job_200908051649_0004/attempt_200908051649_0004_r_000000_1/output/map_136.out
> of size 163329218
> 2009-08-06 18:23:31,360 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_200908051649_0004_r_000000_1 Thread waiting: Thread for
> merging on-disk files
> 2009-08-06 18:23:33,203 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 15322761 bytes from map-output for
> attempt_200908051649_0004_m_000155_0
> 2009-08-06 18:23:33,203 INFO org.apache.hadoop.mapred.ReduceTask: Rec
> #1 from attempt_200908051649_0004_m_000155_0 -> (32, 26) from
> ITeS-clusterPC2
> 2009-08-06 18:23:33,867 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_200908051649_0004_r_000000_1 Scheduled 1 outputs (0 slow hosts
> and0 dup hosts)
> 2009-08-06 18:23:34,232 INFO org.apache.hadoop.mapred.ReduceTask:
> Shuffling 14110457 bytes (14110461 raw bytes) into RAM from
> attempt_200908051649_0004_m_000153_0
> 2009-08-06 18:23:42,491 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 14110457 bytes from map-output for
> attempt_200908051649_0004_m_000153_0
> 2009-08-06 18:23:42,491 INFO org.apache.hadoop.mapred.ReduceTask: Rec
> #1 from attempt_200908051649_0004_m_000153_0 -> (32, 26) from
> ITeS-clusterPC2
> 2009-08-06 18:23:43,866 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_200908051649_0004_r_000000_1 Scheduled 1 outputs (0 slow hosts
> and0 dup hosts)
> 2009-08-06 18:23:44,050 INFO org.apache.hadoop.mapred.ReduceTask:
> Shuffling 13510680 bytes (13510684 raw bytes) into RAM from
> attempt_200908051649_0004_m_000151_0
> 2009-08-06 18:23:48,791 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 13510680 bytes from map-output for
> attempt_200908051649_0004_m_000151_0
> 2009-08-06 18:23:48,791 INFO org.apache.hadoop.mapred.ReduceTask: Rec
> #1 from attempt_200908051649_0004_m_000151_0 -> (32, 26) from
> ITeS-clusterPC2
> 2009-08-06 18:23:49,867 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_200908051649_0004_r_000000_1 Scheduled 1 outputs (0 slow hosts
> and0 dup hosts)
> 2009-08-06 18:23:50,194 INFO org.apache.hadoop.mapred.ReduceTask:
> Shuffling 14813689 bytes (14813693 raw bytes) into RAM from
> attempt_200908051649_0004_m_000154_0
> 2009-08-06 18:23:57,301 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 14813689 bytes from map-output for
> attempt_200908051649_0004_m_000154_0
> 2009-08-06 18:23:57,301 INFO org.apache.hadoop.mapred.ReduceTask: Rec
> #1 from attempt_200908051649_0004_m_000154_0 -> (32, 26) from
> ITeS-clusterPC2
> 2009-08-06 18:23:57,867 INFO org.apache.hadoop.mapred.ReduceTask:
> Closed ram manager
> 2009-08-06 18:23:57,867 INFO org.apache.hadoop.mapred.ReduceTask:
> Interleaved on-disk merge complete: 13 files left.
> 2009-08-06 18:23:57,867 INFO org.apache.hadoop.mapred.ReduceTask:
> In-memory merge complete: 5 files left.
> 2009-08-06 18:23:57,869 INFO org.apache.hadoop.mapred.ReduceTask:
> Keeping 5 segments, 73734381 bytes in memory for intermediate, on-disk
> merge
> 2009-08-06 18:23:57,872 INFO org.apache.hadoop.mapred.ReduceTask:
> Merging 13 files, 2343086626 bytes from disk
> 2009-08-06 18:23:58,671 INFO org.apache.hadoop.mapred.ReduceTask:
> Merging 0 segments, 0 bytes from memory into reduce
> 2009-08-06 18:23:58,671 INFO org.apache.hadoop.mapred.Merger: Merging
> 18 sorted segments
> 2009-08-06 18:23:59,962 INFO org.apache.hadoop.mapred.Merger: Merging
> 9 intermediate segments out of a total of 18
> 2009-08-06 18:27:44,177 INFO org.apache.hadoop.mapred.Merger: Down to
> the last merge-pass, with 10 segments left of total size: 2343086558
> bytes
> 2009-08-06 18:29:04,667 ERROR
> org.apache.hadoop.hbase.mapred.TableOutputFormat:
> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
> trying to locate root region
> 2009-08-06 18:29:04,699 WARN org.apache.hadoop.mapred.TaskTracker:
> Error running child
> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
> trying to locate root region
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:786)
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:465)
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:447)
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:575)
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:474)
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:447)
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:575)
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:478)
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:440)
>        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:115)
>        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:98)
>        at 
> org.apache.hadoop.hbase.mapred.TableOutputFormat.getRecordWriter(TableOutputFormat.java:90)
>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:404)
>        at org.apache.hadoop.mapred.Child.main(Child.java:158)
>
> 2009-08-06 18:29:04,716 INFO org.apache.hadoop.mapred.TaskRunner:
> Runnning cleanup for the task
>
>
> ------------------------------------------------------------------------------------------------------------
>
>  And tasklog:
>
> 2009-08-06 18:23:25,712 INFO org.apache.hadoop.mapred.Merger: Down to
> the last merge-pass, with 16 segments left of total size: 163329244
> bytes
> 2009-08-06 18:23:26,867 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_200908051649_0004_r_000000_1 Scheduled 1 outputs (0 slow hosts
> and0 dup hosts)
> 2009-08-06 18:23:27,531 INFO org.apache.hadoop.mapred.ReduceTask:
> Shuffling 15322761 bytes (15322765 raw bytes) into RAM from
> attempt_200908051649_0004_m_000155_0
> 2009-08-06 18:23:31,360 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_200908051649_0004_r_000000_1 Merge of the 16 files in-memory
> complete. Local file is
> /home/chu/hadoop19/mapred/local/taskTracker/jobcache/job_200908051649_0004/attempt_200908051649_0004_r_000000_1/output/map_136.out
> of size 163329218
> 2009-08-06 18:23:31,360 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_200908051649_0004_r_000000_1 Thread waiting: Thread for
> merging on-disk files
> 2009-08-06 18:23:33,203 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 15322761 bytes from map-output for
> attempt_200908051649_0004_m_000155_0
> 2009-08-06 18:23:33,203 INFO org.apache.hadoop.mapred.ReduceTask: Rec
> #1 from attempt_200908051649_0004_m_000155_0 -> (32, 26) from
> ITeS-clusterPC2
> 2009-08-06 18:23:33,867 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_200908051649_0004_r_000000_1 Scheduled 1 outputs (0 slow hosts
> and0 dup hosts)
> 2009-08-06 18:23:34,232 INFO org.apache.hadoop.mapred.ReduceTask:
> Shuffling 14110457 bytes (14110461 raw bytes) into RAM from
> attempt_200908051649_0004_m_000153_0
> 2009-08-06 18:23:42,491 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 14110457 bytes from map-output for
> attempt_200908051649_0004_m_000153_0
> 2009-08-06 18:23:42,491 INFO org.apache.hadoop.mapred.ReduceTask: Rec
> #1 from attempt_200908051649_0004_m_000153_0 -> (32, 26) from
> ITeS-clusterPC2
> 2009-08-06 18:23:43,866 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_200908051649_0004_r_000000_1 Scheduled 1 outputs (0 slow hosts
> and0 dup hosts)
> 2009-08-06 18:23:44,050 INFO org.apache.hadoop.mapred.ReduceTask:
> Shuffling 13510680 bytes (13510684 raw bytes) into RAM from
> attempt_200908051649_0004_m_000151_0
> 2009-08-06 18:23:48,791 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 13510680 bytes from map-output for
> attempt_200908051649_0004_m_000151_0
> 2009-08-06 18:23:48,791 INFO org.apache.hadoop.mapred.ReduceTask: Rec
> #1 from attempt_200908051649_0004_m_000151_0 -> (32, 26) from
> ITeS-clusterPC2
> 2009-08-06 18:23:49,867 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_200908051649_0004_r_000000_1 Scheduled 1 outputs (0 slow hosts
> and0 dup hosts)
> 2009-08-06 18:23:50,194 INFO org.apache.hadoop.mapred.ReduceTask:
> Shuffling 14813689 bytes (14813693 raw bytes) into RAM from
> attempt_200908051649_0004_m_000154_0
> 2009-08-06 18:23:57,301 INFO org.apache.hadoop.mapred.ReduceTask: Read
> 14813689 bytes from map-output for
> attempt_200908051649_0004_m_000154_0
> 2009-08-06 18:23:57,301 INFO org.apache.hadoop.mapred.ReduceTask: Rec
> #1 from attempt_200908051649_0004_m_000154_0 -> (32, 26) from
> ITeS-clusterPC2
> 2009-08-06 18:23:57,867 INFO org.apache.hadoop.mapred.ReduceTask:
> Closed ram manager
> 2009-08-06 18:23:57,867 INFO org.apache.hadoop.mapred.ReduceTask:
> Interleaved on-disk merge complete: 13 files left.
> 2009-08-06 18:23:57,867 INFO org.apache.hadoop.mapred.ReduceTask:
> In-memory merge complete: 5 files left.
> 2009-08-06 18:23:57,869 INFO org.apache.hadoop.mapred.ReduceTask:
> Keeping 5 segments, 73734381 bytes in memory for intermediate, on-disk
> merge
> 2009-08-06 18:23:57,872 INFO org.apache.hadoop.mapred.ReduceTask:
> Merging 13 files, 2343086626 bytes from disk
> 2009-08-06 18:23:58,671 INFO org.apache.hadoop.mapred.ReduceTask:
> Merging 0 segments, 0 bytes from memory into reduce
> 2009-08-06 18:23:58,671 INFO org.apache.hadoop.mapred.Merger: Merging
> 18 sorted segments
> 2009-08-06 18:23:59,962 INFO org.apache.hadoop.mapred.Merger: Merging
> 9 intermediate segments out of a total of 18
> 2009-08-06 18:27:44,177 INFO org.apache.hadoop.mapred.Merger: Down to
> the last merge-pass, with 10 segments left of total size: 2343086558
> bytes
> 2009-08-06 18:29:04,667 ERROR
> org.apache.hadoop.hbase.mapred.TableOutputFormat:
> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
> trying to locate root region
>
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
> The error seems occured in the sort stage of the reduce task. It never
> occurs if I write the small data into HBase.
> Can anyone give me a help? Thank you very much!
>
>
> chu
>

Reply via email to