chu, Which version of HBase is this? How much data are you writing? What's your cluster like (#nodes, hardware, etc)?
It would also be useful to see your master log around where it began to fail; you should see exceptions showing up.

Thx,

J-D

On Sat, Aug 8, 2009 at 11:25 PM, stchu <[email protected]> wrote:
> Hi,
>
> I tried to write a large amount of data into HBase. The map tasks
> complete, but the job fails while the reduce tasks are running.
> The error message is shown below:
> ------------------------------------------------------------------------
> ...
> 09/08/07 19:45:47 INFO mapred.JobClient: map 100% reduce 5%
> 09/08/07 19:46:21 INFO mapred.JobClient: map 100% reduce 6%
> 09/08/07 19:46:27 INFO mapred.JobClient: map 100% reduce 5%
> java.io.IOException: Job failed!
>   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>   at icl.atc.ites.hbase.PIndexCreator.run(PIndexCreator.java:571)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>   at icl.atc.ites.hbase.PIndexCreator.main(PIndexCreator.java:598)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
>   at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>   at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>
> And the error message on the web UI:
>
> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:786)
>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:465)
>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:447)
>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:575)
>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:474)
>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:447)
>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:575)
>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:478)
>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:440)
>   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:115)
>   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:98)
>   at org.apache.hadoop.hbase.mapred.TableOutputFormat.getRecordWriter(TableOutputFormat.java:90)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:404)
>   at org.apache.hadoop.mapred.Child.main(Child.java:158)
>
> -------------------------------------------------------------------
> I also checked the userlog and the tasktracker log on one of the machines
> (both files are on machine #4 of my cluster):
>
> Userlog:
> -------------------------------------------------------------------------------------------------
> ...
> 2009-08-06 18:23:25,712 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 16 segments left of total size: 163329244 bytes
> 2009-08-06 18:23:26,867 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200908051649_0004_r_000000_1 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
> 2009-08-06 18:23:27,531 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 15322761 bytes (15322765 raw bytes) into RAM from attempt_200908051649_0004_m_000155_0
> 2009-08-06 18:23:31,360 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200908051649_0004_r_000000_1 Merge of the 16 files in-memory complete. Local file is /home/chu/hadoop19/mapred/local/taskTracker/jobcache/job_200908051649_0004/attempt_200908051649_0004_r_000000_1/output/map_136.out of size 163329218
> 2009-08-06 18:23:31,360 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200908051649_0004_r_000000_1 Thread waiting: Thread for merging on-disk files
> 2009-08-06 18:23:33,203 INFO org.apache.hadoop.mapred.ReduceTask: Read 15322761 bytes from map-output for attempt_200908051649_0004_m_000155_0
> 2009-08-06 18:23:33,203 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_200908051649_0004_m_000155_0 -> (32, 26) from ITeS-clusterPC2
> 2009-08-06 18:23:33,867 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200908051649_0004_r_000000_1 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
> 2009-08-06 18:23:34,232 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 14110457 bytes (14110461 raw bytes) into RAM from attempt_200908051649_0004_m_000153_0
> 2009-08-06 18:23:42,491 INFO org.apache.hadoop.mapred.ReduceTask: Read 14110457 bytes from map-output for attempt_200908051649_0004_m_000153_0
> 2009-08-06 18:23:42,491 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_200908051649_0004_m_000153_0 -> (32, 26) from ITeS-clusterPC2
> 2009-08-06 18:23:43,866 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200908051649_0004_r_000000_1 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
> 2009-08-06 18:23:44,050 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 13510680 bytes (13510684 raw bytes) into RAM from attempt_200908051649_0004_m_000151_0
> 2009-08-06 18:23:48,791 INFO org.apache.hadoop.mapred.ReduceTask: Read 13510680 bytes from map-output for attempt_200908051649_0004_m_000151_0
> 2009-08-06 18:23:48,791 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_200908051649_0004_m_000151_0 -> (32, 26) from ITeS-clusterPC2
> 2009-08-06 18:23:49,867 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200908051649_0004_r_000000_1 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
> 2009-08-06 18:23:50,194 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 14813689 bytes (14813693 raw bytes) into RAM from attempt_200908051649_0004_m_000154_0
> 2009-08-06 18:23:57,301 INFO org.apache.hadoop.mapred.ReduceTask: Read 14813689 bytes from map-output for attempt_200908051649_0004_m_000154_0
> 2009-08-06 18:23:57,301 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_200908051649_0004_m_000154_0 -> (32, 26) from ITeS-clusterPC2
> 2009-08-06 18:23:57,867 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager
> 2009-08-06 18:23:57,867 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 13 files left.
> 2009-08-06 18:23:57,867 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 5 files left.
> 2009-08-06 18:23:57,869 INFO org.apache.hadoop.mapred.ReduceTask: Keeping 5 segments, 73734381 bytes in memory for intermediate, on-disk merge
> 2009-08-06 18:23:57,872 INFO org.apache.hadoop.mapred.ReduceTask: Merging 13 files, 2343086626 bytes from disk
> 2009-08-06 18:23:58,671 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce
> 2009-08-06 18:23:58,671 INFO org.apache.hadoop.mapred.Merger: Merging 18 sorted segments
> 2009-08-06 18:23:59,962 INFO org.apache.hadoop.mapred.Merger: Merging 9 intermediate segments out of a total of 18
> 2009-08-06 18:27:44,177 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 10 segments left of total size: 2343086558 bytes
> 2009-08-06 18:29:04,667 ERROR org.apache.hadoop.hbase.mapred.TableOutputFormat: org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
> 2009-08-06 18:29:04,699 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:786)
>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:465)
>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:447)
>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:575)
>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:474)
>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:447)
>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:575)
>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:478)
>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:440)
>   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:115)
>   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:98)
>   at org.apache.hadoop.hbase.mapred.TableOutputFormat.getRecordWriter(TableOutputFormat.java:90)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:404)
>   at org.apache.hadoop.mapred.Child.main(Child.java:158)
>
> 2009-08-06 18:29:04,716 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task
>
> ------------------------------------------------------------------------------------------------------------
>
> The tasktracker log on the same machine contains the same entries as the userlog above, ending with:
> 2009-08-06 18:29:04,667 ERROR org.apache.hadoop.hbase.mapred.TableOutputFormat: org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
>
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
> The error seems to occur in the sort stage of the reduce task. It never occurs when I write a small amount of data into HBase.
> Can anyone help me? Thank you very much!
>
> chu
>
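
Also, for what it's worth: in both traces the exception comes out of TableOutputFormat.getRecordWriter(), i.e. at the point where each reduce task opens its own HTable and tries to locate the -ROOT- region. You can exercise just that step by hand, outside MapReduce, while the job is loading data. Here's a minimal, untested sketch against the old client API; the table name "mytable" and the retry values are placeholders, substitute your own:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class RootRegionProbe {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    // Optional: be a bit more patient before giving up on region lookups.
    // Both properties exist in this era of HBase; the values are placeholders.
    conf.setInt("hbase.client.retries.number", 10);
    conf.setLong("hbase.client.pause", 10000L);
    // The HTable constructor performs the same -ROOT-/.META. lookup that is
    // timing out in your reducers (see the HTable.<init> frames in the trace),
    // so success or failure here is telling.
    HTable table = new HTable(conf, "mytable");
    System.out.println("Located root region and opened table: " + "mytable");
  }
}

If this also throws NoServerForRegionException while the job is running, the master (or the region server carrying -ROOT-) is most likely unresponsive under the write load, and the master and region server logs around that time should show why.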

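On the job side it can also be worth double-checking how the output is wired up and giving the client more patience, since the reduce tasks pick up their HBase settings from the job configuration (at least in the versions I've looked at). A rough, untested sketch with the old "mapred" API, assuming an HBase matching your Hadoop 0.19 paths; the table name, the retry values, and the use of IdentityTableReduce are placeholders, since your PIndexCreator presumably has its own reducer:

import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapred.IdentityTableReduce;
import org.apache.hadoop.hbase.mapred.TableOutputFormat;
import org.apache.hadoop.mapred.JobConf;

public class JobWiringSketch {
  public static JobConf configureOutput(JobConf job) {
    // getRecordWriter() reads this property to know which table to open.
    job.set(TableOutputFormat.OUTPUT_TABLE, "mytable");   // placeholder table name
    job.setOutputFormat(TableOutputFormat.class);

    // With the 0.19-era "mapred" API the reduce side emits
    // (ImmutableBytesWritable, BatchUpdate) pairs; IdentityTableReduce passes them through.
    job.setReducerClass(IdentityTableReduce.class);
    job.setOutputKeyClass(ImmutableBytesWritable.class);
    job.setOutputValueClass(BatchUpdate.class);

    // More patience for -ROOT-/.META. lookups from the reduce tasks.
    // Placeholder values; tune to taste.
    job.setInt("hbase.client.retries.number", 10);
    job.setLong("hbase.client.pause", 10000L);
    return job;
  }
}

Extra retries only buy headroom, though; if the master really is down or swamped during the import, the job will still fail, just later, so the version, data volume, and cluster details above are what we need to go further.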