Those NoServerForRegionExceptions are probably putting a stake through throughput, especially when they are complaining that root is unobtainable. Let's try to figure out what's up here (Jon Gray has a good suggestion in this regard).
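To put a rough number on what those failures cost, here is a small Python sketch (an illustration only, not part of the job code). It takes the "fetching new record writer" and "failed to initialize the hbase configuration" timestamps of the four failed reduce attempts from the 2009-07-02 output quoted further down and computes how long each attempt sat before giving up. It assumes those two log lines bracket the time spent trying to locate root; the names `ATTEMPTS` and `stall_seconds` are made up for the example.

```python
from datetime import datetime

# (attempt suffix, "fetching new record writer" time, "failed to initialize" time),
# copied from the 2009-07-02 job output quoted later in this thread.
ATTEMPTS = [
    ("r_000002_0", "15:20:27.230", "15:22:51.429"),
    ("r_000013_0", "15:20:33.183", "15:23:04.369"),
    ("r_000012_0", "15:20:48.434", "15:23:10.185"),
    ("r_000014_0", "15:20:47.442", "15:23:13.285"),
]


def stall_seconds(start: str, end: str) -> float:
    """Seconds elapsed between two same-day HH:MM:SS.mmm timestamps."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# Assumption: the gap between the two log lines is time spent waiting on root.
stalls = {name: stall_seconds(start, end) for name, start, end in ATTEMPTS}

for name, secs in stalls.items():
    print(f"{name}: {secs:.1f} s before failing")

mean_stall = sum(stalls.values()) / len(stalls)
print(f"mean stall per failed attempt: {mean_stall:.1f} s")
```

On those four attempts the gap comes out at about 142 to 151 seconds each, so every failed attempt holds a task slot for well over two minutes before the work is even rescheduled.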
On schema, how many columns do you think you'll have per family? The number of columns story has improved by a bunch in hbase 0.20.0. Should be able to do thousands if not more (per column family).

St.Ack

On Fri, Jul 3, 2009 at 6:00 AM, Irfan Mohammed <irfan...@gmail.com> wrote:

> Thanks for the quick responses.
>
> I removed the reduce pass and doing the inserts in the map pass. Reduced
> the number of Map instances to 10. It is still taking about 12 minutes to
> complete the inserts.
>
> Any reason why there should be arbitrary NoServerForRegionException?
>
> I am working on writing to hdfs and checking the performance.
>
> 09/07/03 08:38:35 INFO mapred.JobClient: Running job: job_200906192236_24166
> 09/07/03 08:38:36 INFO mapred.JobClient: map 0% reduce 0%
> 09/07/03 08:38:53 INFO mapred.JobClient: map 1% reduce 0%
> 09/07/03 08:38:59 INFO mapred.JobClient: map 2% reduce 0%
> 09/07/03 08:39:02 INFO mapred.JobClient: map 3% reduce 0%
> 09/07/03 08:39:08 INFO mapred.JobClient: map 4% reduce 0%
> 09/07/03 08:39:14 INFO mapred.JobClient: map 5% reduce 0%
> 09/07/03 08:39:20 INFO mapred.JobClient: map 6% reduce 0%
> 09/07/03 08:39:26 INFO mapred.JobClient: map 7% reduce 0%
> 09/07/03 08:39:35 INFO mapred.JobClient: map 8% reduce 0%
> 09/07/03 08:39:41 INFO mapred.JobClient: map 9% reduce 0%
> 09/07/03 08:39:50 INFO mapred.JobClient: map 10% reduce 0%
> 09/07/03 08:39:56 INFO mapred.JobClient: map 11% reduce 0%
> 09/07/03 08:40:05 INFO mapred.JobClient: map 12% reduce 0%
> 09/07/03 08:40:14 INFO mapred.JobClient: map 13% reduce 0%
> 09/07/03 08:40:20 INFO mapred.JobClient: map 14% reduce 0%
> 09/07/03 08:40:26 INFO mapred.JobClient: map 15% reduce 0%
> 09/07/03 08:40:32 INFO mapred.JobClient: map 16% reduce 0%
> 09/07/03 08:40:38 INFO mapred.JobClient: map 17% reduce 0%
> 09/07/03 08:40:44 INFO mapred.JobClient: map 18% reduce 0%
> 09/07/03 08:40:46 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000007_0, Status : FAILED
org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying > to locate root region > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490) > at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124) > at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107) > at > com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449) > at > org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) > at org.apache.hadoop.mapred.Child.main(Child.java:170) > > attempt_200906192236_24166_m_000007_0: [2009-07-03 08:40:42.553] failed to > initialize the hbase configuration > 09/07/03 08:40:46 INFO mapred.JobClient: Task Id : > attempt_200906192236_24166_m_000009_0, Status : FAILED > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying > to locate root region > at > 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490) > at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124) > at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107) > at > com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449) > at > org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) > at org.apache.hadoop.mapred.Child.main(Child.java:170) > > attempt_200906192236_24166_m_000009_0: [2009-07-03 08:40:40.061] failed to > initialize the hbase configuration > 09/07/03 08:40:47 INFO mapred.JobClient: map 19% reduce 0% > 09/07/03 08:40:49 INFO mapred.JobClient: Task Id : > attempt_200906192236_24166_m_000008_0, Status : FAILED > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying > to locate root region > at > 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490) > at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124) > at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107) > at > com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449) > at > org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) > at org.apache.hadoop.mapred.Child.main(Child.java:170) > > attempt_200906192236_24166_m_000008_0: [2009-07-03 08:40:44.631] failed to > initialize the hbase configuration > 09/07/03 08:40:53 INFO mapred.JobClient: map 20% reduce 0% > 09/07/03 08:40:56 INFO mapred.JobClient: map 21% reduce 0% > 09/07/03 08:41:02 INFO mapred.JobClient: map 22% reduce 0% > 09/07/03 08:41:08 INFO mapred.JobClient: map 23% reduce 0% > 09/07/03 08:41:17 INFO mapred.JobClient: map 24% reduce 0% > 09/07/03 08:41:26 INFO mapred.JobClient: 
map 25% reduce 0% > 09/07/03 08:41:32 INFO mapred.JobClient: map 26% reduce 0% > 09/07/03 08:41:38 INFO mapred.JobClient: map 27% reduce 0% > 09/07/03 08:41:44 INFO mapred.JobClient: map 28% reduce 0% > 09/07/03 08:41:50 INFO mapred.JobClient: map 29% reduce 0% > 09/07/03 08:41:53 INFO mapred.JobClient: map 30% reduce 0% > 09/07/03 08:42:02 INFO mapred.JobClient: map 31% reduce 0% > 09/07/03 08:42:08 INFO mapred.JobClient: map 32% reduce 0% > 09/07/03 08:42:11 INFO mapred.JobClient: map 33% reduce 0% > 09/07/03 08:42:17 INFO mapred.JobClient: map 34% reduce 0% > 09/07/03 08:42:20 INFO mapred.JobClient: map 35% reduce 0% > 09/07/03 08:42:26 INFO mapred.JobClient: map 36% reduce 0% > 09/07/03 08:42:32 INFO mapred.JobClient: map 37% reduce 0% > 09/07/03 08:42:38 INFO mapred.JobClient: map 38% reduce 0% > 09/07/03 08:42:44 INFO mapred.JobClient: map 39% reduce 0% > 09/07/03 08:42:53 INFO mapred.JobClient: map 40% reduce 0% > 09/07/03 08:42:55 INFO mapred.JobClient: Task Id : > attempt_200906192236_24166_m_000009_1, Status : FAILED > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying > to locate root region > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628) > at > 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490) > at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124) > at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107) > at > com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449) > at > org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) > at org.apache.hadoop.mapred.Child.main(Child.java:170) > > attempt_200906192236_24166_m_000009_1: [2009-07-03 08:42:50.373] failed to > initialize the hbase configuration > 09/07/03 08:42:55 INFO mapred.JobClient: Task Id : > attempt_200906192236_24166_m_000007_1, Status : FAILED > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying > to locate root region > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527) > at > 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490) > at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124) > at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107) > at > com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449) > at > org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) > at org.apache.hadoop.mapred.Child.main(Child.java:170) > > attempt_200906192236_24166_m_000007_1: [2009-07-03 08:42:49.181] failed to > initialize the hbase configuration > 09/07/03 08:42:55 INFO mapred.JobClient: Task Id : > attempt_200906192236_24166_m_000008_1, Status : FAILED > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying > to locate root region > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490) > at 
org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124) > at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107) > at > com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449) > at > org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) > at org.apache.hadoop.mapred.Child.main(Child.java:170) > > attempt_200906192236_24166_m_000008_1: [2009-07-03 08:42:49.498] failed to > initialize the hbase configuration > 09/07/03 08:42:59 INFO mapred.JobClient: map 41% reduce 0% > 09/07/03 08:43:08 INFO mapred.JobClient: map 42% reduce 0% > 09/07/03 08:43:14 INFO mapred.JobClient: map 43% reduce 0% > 09/07/03 08:43:23 INFO mapred.JobClient: map 44% reduce 0% > 09/07/03 08:43:32 INFO mapred.JobClient: map 45% reduce 0% > 09/07/03 08:43:41 INFO mapred.JobClient: map 46% reduce 0% > 09/07/03 08:43:50 INFO mapred.JobClient: map 47% reduce 0% > 09/07/03 08:43:56 INFO mapred.JobClient: map 48% reduce 0% > 09/07/03 08:44:02 INFO mapred.JobClient: map 49% reduce 0% > 09/07/03 08:44:08 INFO mapred.JobClient: map 50% reduce 0% > 09/07/03 08:44:14 INFO mapred.JobClient: map 51% reduce 0% > 09/07/03 08:44:20 INFO mapred.JobClient: map 52% reduce 0% > 09/07/03 08:44:23 INFO mapred.JobClient: map 53% reduce 0% > 09/07/03 08:44:29 INFO mapred.JobClient: map 54% reduce 0% > 09/07/03 08:44:35 INFO mapred.JobClient: map 55% reduce 0% > 09/07/03 08:44:38 INFO mapred.JobClient: map 56% reduce 0% > 09/07/03 08:44:47 INFO mapred.JobClient: map 57% reduce 0% > 09/07/03 08:44:53 INFO mapred.JobClient: map 58% reduce 0% > 09/07/03 08:45:01 INFO mapred.JobClient: Task Id : > attempt_200906192236_24166_m_000007_2, Status : FAILED > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying > to locate root region > at > 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490) > at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124) > at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107) > at > com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449) > at > org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) > at org.apache.hadoop.mapred.Child.main(Child.java:170) > > attempt_200906192236_24166_m_000007_2: [2009-07-03 08:44:55.897] failed to > initialize the hbase configuration > 09/07/03 08:45:01 INFO mapred.JobClient: Task Id : > attempt_200906192236_24166_m_000009_2, Status : FAILED > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying > to locate root region > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863) > at > 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490) > at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124) > at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107) > at > com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449) > at > org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) > at org.apache.hadoop.mapred.Child.main(Child.java:170) > > attempt_200906192236_24166_m_000009_2: [2009-07-03 08:44:56.296] failed to > initialize the hbase configuration > 09/07/03 08:45:02 INFO mapred.JobClient: map 59% reduce 0% > 09/07/03 08:45:04 INFO mapred.JobClient: Task Id : > attempt_200906192236_24166_m_000008_2, Status : FAILED > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying > to locate root region > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863) > at > 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527) > at > org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490) > at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124) > at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107) > at > com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449) > at > org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) > at org.apache.hadoop.mapred.Child.main(Child.java:170) > > attempt_200906192236_24166_m_000008_2: [2009-07-03 08:44:59.221] failed to > initialize the hbase configuration > 09/07/03 08:45:08 INFO mapred.JobClient: map 60% reduce 0% > 09/07/03 08:45:17 INFO mapred.JobClient: map 61% reduce 0% > 09/07/03 08:45:26 INFO mapred.JobClient: map 62% reduce 0% > 09/07/03 08:45:32 INFO mapred.JobClient: map 63% reduce 0% > 09/07/03 08:45:38 INFO mapred.JobClient: map 64% reduce 0% > 09/07/03 08:45:44 INFO mapred.JobClient: map 65% reduce 0% > 09/07/03 08:45:50 INFO mapred.JobClient: map 66% reduce 0% > 09/07/03 08:45:56 INFO 
mapred.JobClient: map 67% reduce 0% > 09/07/03 08:46:02 INFO mapred.JobClient: map 68% reduce 0% > 09/07/03 08:46:08 INFO mapred.JobClient: map 69% reduce 0% > 09/07/03 08:46:15 INFO mapred.JobClient: map 70% reduce 0% > 09/07/03 08:46:21 INFO mapred.JobClient: map 71% reduce 0% > 09/07/03 08:46:27 INFO mapred.JobClient: map 72% reduce 0% > 09/07/03 08:46:36 INFO mapred.JobClient: map 73% reduce 0% > 09/07/03 08:46:45 INFO mapred.JobClient: map 74% reduce 0% > 09/07/03 08:46:54 INFO mapred.JobClient: map 75% reduce 0% > 09/07/03 08:47:03 INFO mapred.JobClient: map 76% reduce 0% > 09/07/03 08:47:12 INFO mapred.JobClient: map 77% reduce 0% > 09/07/03 08:47:18 INFO mapred.JobClient: map 78% reduce 0% > 09/07/03 08:47:24 INFO mapred.JobClient: map 79% reduce 0% > 09/07/03 08:47:33 INFO mapred.JobClient: map 80% reduce 0% > 09/07/03 08:47:42 INFO mapred.JobClient: map 81% reduce 0% > 09/07/03 08:47:51 INFO mapred.JobClient: map 82% reduce 0% > 09/07/03 08:48:00 INFO mapred.JobClient: map 83% reduce 0% > 09/07/03 08:48:09 INFO mapred.JobClient: map 84% reduce 0% > 09/07/03 08:48:15 INFO mapred.JobClient: map 85% reduce 0% > 09/07/03 08:48:24 INFO mapred.JobClient: map 86% reduce 0% > 09/07/03 08:48:30 INFO mapred.JobClient: map 87% reduce 0% > 09/07/03 08:48:39 INFO mapred.JobClient: map 88% reduce 0% > 09/07/03 08:48:54 INFO mapred.JobClient: map 89% reduce 0% > 09/07/03 08:49:06 INFO mapred.JobClient: map 90% reduce 0% > 09/07/03 08:49:15 INFO mapred.JobClient: map 91% reduce 0% > 09/07/03 08:49:24 INFO mapred.JobClient: map 92% reduce 0% > 09/07/03 08:49:30 INFO mapred.JobClient: map 93% reduce 0% > 09/07/03 08:49:36 INFO mapred.JobClient: map 94% reduce 0% > 09/07/03 08:49:45 INFO mapred.JobClient: map 95% reduce 0% > 09/07/03 08:49:57 INFO mapred.JobClient: map 96% reduce 0% > 09/07/03 08:50:08 INFO mapred.JobClient: map 97% reduce 0% > 09/07/03 08:50:17 INFO mapred.JobClient: map 98% reduce 0% > 09/07/03 08:50:26 INFO mapred.JobClient: map 99% reduce 0% > 09/07/03 
08:50:35 INFO mapred.JobClient: map 100% reduce 0%
> 09/07/03 08:50:40 INFO mapred.JobClient: Job complete: job_200906192236_24166
> 09/07/03 08:50:40 INFO mapred.JobClient: Counters: 7
> 09/07/03 08:50:40 INFO mapred.JobClient: Job Counters
> 09/07/03 08:50:40 INFO mapred.JobClient: Launched map tasks=19
> 09/07/03 08:50:40 INFO mapred.JobClient: Data-local map tasks=19
> 09/07/03 08:50:40 INFO mapred.JobClient: FileSystemCounters
> 09/07/03 08:50:40 INFO mapred.JobClient: HDFS_BYTES_READ=57966580
> 09/07/03 08:50:40 INFO mapred.JobClient: Map-Reduce Framework
> 09/07/03 08:50:40 INFO mapred.JobClient: Map input records=294786
> 09/07/03 08:50:40 INFO mapred.JobClient: Spilled Records=0
> 09/07/03 08:50:40 INFO mapred.JobClient: Map input bytes=57966580
> 09/07/03 08:50:40 INFO mapred.JobClient: Map output records=0
>
>
> ----- Original Message -----
> From: "stack" <st...@duboce.net>
> To: hbase-dev@hadoop.apache.org
> Sent: Thursday, July 2, 2009 6:12:29 PM GMT -05:00 US/Canada Eastern
> Subject: Re: performance help
>
> Why 4 tables? Why not one table and four column families, one for each
> metric? (Looking in excel spreadsheet, each row has same key). Then you'd
> be doing one insert against a single table rather than four separate ones.
>
> Looking at your MR output below, it looks like it takes 40 seconds to
> complete the map tasks. The report says that there are 294786 inputs. Says
> that the mapper outputs 17M records. Is that expected?
>
> A few of your reducers failed and were done over again. The redos were
> probably a significant part of the overall elapsed time. The failures are
> trying to find root region. Root region is in zk. Odd it can't be found
> there.
>
> The fetching of map data and sort is taking a considerable amount of the
> overall time. Do you need the reduce step? (Couldn't tell from the excel
> spreadsheet -- there didn't seem to be any summing going on). If not, this
> could make for savings too.
>
> You might try outputting to hdfs first to see how fast the job runs with no
> hbase involved. See how long that takes. Tune this part of the job first.
> Then add in hbase and see how much it slows things.
>
> Looking at your code, nothing obviously onerous.
>
> St.Ack
>
>
> On Thu, Jul 2, 2009 at 1:22 PM, Irfan Mohammed <irfan...@gmail.com> wrote:
>
> > Hi,
> >
> > Hbase/Hadoop Setup:
> > 1. 3 regionservers
> > 2. Run the task using 20 Map Tasks and 20 Reduce Tasks.
> > 3. Using an older hbase version from the trunk [ Version: 0.20.0-dev, r786695, Sat Jun 20 18:01:17 EDT 2009 ]
> > 4. Using hadoop [ 0.20.0 ]
> >
> > Test Data:
> > 1. The input is a CSV file with 1M rows and about 20 columns and 4 metrics.
> > 2. Output is 4 hbase tables "txn_m1", "txn_m2", "txn_m3", "txn_m4".
> >
> > The task is to parse through the CSV file and for each metric m1 create an
> > entry into the hbase table "txn_m1" with the columns as needed. Attached is
> > a pdf [from an excel] which explains how a single row in the CSV is
> > converted into hbase data in the mapper and reducer stage. Attached is the
> > code as well.
> >
> > For processing 1M records, it is taking about 38 minutes. I am using
> > HTable.incrementColumnValue() in the reduce pass to create the records in
> > the hbase tables.
> >
> > Is there anything I should be doing differently, or anything inherently
> > incorrect? I would like to run this task in 1 minute.
> >
> > Thanks for the help,
> > Irfan
> >
> > Here is the output of the process. Let me know if I should attach any other
> > log.
> > > > 09/07/02 15:19:11 INFO mapred.JobClient: Running job: > job_200906192236_5114 > > 09/07/02 15:19:12 INFO mapred.JobClient: map 0% reduce 0% > > 09/07/02 15:19:29 INFO mapred.JobClient: map 30% reduce 0% > > 09/07/02 15:19:32 INFO mapred.JobClient: map 46% reduce 0% > > 09/07/02 15:19:35 INFO mapred.JobClient: map 64% reduce 0% > > 09/07/02 15:19:38 INFO mapred.JobClient: map 75% reduce 0% > > 09/07/02 15:19:44 INFO mapred.JobClient: map 76% reduce 0% > > 09/07/02 15:19:47 INFO mapred.JobClient: map 99% reduce 1% > > 09/07/02 15:19:50 INFO mapred.JobClient: map 100% reduce 3% > > 09/07/02 15:19:53 INFO mapred.JobClient: map 100% reduce 4% > > 09/07/02 15:19:56 INFO mapred.JobClient: map 100% reduce 10% > > 09/07/02 15:19:59 INFO mapred.JobClient: map 100% reduce 12% > > 09/07/02 15:20:02 INFO mapred.JobClient: map 100% reduce 16% > > 09/07/02 15:20:05 INFO mapred.JobClient: map 100% reduce 25% > > 09/07/02 15:20:08 INFO mapred.JobClient: map 100% reduce 33% > > 09/07/02 15:20:11 INFO mapred.JobClient: map 100% reduce 36% > > 09/07/02 15:20:14 INFO mapred.JobClient: map 100% reduce 39% > > 09/07/02 15:20:17 INFO mapred.JobClient: map 100% reduce 41% > > 09/07/02 15:20:29 INFO mapred.JobClient: map 100% reduce 42% > > 09/07/02 15:20:32 INFO mapred.JobClient: map 100% reduce 44% > > 09/07/02 15:20:38 INFO mapred.JobClient: map 100% reduce 46% > > 09/07/02 15:20:49 INFO mapred.JobClient: map 100% reduce 47% > > 09/07/02 15:20:55 INFO mapred.JobClient: map 100% reduce 50% > > 09/07/02 15:21:01 INFO mapred.JobClient: map 100% reduce 51% > > 09/07/02 15:21:34 INFO mapred.JobClient: map 100% reduce 52% > > 09/07/02 15:21:39 INFO mapred.JobClient: map 100% reduce 53% > > 09/07/02 15:22:06 INFO mapred.JobClient: map 100% reduce 54% > > 09/07/02 15:22:28 INFO mapred.JobClient: map 100% reduce 55% > > 09/07/02 15:22:44 INFO mapred.JobClient: map 100% reduce 56% > > 09/07/02 15:23:02 INFO mapred.JobClient: Task Id : > > attempt_200906192236_5114_r_000002_0, Status : 
FAILED
> > attempt_200906192236_5114_r_000002_0: [2009-07-02 15:20:27.230] fetching new record writer ...
> > attempt_200906192236_5114_r_000002_0: [2009-07-02 15:22:51.429] failed to initialize the hbase configuration
> > 09/07/02 15:23:08 INFO mapred.JobClient: map 100% reduce 53%
> > 09/07/02 15:23:08 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000013_0, Status : FAILED
> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
> >     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
> >     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
> >     at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
> >     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
> >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
> >     at org.apache.hadoop.mapred.Child.main(Child.java:170)
> >
> > attempt_200906192236_5114_r_000013_0: [2009-07-02 15:20:33.183] fetching new record writer ...
> > attempt_200906192236_5114_r_000013_0: [2009-07-02 15:23:04.369] failed to initialize the hbase configuration
> > 09/07/02 15:23:09 INFO mapred.JobClient: map 100% reduce 50%
> > 09/07/02 15:23:14 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000012_0, Status : FAILED
> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
> >     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
> >     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
> >     at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
> >     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
> >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
> >     at org.apache.hadoop.mapred.Child.main(Child.java:170)
> >
> > attempt_200906192236_5114_r_000012_0: [2009-07-02 15:20:48.434] fetching new record writer ...
> > attempt_200906192236_5114_r_000012_0: [2009-07-02 15:23:10.185] failed to initialize the hbase configuration
> > 09/07/02 15:23:15 INFO mapred.JobClient: map 100% reduce 48%
> > 09/07/02 15:23:17 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000014_0, Status : FAILED
> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
> >     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
> >     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
> >     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
> >     at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
> >     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
> >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
> >     at org.apache.hadoop.mapred.Child.main(Child.java:170)
> >
> > attempt_200906192236_5114_r_000014_0: [2009-07-02 15:20:47.442] fetching new record writer ...
> > attempt_200906192236_5114_r_000014_0: [2009-07-02 15:23:13.285] failed to initialize the hbase configuration
> > 09/07/02 15:23:18 INFO mapred.JobClient: map 100% reduce 45%
> > 09/07/02 15:23:21 INFO mapred.JobClient: map 100% reduce 46%
> > 09/07/02 15:23:29 INFO mapred.JobClient: map 100% reduce 47%
> > 09/07/02 15:23:32 INFO mapred.JobClient: map 100% reduce 48%
> > 09/07/02 15:23:36 INFO mapred.JobClient: map 100% reduce 49%
> > 09/07/02 15:23:39 INFO mapred.JobClient: map 100% reduce 51%
> > 09/07/02 15:23:42 INFO mapred.JobClient: map 100% reduce 56%
> > 09/07/02 15:23:45 INFO mapred.JobClient: map 100% reduce 58%
> > 09/07/02 15:24:20 INFO mapred.JobClient: map 100% reduce 59%
> > 09/07/02 15:25:11 INFO mapred.JobClient: map 100% reduce 60%
> > 09/07/02 15:25:17 INFO mapred.JobClient: map 100% reduce 61%
> > 09/07/02 15:25:26 INFO mapred.JobClient: map 100% reduce 62%
> > 09/07/02 15:25:32 INFO mapred.JobClient: map 100% reduce 64%
> > 09/07/02 15:25:38 INFO mapred.JobClient: map 100% reduce 65%
> > 09/07/02 15:26:20 INFO mapred.JobClient: map 100% reduce 66%
> > 09/07/02 15:26:40 INFO mapred.JobClient: map 100% reduce 67%
> > 09/07/02 15:26:48 INFO mapred.JobClient: map 100% reduce 68%
> > 09/07/02 15:27:16 INFO mapred.JobClient: map 100% reduce 69%
> > 09/07/02 15:27:21 INFO mapred.JobClient: map 100% reduce 70%
> > 09/07/02 15:27:46 INFO mapred.JobClient: map 100% reduce 71%
> > 09/07/02 15:28:25 INFO mapred.JobClient: map 100% reduce 72%
> > 09/07/02 15:28:46 INFO mapred.JobClient: map 100% reduce 73%
> > 09/07/02 15:29:08 INFO mapred.JobClient: map 100% reduce 74%
> > 09/07/02 15:29:45 INFO mapred.JobClient: map 100% reduce 76%
> > 09/07/02 15:30:42 INFO mapred.JobClient: map 100% reduce 77%
> > 09/07/02 15:31:06 INFO mapred.JobClient: map 100% reduce 78%
> > 09/07/02 15:31:12 INFO mapred.JobClient: map 100% reduce 79%
> > 09/07/02 15:31:36 INFO mapred.JobClient: map 100% reduce 81%
> > 09/07/02 15:31:37 INFO mapred.JobClient: map 100% reduce 82%
> > 09/07/02 15:32:00 INFO mapred.JobClient: map 100% reduce 83%
> > 09/07/02 15:32:09 INFO mapred.JobClient: map 100% reduce 84%
> > 09/07/02 15:32:30 INFO mapred.JobClient: map 100% reduce 86%
> > 09/07/02 15:38:42 INFO mapred.JobClient: map 100% reduce 88%
> > 09/07/02 15:39:49 INFO mapred.JobClient: map 100% reduce 89%
> > 09/07/02 15:41:13 INFO mapred.JobClient: map 100% reduce 90%
> > 09/07/02 15:41:16 INFO mapred.JobClient: map 100% reduce 91%
> > 09/07/02 15:41:28 INFO mapred.JobClient: map 100% reduce 93%
> > 09/07/02 15:44:34 INFO mapred.JobClient: map 100% reduce 94%
> > 09/07/02 15:45:41 INFO mapred.JobClient: map 100% reduce 95%
> > 09/07/02 15:45:50 INFO mapred.JobClient: map 100% reduce 96%
> > 09/07/02 15:46:17 INFO mapred.JobClient: map 100% reduce 98%
> > 09/07/02 15:55:29 INFO mapred.JobClient: map 100% reduce 99%
> > 09/07/02 15:57:08 INFO mapred.JobClient: map 100% reduce 100%
> > 09/07/02 15:57:14 INFO mapred.JobClient: Job complete: job_200906192236_5114
> > 09/07/02 15:57:14 INFO mapred.JobClient: Counters: 18
> > 09/07/02 15:57:14 INFO mapred.JobClient:   Job Counters
> > 09/07/02 15:57:14 INFO mapred.JobClient:     Launched reduce tasks=24
> > 09/07/02 15:57:14 INFO mapred.JobClient:     Rack-local map tasks=2
> > 09/07/02 15:57:14 INFO mapred.JobClient:     Launched map tasks=20
> > 09/07/02 15:57:14 INFO mapred.JobClient:     Data-local map tasks=18
> > 09/07/02 15:57:14 INFO mapred.JobClient:   FileSystemCounters
> > 09/07/02 15:57:14 INFO mapred.JobClient:     FILE_BYTES_READ=1848609562
> > 09/07/02 15:57:14 INFO mapred.JobClient:     HDFS_BYTES_READ=57982980
> > 09/07/02 15:57:14 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2768325646
> > 09/07/02 15:57:14 INFO mapred.JobClient:   Map-Reduce Framework
> > 09/07/02 15:57:14 INFO mapred.JobClient:     Reduce input groups=4863
> > 09/07/02 15:57:14 INFO mapred.JobClient:     Combine output records=0
> > 09/07/02 15:57:14 INFO mapred.JobClient:     Map input records=294786
> > 09/07/02 15:57:14 INFO mapred.JobClient:     Reduce shuffle bytes=883803390
> > 09/07/02 15:57:14 INFO mapred.JobClient:     Reduce output records=0
> > 09/07/02 15:57:14 INFO mapred.JobClient:     Spilled Records=50956464
> > 09/07/02 15:57:14 INFO mapred.JobClient:     Map output bytes=888797024
> > 09/07/02 15:57:14 INFO mapred.JobClient:     Map input bytes=57966580
> > 09/07/02 15:57:14 INFO mapred.JobClient:     Combine input records=0
> > 09/07/02 15:57:14 INFO mapred.JobClient:     Map output records=16985488
> > 09/07/02 15:57:14 INFO mapred.JobClient:     Reduce input records=16985488