Did you use TableInputFormat in your MR job? If so, did you use the one from the mapred package or the one from mapreduce?
What version of HBase are you using? Did you take a look at Ganglia to see if there is any bottleneck in your cluster?

You mentioned making a few changes to the config file shortly before this problem appeared. Can you let us know which parameters you modified?

Cheers

On Fri, Jan 4, 2013 at 7:37 PM, Liu, Raymond <[email protected]> wrote:

> Hi
>
> I encounter a weird lag behind map task issue here :
>
> I have a small hadoop/hbase cluster with 1 master node and 4 regionserver
> node all have 16 CPU with map and reduce slot set to 24.
>
> A few table is created with regions distributed on each region node evenly
> ( say 16 region for each region server). Also each region has almost the
> same number of kvs with very similar size. All table had major_compact done
> to ensure data locality
>
> I have a MR job which simply do local region scan in every map task ( so
> 16 map task for each regionserver node).
>
> By theory, every map task should finish within similar time.
>
> But the real case is that some regions on the same region server always
> lags behind a lot, say cost 150 ~250% of the other map tasks average times.
>
> If this is happen to a single region server for every table, I might doubt
> it is a disk issue or other reason that bring down the performance of this
> region server.
>
> But the weird thing is that, though with each single table, almost all the
> map task on the same single regionserver is lag behind. But for
> different table, this lag behind regionserver is different! And the region
> and region size is distributed evenly which I double checked for a lot of
> times. ( I even try to set replica to 4 to ensure every node have a copy of
> local data)
>
> Say table 1, all map task on regionserver node 2 is slow. While for table
> 2, maybe all map task on regionserver node 3 is slow, and with table 1, it
> will always be regionserver node 2 which is slow regardless of cluster
> restart, and the slowest map task will always be the very same one. And it
> won't go away even I do major compact again.....
>
> So, anyone could give me some clue on what reason might possible lead to
> this weird behavior? Any wild guess is welcome!
>
> (BTW. I don't encounter this issue a few days ago with the same table.
> While I do restart cluster and do a few changes upon config file during
> that period, But restore the config file don't help)
>
>
> Best Regards,
> Raymond Liu
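
To compare runs across tables, it may help to tabulate map task durations per regionserver and flag the tasks that exceed roughly 150% of the typical time, which is the lower end of the range described above. The following is only a rough sketch: the function name, the `(regionserver, task_id, seconds)` tuple layout, and the sample values are all hypothetical, and the durations would have to be collected by hand from the JobTracker UI or the job history. It uses the median rather than the mean as the baseline so that the slow tasks do not inflate the reference value.

```python
from collections import defaultdict
from statistics import median


def find_lagging_tasks(durations, threshold=1.5):
    """Group map tasks that run much slower than the typical task.

    durations: iterable of (regionserver, task_id, seconds) tuples
               (hypothetical layout, filled in by hand from job history).
    threshold: a task is flagged when its duration is at least
               threshold * median duration of all tasks.
    Returns a dict mapping regionserver -> list of flagged task ids.
    """
    durations = list(durations)
    baseline = median(sec for _, _, sec in durations)
    lagging = defaultdict(list)
    for server, task_id, sec in durations:
        if sec >= threshold * baseline:
            lagging[server].append(task_id)
    return dict(lagging)


if __name__ == "__main__":
    # Made-up numbers for illustration only, not measurements from the cluster.
    sample = [
        ("rs1", "m1", 100), ("rs1", "m2", 102),
        ("rs2", "m3", 98), ("rs2", "m4", 101),
        ("rs3", "m5", 180), ("rs3", "m6", 240),
    ]
    for server, tasks in find_lagging_tasks(sample).items():
        print(server, tasks)
```

Running this once per table would show whether the flagged tasks really cluster on a single regionserver for each table, and whether that regionserver changes from table to table.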
