Hi Ted,

Thanks for your reply.

> Did you use TableInputFormat in your MR job?

No, a custom one that does the same split work. The input for each map
task is its split, and the map task opens the HTable and reads its
specific region by itself.

> Did you use the one from mapred or mapreduce?

Everything related comes from mapreduce.

> What version of HBase are you using?

0.94.1

> Did you take a look at Ganglia to see if there is any bottleneck in
> your cluster?

I didn't, but I did check CPU and disk usage with a simple dstat -cdnm.
No CPU, disk, or network I/O bottleneck was observed.

> You mentioned a few changes to the config file shortly before this
> problem appeared. Can you let us know which parameters you modified?

Mainly I increased dfs.datanode.handler.count and
hbase.regionserver.handler.count from their defaults to around 30, etc.,
on every node. I changed them back later, though. Hmm...

> Cheers
>
> On Fri, Jan 4, 2013 at 7:37 PM, Liu, Raymond <[email protected]> wrote:
>
> > Hi,
> >
> > I have run into a weird issue where some map tasks lag behind:
> >
> > I have a small Hadoop/HBase cluster with 1 master node and 4
> > region server nodes, all with 16 CPUs and with map and reduce slots
> > set to 24.
> >
> > A few tables have been created with regions distributed evenly across
> > the region server nodes (say, 16 regions per region server). Each
> > region also has almost the same number of KVs with very similar
> > sizes. All tables had a major_compact done to ensure data locality.
> >
> > I have an MR job that simply does a local region scan in every map
> > task (so 16 map tasks per region server node).
> >
> > In theory, every map task should finish in a similar time.
> >
> > In practice, though, some regions on the same region server always
> > lag behind a lot, taking about 150~250% of the average time of the
> > other map tasks.
> >
> > If this happened on a single region server for every table, I might
> > suspect a disk issue or some other reason that brings down the
> > performance of this region server.
> >
> > But the weird thing is that, for any single table, almost all the
> > map tasks on one particular region server lag behind, yet for a
> > different table the lagging region server is a different one! The
> > regions and region sizes are distributed evenly, which I have double
> > checked many times. (I even tried setting the replication factor to 4
> > to ensure every node has a local copy of the data.)
> >
> > Say for table 1, all map tasks on region server node 2 are slow,
> > while for table 2, maybe all map tasks on region server node 3 are
> > slow. With table 1, it is always region server node 2 that is slow,
> > regardless of cluster restarts, and the slowest map task is always
> > the very same one. It doesn't go away even if I run a major compact
> > again...
> >
> > So, could anyone give me some clue as to what might lead to this
> > weird behavior? Any wild guess is welcome!
> >
> > (BTW, I didn't encounter this issue a few days ago with the same
> > table. I did restart the cluster and made a few changes to the
> > config file during that period, but restoring the config file
> > didn't help.)
> >
> >
> > Best Regards,
> > Raymond Liu
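
For readers less familiar with this kind of setup, here is a minimal, HBase-free sketch of the "one split per region, each map task scans only its own region" scheme from the reply to Ted. It assumes the table can be modeled as a sorted map of row keys and that each region covers [startKey, endKey), with an empty key meaning unbounded. That is HBase's convention for the first and last region. Region, Split, getSplits and scanRegion are hypothetical names for illustration only. They are not the HBase 0.94 API, where TableInputFormatBase produces TableSplit objects instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.SortedMap;

/** Hypothetical model of per-region splits; not the HBase API. */
final class RegionScanSketch {
    /** A region covers row keys in [startKey, endKey); "" means unbounded. */
    record Region(String startKey, String endKey, String host) {}

    /** A split carries the key range plus a locality hint for the scheduler. */
    record Split(String startKey, String endKey, String preferredHost) {}

    /** One split per region, preferring the server that hosts that region. */
    static List<Split> getSplits(List<Region> regions) {
        List<Split> splits = new ArrayList<>();
        for (Region r : regions) {
            splits.add(new Split(r.startKey(), r.endKey(), r.host()));
        }
        return splits;
    }

    /** What a map task does with its split: read only rows in its range. */
    static SortedMap<String, String> scanRegion(NavigableMap<String, String> table, Split s) {
        boolean openStart = s.startKey().isEmpty();
        boolean openEnd = s.endKey().isEmpty();
        if (openStart && openEnd) return table;                  // single-region table
        if (openStart) return table.headMap(s.endKey(), false);   // first region
        if (openEnd) return table.tailMap(s.startKey(), true);    // last region
        return table.subMap(s.startKey(), true, s.endKey(), false);
    }
}
```

The host in each split is only a scheduling preference. If a map task ends up running on another node, it still reads the same key range, but it goes through the hosting region server over the network.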
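
One way to check whether the slow tasks really cluster on one region server is to quantify the observation that they take "150~250% of the other map tasks' average time" for each host. Below is a minimal sketch. It assumes per-task durations and hosts have already been collected from the job's history, and that collection step is not shown. TaskTime, ratioToOthers and stragglersPerHost are hypothetical names. The 1.5 threshold used below simply mirrors the lower end of the quoted range.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical straggler check over already-collected map task timings. */
final class StragglerSketch {
    record TaskTime(String taskId, String host, double seconds) {}

    /** Ratio of task i's duration to the mean duration of all other tasks. */
    static double ratioToOthers(List<TaskTime> tasks, int i) {
        if (tasks.size() < 2) throw new IllegalArgumentException("need at least two tasks");
        double total = 0;
        for (TaskTime t : tasks) total += t.seconds();
        double othersMean = (total - tasks.get(i).seconds()) / (tasks.size() - 1);
        return tasks.get(i).seconds() / othersMean;
    }

    /** Per host, how many tasks took at least `threshold` times the others' mean. */
    static Map<String, Integer> stragglersPerHost(List<TaskTime> tasks, double threshold) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < tasks.size(); i++) {
            if (ratioToOthers(tasks, i) >= threshold) {
                counts.merge(tasks.get(i).host(), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Running stragglersPerHost(tasks, 1.5) once per table would show whether a single host accounts for most of the flagged tasks, and whether that host changes from table to table as described in the original message.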
