Where did he mention he was attempting to bond the ports? Sorry if I missed it.
On Jan 7, 2013, at 7:37 AM, Doug Meil <[email protected]> wrote:

>
> Hi there,
>
> The HBase RefGuide has a comprehensive case study on such a case. This
> might not be the exact problem, but the diagnostic approach should help.
>
> http://hbase.apache.org/book.html#casestudies.slownode
>
>
> On 1/4/13 10:37 PM, "Liu, Raymond" <[email protected]> wrote:
>
>> Hi,
>>
>> I have run into a weird issue where some map tasks lag behind:
>>
>> I have a small hadoop/hbase cluster with 1 master node and 4
>> regionserver nodes. All have 16 CPUs, with the map and reduce slots
>> set to 24.
>>
>> A few tables are created with regions distributed evenly across the
>> regionserver nodes (say 16 regions for each regionserver). Each region
>> also has almost the same number of kvs, of very similar size. All
>> tables have had major_compact run to ensure data locality.
>>
>> I have an MR job which simply does a local region scan in every map
>> task (so 16 map tasks for each regionserver node).
>>
>> In theory, every map task should finish in a similar time.
>>
>> In practice, some regions on the same regionserver always lag behind
>> a lot, taking say 150~250% of the average time of the other map tasks.
>>
>> If this happened to a single regionserver for every table, I might
>> suspect a disk issue or some other cause that drags down the
>> performance of that regionserver.
>>
>> But the weird thing is that, although for each single table almost all
>> the map tasks on one single regionserver lag behind, the lagging
>> regionserver is different for different tables! And the regions and
>> region sizes are distributed evenly, which I have double checked many
>> times. (I even tried setting replica to 4 to ensure every node has a
>> local copy of the data.)
>>
>> Say for table 1, all map tasks on regionserver node 2 are slow, while
>> for table 2, maybe all map tasks on regionserver node 3 are slow. With
>> table 1, it will always be regionserver node 2 which is slow
>> regardless of cluster restarts, and the slowest map task will always
>> be the very same one. And it won't go away even if I do a major
>> compact again.
>>
>> So, could anyone give me some clue as to what might possibly lead to
>> this weird behavior? Any wild guess is welcome!
>>
>> (BTW, I didn't encounter this issue a few days ago with the same
>> table. I did restart the cluster and make a few changes to the config
>> files during that period, but restoring the config files doesn't help.)
>>
>>
>> Best Regards,
>> Raymond Liu
>>
>
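
For anyone following the thread, here is a rough sketch of how the lag described above could be quantified per table: group the map task durations by regionserver and compare each node's average against the average of the tasks on the other nodes, flagging anything at or above 150%. The function name, the input shape, and the sample timings are all hypothetical and made up for illustration; nothing here comes from Raymond's actual job, and the durations would need to be copied out of the job history by hand or by script.

```python
from collections import defaultdict
from statistics import mean


def find_lagging_nodes(task_times, threshold=1.5):
    """Return {node: ratio} for nodes whose mean map task time is at least
    `threshold` times the mean time of the tasks on all other nodes.

    task_times: iterable of (node_name, seconds) pairs, one per map task.
    This input shape is an assumption for this sketch.
    """
    per_node = defaultdict(list)
    for node, seconds in task_times:
        per_node[node].append(seconds)

    lagging = {}
    for node, times in per_node.items():
        # Durations of every task that ran on a different node.
        others = [s for n, ts in per_node.items() if n != node for s in ts]
        if not others:
            continue  # only one node present, nothing to compare against
        ratio = mean(times) / mean(others)
        if ratio >= threshold:
            lagging[node] = ratio
    return lagging


# Made-up timings for one table: two map tasks per regionserver.
sample_table1 = [
    ("rs1", 100), ("rs1", 100),
    ("rs2", 200), ("rs2", 220),
    ("rs3", 100), ("rs3", 100),
    ("rs4", 100), ("rs4", 100),
]

if __name__ == "__main__":
    for node, ratio in find_lagging_nodes(sample_table1).items():
        print(f"{node}: {ratio:.0%} of the other nodes' average")
```

Running this once per table and comparing which node gets flagged would show directly whether the slow regionserver really changes from table to table, which is the part that rules out a simple single-node disk problem.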
