I'm experiencing a scenario whereby long-running MapReduce tasks appear to
be taking RegionServers down, wondering if
  1) anyone else is experiencing this
  2) this is expected behaviour, and if so,
  3) what I can do about it.

Scenario looks something like this (although I may be misinterpreting):

1.TableMapper task has not completed after about one minute

2. RegionServer reports:
  2009-11-26 01:53:09,867 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
637450415407903410 lease expired
  2009-11-26 01:53:09,902 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
-2878823706426617596 lease expired
  2009-11-26 01:53:09,902 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
-4783927291358135864 lease expired
  2009-11-26 01:53:09,902 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
5282836853577726911 lease expired

Followed By:
  2009-11-26 01:53:09,882 WARN org.apache.zookeeper.ClientCnxn: Exception
closing session 0x252df4a91a0005 to sun.nio.ch.selectionkeyi...@49e808ca
    java.io.IOException: TIMED OUT
        at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
  2009-11-26 01:53:09,867 WARN org.apache.hadoop.hbase.util.Sleeper: We
slept 72462ms, ten times longer than scheduled: 3000
  2009-11-26 01:53:09,918 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer:
    org.apache.hadoop.hbase.UnknownScannerException: Name:
5282836853577726911
        at
org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1925)

3. Task Fails. (normal behaviour)

4. RegionServer starts closing its regions:
  2009-11-26 01:53:10,035 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
state: Disconnected, type: None, path: null
  2009-11-26 01:53:10,104 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_CALL_SERVER_STARTUP
  2009-11-26 01:53:10,187 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed crawled_pages, ...
  2009-11-26 01:53:11,653 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed crawled_pages, ...
  2009-11-26 01:53:14,184 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed crawled_pages, ...
  2009-11-26 01:53:18,567 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed crawled_pages, ...
  2009-11-26 01:53:19,147 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed crawled_pages, ...
  2009-11-26 01:53:21,265 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed crawled_pages, ...
  2009-11-26 01:53:45,112 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed crawled_pages, ...

5. RegionServer shuts down:
  2009-11-26 01:53:12,425 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
server on 60020
  2009-11-26 01:53:12,425 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 1 on 60020: exiting
  2009-11-26 01:53:12,425 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 0 on 60020: exiting
  2009-11-26 01:53:12,425 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 2 on 60020: exiting
  2009-11-26 01:53:12,443 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
  2009-11-26 01:53:12,526 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
IPC Server Responder
  2009-11-26 01:53:13,169 INFO
org.apache.hadoop.hbase.regionserver.LogFlusher:
regionserver/192.168.1.111:60020.logFlusher exiting
  2009-11-26 01:53:13,170 INFO
org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
  2009-11-26 01:53:13,170 INFO
org.apache.hadoop.hbase.regionserver.MemStoreFlusher:
regionserver/192.168.1.111:60020.cacheFlusher exiting
  2009-11-26 01:53:13,170 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker:
regionserver/192.168.1.111:60020.majorCompactionChecker exiting
  2009-11-26 01:53:13,169 INFO
org.apache.hadoop.hbase.regionserver.CompactSplitThread:
regionserver/192.168.1.111:60020.compactor exiting
  2009-11-26 01:53:13,449 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: On abort, closed hlog
  2009-11-26 01:53:13,450 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at:
192.168.1.111:60020
  2009-11-26 01:53:19,867 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
  2009-11-26 01:53:19,867 INFO org.apache.zookeeper.ZooKeeper: Closing
session: 0x252df4a91a0005
  2009-11-26 01:53:19,867 INFO org.apache.zookeeper.ClientCnxn: Closing
ClientCnxn for session: 0x252df4a91a0005
  2009-11-26 01:53:19,867 INFO org.apache.zookeeper.ClientCnxn:
Disconnecting ClientCnxn for session: 0x252df4a91a0005
  2009-11-26 01:53:19,868 INFO org.apache.zookeeper.ClientCnxn: EventThread
shut down
  2009-11-26 01:53:19,868 INFO org.apache.zookeeper.ZooKeeper: Session:
0x252df4a91a0005 closed
  2009-11-26 01:53:19,902 INFO org.apache.hadoop.hbase.Leases:
regionserver/192.168.1.111:60020.leaseChecker closing leases
  2009-11-26 01:53:19,903 INFO org.apache.hadoop.hbase.Leases:
regionserver/192.168.1.111:60020.leaseChecker closed leases
  2009-11-26 01:53:19,969 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/
192.168.1.111:60020 exiting
  2009-11-26 01:53:19,992 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown
thread.
  2009-11-26 01:53:19,992 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete

Joost

Reply via email to