Greetings. While I was importing data into my HBase cluster, one regionserver went down. Checking its log, I found the following exception: *EOFException* (possibly thrown while HBase was flushing the memstore to an HDFS file, but I am not sure).
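One way to narrow down where to look is to collect the IDs of the blocks that DFSClient abandoned, then search the NameNode and DataNode logs for those IDs. The following is only a minimal sketch, assuming the regionserver log is available as text in the same format as the log in this post; the function name `abandoned_blocks` and the file name in the usage note are hypothetical.

```python
import re

# Matches DFSClient entries of the form seen in this post, e.g.
# "2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-693..._1391901"
# The pattern does not rely on line breaks, so it also works on a log
# that was pasted as one long run of text.
ABANDONED = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO "
    r"org\.apache\.hadoop\.hdfs\.DFSClient: Abandoning block (blk_-?\d+_\d+)"
)


def abandoned_blocks(log_text):
    """Return a list of (timestamp, block_id) pairs, in log order."""
    return [(m.group(1), m.group(2)) for m in ABANDONED.finditer(log_text)]


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python abandoned_blocks.py regionserver.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for timestamp, block_id in abandoned_blocks(handle.read()):
            print(timestamp, block_id)
```

Each printed block ID can then be searched for in the HDFS logs around the same timestamp, which may show which datanodes were asked to accept the block.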
It seems to be caused by the DFSClient failing, but I don't know the exact reason. Maybe the machine hosting the datanode was under heavy load, or its disk was full, but I am not sure which DFS node I should check. Has anybody met the same problem? Any pointer or hint is appreciated. The log is as follows:

2010-04-06 03:04:34,065 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 20 on 60020' on region hbt2table16,,1270522012397: memstore size 128.0m is >= than blocking 128.0m size
2010-04-06 03:04:34,712 DEBUG org.apache.hadoop.hbase.regionserver.Store: Completed compaction of 34; new storefile is hdfs://rra-03:8887hbase/hbt2table16/2144402082/34/854678344516838047; store size is 2.9m
2010-04-06 03:04:34,715 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compaction size of 35: 2.9m; Skipped 0 file(s), size: 0
2010-04-06 03:04:34,715 DEBUG org.apache.hadoop.hbase.regionserver.Store: Started compaction of 5 file(s) into hbase/hbt2table16/compaction.dir/2144402082, seqid=2914432737
2010-04-06 03:04:35,055 DEBUG org.apache.hadoop.hbase.regionserver.Store: Added hdfs://rra-03:8887hbase/hbt2table16/2144402082/184/1530971405029654438, entries=1489, sequenceid=2914917785, memsize=203.8k, filesize=88.6k to hbt2table16,,1270522012397
2010-04-06 03:04:35,442 DEBUG org.apache.hadoop.hbase.regionserver.Store: Completed compaction of 35; new storefile is hdfs://rra-03:8887hbase/hbt2table16/2144402082/35/2952180521700205032; store size is 2.9m
2010-04-06 03:04:35,445 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compaction size of 36: 2.9m; Skipped 0 file(s), size: 0
2010-04-06 03:04:35,445 DEBUG org.apache.hadoop.hbase.regionserver.Store: Started compaction of 4 file(s) into hbase/hbt2table16/compaction.dir/2144402082, seqid=2914432737
2010-04-06 03:04:35,469 DEBUG org.apache.hadoop.hbase.regionserver.Store: Added hdfs://rra-03:8887hbase/hbt2table16/2144402082/185/1984548574711437130, entries=2105, sequenceid=2914917785, memsize=286.7k, filesize=123.9k to hbt2table16,,1270522012397
2010-04-06 03:04:35,711 DEBUG org.apache.hadoop.hbase.regionserver.Store: Added hdfs://rra-03:8887hbase/hbt2table16/2144402082/186/2470661482474884005, entries=3031, sequenceid=2914917785, memsize=414.0k, filesize=179.1k to hbt2table16,,1270522012397
2010-04-06 03:04:35,866 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started. Attempting to free 20853136 bytes
2010-04-06 03:04:37,010 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed. Freed 20866928 bytes. Priority Sizes: Single=17.422821MB (18269152), Multi=150.70126MB (158021728),Memory=0.0MB (0)
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901
2010-04-06 03:04:37,607 DEBUG org.apache.hadoop.hbase.regionserver.Store: Completed compaction of 36; new storefile is hdfs://rra-03:8887hbase/hbt2table16/2144402082/36/1570089400510240916; store size is 2.9m
2010-04-06 03:04:37,612 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compaction size of 37: 2.9m; Skipped 0 file(s), size: 0
2010-04-06 03:04:37,612 DEBUG org.apache.hadoop.hbase.regionserver.Store: Started compaction of 4 file(s) into hbase/hbt2table16/compaction.dir/2144402082, seqid=2914432737
2010-04-06 03:04:37,964 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.*EOFException*
2010-04-06 03:04:37,964 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_2467598422201289982_1391902
2010-04-06 03:04:43,568 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-04-06 03:04:43,568 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-2065206049437531800_1391902
2010-04-06 03:04:44,044 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-04-06 03:04:44,044 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-3059563223628992257_1391902
2010-04-06 03:05:01,588 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2814)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2078)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2264)
2010-04-06 03:05:01,588 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-3320281088550177280_1391903 bad datanode[0] nodes == null
2010-04-06 03:05:01,589 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "hbase/hbt2table16/2144402082/187/6358539637638901699" - Aborting...
2010-04-06 03:05:01,589 FATAL org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Replay of hlog required. Forcing server shutdown
org.apache.hadoop.hbase.DroppedSnapshotException: region: hbt2table16,,1270522012397
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:977)
    at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:846)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:241)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:149)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readByte(DataInputStream.java:250)
    at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
    at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
    at org.apache.hadoop.io.Text.readString(Text.java:400)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2870)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2795)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2078)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2264)
2010-04-06 03:05:01,603 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=335, stores=590, storefiles=1231, storefileIndexSize=83, memstoreSize=128, compactionQueueSize=1, usedHeap=710, maxHeap=993, blockCacheSize=162178088, blockCacheFree=46200184, blockCacheCount=2483, blockCacheHitRatio=2, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
2010-04-06 03:05:01,604 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: regionserver/10.76.112.214:60020.cacheFlusher exiting
2010-04-06 03:05:01,673 INFO org.apache.hadoop.hbase.regionserver.HLog: Roll hbase/.logs/rrb-08.off.tn.ask.com,60020,1268973923999/hlog.dat.1270523052543, entries=483321, calcsize=88970157, filesize=61838598. New hlog hbase/.logs/ rrb-08.off.tn.ask.com,60020,126897392