Edward J. Yoon wrote:
> During write operation in reduce phase, region servers are killed.
> (64,000 rows with 10,000 columns, 3 node)
10k columns is probably over what HBase is currently able to do (HBASE-867). Have you seen the notes at the end of the http://wiki.apache.org/hadoop/Hbase/Troubleshooting page? See other notes inline below:

> ----
> 09/01/14 13:07:59 INFO mapred.JobClient: map 100% reduce 36%
> 09/01/14 13:11:38 INFO mapred.JobClient: map 100% reduce 33%
> 09/01/14 13:11:38 INFO mapred.JobClient: Task Id : attempt_200901140952_0010_r_000017_1, Status : FAILED
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 61.247.201.163:60020 for region DenseMatrix_randgnegu,,1231905480938, row '000000000000287', but failed after 10 attempts.
> Exceptions:
> java.io.IOException: java.io.IOException: Server not running, aborting
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2103)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1611)
> ----

You upped the hbase client timeouts? (See the hbase-site.xml sketch below.)

> And, I can't stop the hbase.
>
> [d8g053:/root]# hbase-trunk/bin/stop-hbase.sh
> stopping master...............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
>
> Can it be recovered?

What does the master log say? Why isn't it going down? The tail of the log will usually say why it is staying up. Probably it is waiting on a particular HRegionServer?

> ----
> Region server log:
>
> 2009-01-14 13:03:56,591 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2723)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)

These look like the issues that the config changes on the troubleshooting page might address (check your datanode logs; see the hadoop-site.xml sketch below). You are using 0.18.0 hbase?

St.Ack
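On the client-timeout point: the knobs usually raised for a long write-heavy reduce phase are the client's pause between retries and the retry count. A minimal sketch of what that looks like inside the <configuration> element of conf/hbase-site.xml; the two property names are longstanding HBase client settings, but the values shown here are only illustrative, not recommendations:

  <property>
    <name>hbase.client.pause</name>
    <!-- Milliseconds the client waits between retries of a failed region server call.
         Illustrative value, not a recommendation. -->
    <value>10000</value>
  </property>
  <property>
    <name>hbase.client.retries.number</name>
    <!-- How many attempts before throwing RetriesExhaustedException;
         the job above gave up after 10 attempts. Illustrative value. -->
    <value>20</value>
  </property>

Note that more retries only buys time if the region servers themselves keep aborting, as the DFS errors in the log below suggest they are.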
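On the datanode side, the "Unable to create new block" and "Could not get block locations. Aborting..." messages in the region server log are the usual sign that the datanodes are running out of xceiver threads or file descriptors under HBase's load, which is what the troubleshooting page notes cover. A sketch of the commonly suggested change, assuming a Hadoop of this vintage where it lives in conf/hadoop-site.xml on every datanode (hdfs-site.xml in later releases) and assuming your Hadoop build exposes the setting; the value is illustrative:

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <!-- Upper bound on concurrent data-transfer threads per datanode.
         The stock default (256) is commonly too low for HBase; note the
         property name really is spelled "xcievers". Illustrative value. -->
    <value>2048</value>
  </property>

Raising the datanode user's open-file limit (ulimit -n) usually goes hand in hand with this, and the datanodes need a restart for either change to take effect. Checking the datanode logs around the 13:03 timestamps should confirm whether xceiver or file-descriptor limits were hit.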
On Tue, Jan 13, 2009 at 8:42 PM, Edward J. Yoon <[email protected]> wrote:

> During write operation in reduce phase, region servers are killed.
> (64,000 rows with 10,000 columns, 3 node)
>
> ----
> 09/01/14 13:07:59 INFO mapred.JobClient: map 100% reduce 36%
> 09/01/14 13:11:38 INFO mapred.JobClient: map 100% reduce 33%
> 09/01/14 13:11:38 INFO mapred.JobClient: Task Id : attempt_200901140952_0010_r_000017_1, Status : FAILED
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 61.247.201.163:60020 for region DenseMatrix_randgnegu,,1231905480938, row '000000000000287', but failed after 10 attempts.
> Exceptions:
> java.io.IOException: java.io.IOException: Server not running, aborting
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2103)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1611)
> ----
>
> And, I can't stop the hbase.
>
> [d8g053:/root]# hbase-trunk/bin/stop-hbase.sh
> stopping master...............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
>
> Can it be recovered?
>
> ----
> Region server log:
>
> 2009-01-14 13:03:56,591 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2723)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
> 2009-01-14 13:03:56,591 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-4005955194083205373_14543 bad datanode[0] nodes == null
> 2009-01-14 13:03:56,591 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Aborting...
> 2009-01-14 13:03:56,629 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split failed for region DenseMatrix_randllnma,000000000000,18,7-29116,1231898419257
> java.io.IOException: Could not read from stream
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:119)
>         at java.io.DataInputStream.readByte(DataInputStream.java:248)
>         at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:325)
>         at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:346)
>         at org.apache.hadoop.io.Text.readString(Text.java:400)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2779)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2704)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
> 2009-01-14 13:03:56,631 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region DenseMatrix_randllnma,00000000000,16,19-26373,1231898311583
> 2009-01-14 13:03:56,692 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2009-01-14 13:03:56,692 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2009-01-14 13:03:56,693 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2009-01-14 13:03:56,693 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> 2009-01-14 13:03:57,521 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
> 2009-01-14 13:03:57,810 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
> 2009-01-14 13:03:57,810 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-2612702056484946948_14554
> 2009-01-14 13:03:59,343 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2723)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
>
> 2009-01-14 13:03:59,344 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-5255885897790790367_14543 bad datanode[0] nodes == null
> 2009-01-14 13:03:59,344 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Aborting...
> 2009-01-14 13:03:59,344 FATAL org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Replay of hlog required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: DenseMatrix_randgnegu,,1231905480938
>         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:896)
>         at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:789)
>         at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushRegion(MemcacheFlusher.java:227)
>         at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.run(MemcacheFlusher.java:137)
> Caused by: java.io.IOException: Could not read from stream
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:119)
>         at java.io.DataInputStream.readByte(DataInputStream.java:248)
>         at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:325)
>         at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:346)
>         at org.apache.hadoop.io.Text.readString(Text.java:400)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2779)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2704)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
> 2009-01-14 13:03:59,359 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=15, regions=48, stores=192, storefiles=756, storefileIndexSize=6, memcacheSize=338, usedHeap=395, maxHeap=971
> 2009-01-14 13:03:59,359 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: regionserver/0:0:0:0:0:0:0:0:60020.cacheFlusher exiting
> 2009-01-14 13:03:59,368 INFO org.apache.hadoop.hbase.regionserver.HLog: Closed hdfs://dev3.nm2.naver.com:9000/hbase/log_61.247.201.165_1231894400437_60020/hlog.dat.1231905813472, entries=896500. New log writer: /hbase/log_61.247.201.165_1231894400437_60020/hlog.dat.1231905839367
> 2009-01-14 13:03:59,368 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
>
> --
> Best Regards, Edward J. Yoon @ NHN, corp.
> [email protected]
> http://blog.udanax.org
>
