Is the patch for HADOOP-4681 applied? See https://issues.apache.org/jira/browse/HADOOP-4681
- Andy

________________________________
From: llpind <[email protected]>
To: [email protected]
Sent: Tuesday, June 2, 2009 1:00:24 PM
Subject: Re: HBase looses regions.

Here are the changes I've made:

- Moved to Gigabit Ethernet
- Upgraded to HBase 0.19.3
- Run the upload program from the master box (instead of locally)
- Added 5 more nodes (now with 8 region servers and 1 master; the same boxes also host Hadoop datanodes)

The upload got much further (45M+ of 58M), but I still lost one region server between 45M and 46M. Here is the HBase log from that region server:

=====================================================
2009-06-02 11:21:24,334 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region tableA,row_keyA,1243964035709 2009-06-02 11:21:24,336 DEBUG org.apache.hadoop.hbase.regionserver.HStore: 1008565632/entity: no store files to compact 2009-06-02 11:21:24,346 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Compaction size of 1008565632/link: 361.0m; Skipped 2 file(s), size: 337039946 2009-06-02 11:21:24,392 INFO org.apache.hadoop.hbase.regionserver.HLog: Closed hdfs://ats181:54310/hbase/log_192.168.0.175_1243890243326_60020/hlog.dat.1243966872067, entries=115001. New log writer: /hbase/log_192.168.0.175_1243890243326_60020/hlog.dat.1243966884387 2009-06-02 11:21:24,392 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Last sequence written is empty. Deleting all old hlogs 2009-06-02 11:21:24,392 INFO org.apache.hadoop.hbase.regionserver.HLog: removing old log file /hbase/log_192.168.0.175_1243890243326_60020/hlog.dat.1243966854582 whose highest sequence/edit id is 26714173 2009-06-02 11:21:24,395 INFO org.apache.hadoop.hbase.regionserver.HLog: removing old log file /hbase/log_192.168.0.175_1243890243326_60020/hlog.dat.1243966859798 whose highest sequence/edit id is 26819173 2009-06-02 11:21:24,401 INFO org.apache.hadoop.hbase.regionserver.HLog: removing old log file /hbase/log_192.168.0.175_1243890243326_60020/hlog.dat.1243966867599 whose highest sequence/edit id is 26924173 2009-06-02 11:21:24,404 INFO org.apache.hadoop.hbase.regionserver.HLog: removing old log file /hbase/log_192.168.0.175_1243890243326_60020/hlog.dat.1243966872067 whose highest sequence/edit id is 27039174 2009-06-02 11:21:24,412 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Started compaction of 2 file(s) into /hbase/tableA/compaction.dir/1008565632/link/mapfiles/1956913417104407709 2009-06-02 11:21:37,145 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.net.SocketTimeoutException: 10000 millis timeout while waiting for channel to be ready for write. 
ch : java.nio.channels.SocketChannel[connected local=/192.168.0.175:36139 remote=/192.168.0.175:50010] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) at java.io.DataOutputStream.write(DataOutputStream.java:90) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2209) 2009-06-02 11:21:37,146 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 1 on 60020 took 10003ms appending an edit to HLog; editcount=41936 2009-06-02 11:21:37,146 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_7404574899445503675_2363 bad datanode[0] 192.168.0.175:50010 2009-06-02 11:21:37,147 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_7404574899445503675_2363 in pipeline 192.168.0.175:50010, 192.168.0.194:50010: bad datanode 192.168.0.175:50010 2009-06-02 11:21:37,405 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.net.SocketTimeoutException: 10000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.0.175:36141 remote=/192.168.0.175:50010] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) at java.io.DataOutputStream.write(DataOutputStream.java:90) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2209) 2009-06-02 11:21:37,406 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-1937654560341865886_2364 bad datanode[0] 192.168.0.175:50010 2009-06-02 11:21:37,406 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-1937654560341865886_2364 in pipeline 192.168.0.175:50010, 192.168.0.195:50010: bad datanode 192.168.0.175:50010 2009-06-02 11:21:40,754 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 1 on 60020 took 3608ms appending an edit to HLog; editcount=41940 2009-06-02 11:21:40,806 WARN org.apache.hadoop.hbase.regionserver.HLog: regionserver/0.0.0.0:60020.logFlusher took 11285ms optional sync to HLog; editcount=45000 2009-06-02 11:21:44,692 INFO org.apache.hadoop.hbase.regionserver.HLog: Closed hdfs://ats181:54310/hbase/log_192.168.0.175_1243890243326_60020/hlog.dat.1243966878324, entries=105000. 
New log writer: /hbase/log_192.168.0.175_1243890243326_60020/hlog.dat.1243966904686 2009-06-02 11:21:44,692 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Found 0 logs to remove out of total 1; oldest outstanding seqnum is 27039175 from region tableA,row_keyA,1243964035709 2009-06-02 11:21:47,133 DEBUG org.apache.hadoop.hbase.regionserver.HStore: moving /hbase/tableA/compaction.dir/1008565632/link/mapfiles/1956913417104407709 to /hbase/tableA/1008565632/link/mapfiles/8666639134401806411 2009-06-02 11:21:47,228 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Completed compaction of 1008565632/link store size is 360.5m 2009-06-02 11:21:47,244 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region tableA,row_keyA,1243964035709 in 22sec 2009-06-02 11:21:47,276 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Startkey and midkey are same, not splitting 2009-06-02 11:21:50,460 INFO org.apache.hadoop.hbase.regionserver.HLog: Closed hdfs://ats181:54310/hbase/log_192.168.0.175_1243890243326_60020/hlog.dat.1243966884387, entries=105000. New log writer: /hbase/log_192.168.0.175_1243890243326_60020/hlog.dat.1243966910456 2009-06-02 11:21:50,460 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Found 0 logs to remove out of total 2; oldest outstanding seqnum is 27039175 from region tableA,row_keyA,1243964035709 2009-06-02 11:21:54,857 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on tableA,row_keyA,1243964035709 2009-06-02 11:21:54,857 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region tableA,row_keyA,1243964035709. Current region memcache size 64.5m 2009-06-02 11:22:05,786 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.net.SocketTimeoutException: 10000 millis timeout while waiting for channel to be ready for write. 
ch : java.nio.channels.SocketChannel[connected local=/192.168.0.175:36162 remote=/192.168.0.175:50010] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) at java.io.DataOutputStream.write(DataOutputStream.java:90) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2209) 2009-06-02 11:22:05,787 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_7847038903223219269_2371 bad datanode[0] 192.168.0.175:50010 2009-06-02 11:22:05,787 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_7847038903223219269_2371 in pipeline 192.168.0.175:50010, 192.168.0.179:50010: bad datanode 192.168.0.175:50010 2009-06-02 11:22:12,025 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 3 on 60020' on region tableA,row_keyA,1243964035709: Memcache size 128.2m is >= than blocking 128.0m size 2009-06-02 11:22:52,772 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_7847038903223219269_2371 failed because recovery from primary datanode 192.168.0.179:50010 failed 1 times. Will retry... 2009-06-02 11:22:52,776 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_7847038903223219269_2371 bad datanode[0] 192.168.0.175:50010 2009-06-02 11:22:52,776 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_7847038903223219269_2371 in pipeline 192.168.0.175:50010, 192.168.0.179:50010: bad datanode 192.168.0.175:50010 2009-06-02 11:22:58,311 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.net.SocketTimeoutException: 5000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.0.175:58191 remote=/192.168.0.179:50010] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) at java.io.DataOutputStream.write(DataOutputStream.java:90) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2209) 2009-06-02 11:22:58,311 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_7847038903223219269_2373 bad datanode[0] 192.168.0.179:50010 2009-06-02 11:22:58,351 FATAL org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Replay of hlog required. Forcing server shutdown org.apache.hadoop.hbase.DroppedSnapshotException: region: tableA,row_keyA,1243964035709 at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:903) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:796) at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushRegion(MemcacheFlusher.java:265) at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.run(MemcacheFlusher.java:148) Caused by: java.io.IOException: All datanodes 192.168.0.179:50010 are bad. Aborting... 
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2444) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1996) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:1996) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160) 2009-06-02 11:22:58,354 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=1, stores=2, storefiles=3, storefileIndexSize=0, memcacheSize=128, usedHeap=172, maxHeap=2999 2009-06-02 11:22:58,355 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: regionserver/0.0.0.0:60020.cacheFlusher exiting 2009-06-02 11:22:58,370 INFO org.apache.hadoop.hbase.regionserver.HLog: Closed hdfs://ats181:54310/hbase/log_192.168.0.175_1243890243326_60020/hlog.dat.1243966904686, entries=440000. New log writer: /hbase/log_192.168.0.175_1243890243326_60020/hlog.dat.1243966978364 2009-06-02 11:22:58,370 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Found 0 logs to remove out of total 3; oldest outstanding seqnum is 27039175 from region tableA,row_keyA,1243964035709 2009-06-02 11:22:58,370 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting. 2009-06-02 11:23:00,839 INFO org.apache.hadoop.hbase.regionserver.LogFlusher: regionserver/0.0.0.0:60020.logFlusher exiting 2009-06-02 11:23:01,347 DEBUG org.apache.hadoop.hbase.RegionHistorian: Offlined 2009-06-02 11:23:01,347 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020 2009-06-02 11:23:01,348 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on tableA,row_keyA,1243964035709 2009-06-02 11:23:01,348 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer 2009-06-02 11:23:01,348 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 60020 2009-06-02 11:23:01,348 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60020: exiting 2009-06-02 11:23:01,348 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60020: exiting 2009-06-02 11:23:01,348 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60020: exiting 2009-06-02 11:23:01,348 INFO org.mortbay.util.ThreadedServer: Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60030] 2009-06-02 11:23:01,350 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder 2009-06-02 11:23:01,349 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60020: exiting 2009-06-02 11:23:01,349 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60020: exiting 2009-06-02 11:23:01,349 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60020: exiting 2009-06-02 11:23:01,349 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60020: exiting 2009-06-02 11:23:01,349 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020: exiting 2009-06-02 11:23:01,349 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020: exiting 2009-06-02 11:23:01,355 INFO org.mortbay.http.SocketListener: Stopped SocketListener on 0.0.0.0:60030 2009-06-02 11:23:02,531 INFO org.mortbay.util.Container: Stopped HttpContext[/logs,/logs] 2009-06-02 11:23:02,532 INFO org.mortbay.util.Container: Stopped org.mortbay.jetty.servlet.webapplicationhand...@79884a40 2009-06-02 11:23:03,294 INFO org.mortbay.util.Container: Stopped WebApplicationContext[/static,/static] 2009-06-02 
11:23:03,295 INFO org.mortbay.util.Container: Stopped org.mortbay.jetty.servlet.webapplicationhand...@794e113b 2009-06-02 11:23:04,438 INFO org.mortbay.util.Container: Stopped WebApplicationContext[/,/] 2009-06-02 11:23:04,438 INFO org.mortbay.util.Container: Stopped org.mortbay.jetty.ser...@4e4ee70b 2009-06-02 11:23:04,438 DEBUG org.apache.hadoop.hbase.regionserver.HLog: closing log writer in hdfs://ats181:54310/hbase/log_192.168.0.175_1243890243326_60020 2009-06-02 11:23:04,439 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: regionserver/0.0.0.0:60020.compactor exiting 2009-06-02 11:23:04,439 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker: regionserver/0.0.0.0:60020.majorCompactionChecker exiting 2009-06-02 11:23:04,814 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: On abort, closed hlog 2009-06-02 11:23:04,815 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: closing region tableA,row_keyA,1243964035709 2009-06-02 11:23:04,815 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing tableA,row_keyA,1243964035709: compactions & flushes disabled 2009-06-02 11:23:04,815 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region, no outstanding scanners on tableA,row_keyA,1243964035709 2009-06-02 11:23:04,815 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: No more row locks outstanding on region tableA,row_keyA,1243964035709 2009-06-02 11:23:04,815 DEBUG org.apache.hadoop.hbase.regionserver.HStore: closed 1008565632/entity 2009-06-02 11:23:04,815 DEBUG org.apache.hadoop.hbase.regionserver.HStore: closed 1008565632/link 2009-06-02 11:23:04,815 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed tableA,row_keyA,1243964035709 2009-06-02 11:23:05,362 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at: 192.168.0.175:60020 2009-06-02 11:23:07,463 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting 2009-06-02 11:23:07,567 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/0.0.0.0:60020 exiting 2009-06-02 11:23:08,511 INFO org.apache.hadoop.hbase.Leases: regionserver/0.0.0.0:60020.leaseChecker closing leases 2009-06-02 11:23:08,511 INFO org.apache.hadoop.hbase.Leases: regionserver/0.0.0.0:60020.leaseChecker closed leases 2009-06-02 11:23:08,512 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread. 
2009-06-02 11:23:08,513 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete ===================================================== Here is the log from the datanode: ===================================================== 2009-06-02 12:14:27,084 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.0.193:59337, dest: /192.168.0.175:50010, bytes: 9, op: HDFS_WRITE, cliID: DFSClient_632050003, srvID: DS-661194258-192.168.0.175-50010-1243890204046, blockid: blk_-1486458067537042098_2495 2009-06-02 12:14:27,085 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-1486458067537042098_2495 terminating 2009-06-02 12:14:34,702 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_-428154950044527208_2466 file /data/hadoop-datastore/current/blk_-428154950044527208 2009-06-02 12:14:40,271 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-3632713526855093302_2498 src: /192.168.0.193:59345 dest: /192.168.0.175:50010 2009-06-02 12:14:41,230 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.0.193:59345, dest: /192.168.0.175:50010, bytes: 20739360, op: HDFS_WRITE, cliID: DFSClient_632050003, srvID: DS-661194258-192.168.0.175-50010-1243890204046, blockid: blk_-3632713526855093302_2498 2009-06-02 12:14:41,231 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-3632713526855093302_2498 terminating 2009-06-02 12:14:41,484 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_2475895093255279384_2500 src: /192.168.0.193:59357 dest: /192.168.0.175:50010 2009-06-02 12:14:43,727 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_-5066455597866254073_2486 file /data/hadoop-datastore/current/blk_-5066455597866254073 2009-06-02 12:14:54,199 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.0.193:59357, dest: /192.168.0.175:50010, bytes: 10893600, op: HDFS_WRITE, cliID: DFSClient_632050003, srvID: DS-661194258-192.168.0.175-50010-1243890204046, blockid: blk_2475895093255279384_2500 2009-06-02 12:14:54,200 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_2475895093255279384_2500 terminating 2009-06-02 12:15:05,863 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_3906931575379674856_2506 src: /192.168.0.193:59385 dest: /192.168.0.175:50010 2009-06-02 12:15:06,589 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.0.193:59385, dest: /192.168.0.175:50010, bytes: 11819362, op: HDFS_WRITE, cliID: DFSClient_632050003, srvID: DS-661194258-192.168.0.175-50010-1243890204046, blockid: blk_3906931575379674856_2506 2009-06-02 12:15:06,590 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_3906931575379674856_2506 terminating 2009-06-02 12:15:06,628 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_1104353063740638283_2506 src: /192.168.0.193:59387 dest: /192.168.0.175:50010 2009-06-02 12:15:06,633 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.0.193:59387, dest: /192.168.0.175:50010, bytes: 27402, op: HDFS_WRITE, cliID: DFSClient_632050003, srvID: DS-661194258-192.168.0.175-50010-1243890204046, blockid: blk_1104353063740638283_2506 2009-06-02 12:15:06,633 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_1104353063740638283_2506 terminating 2009-06-02 
12:15:07,191 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-5517132013528737139_2519 src: /192.168.0.193:59403 dest: /192.168.0.175:50010 2009-06-02 12:15:07,195 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.0.193:59403, dest: /192.168.0.175:50010, bytes: 41, op: HDFS_WRITE, cliID: DFSClient_632050003, srvID: DS-661194258-192.168.0.175-50010-1243890204046, blockid: blk_-5517132013528737139_2519 2009-06-02 12:15:07,195 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-5517132013528737139_2519 terminating 2009-06-02 12:15:13,744 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_-6636143128531703017_2450 file /data/hadoop-datastore/current/blk_-6636143128531703017 2009-06-02 12:15:13,758 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_-3632713526855093302_2498 file /data/hadoop-datastore/current/blk_-3632713526855093302 2009-06-02 12:15:13,759 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_-1486458067537042098_2495 file /data/hadoop-datastore/current/blk_-1486458067537042098 2009-06-02 12:16:55,410 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-2882612487393794888_2540 src: /192.168.0.193:59521 dest: /192.168.0.175:50010 2009-06-02 12:16:55,415 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.0.193:59521, dest: /192.168.0.175:50010, bytes: 110, op: HDFS_WRITE, cliID: DFSClient_632050003, srvID: DS-661194258-192.168.0.175-50010-1243890204046, blockid: blk_-2882612487393794888_2540 2009-06-02 12:16:55,415 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-2882612487393794888_2540 terminating 2009-06-02 12:17:01,895 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_-5517132013528737139_2519 file /data/hadoop-datastore/current/blk_-5517132013528737139 2009-06-02 12:17:56,840 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-5045036725303448421_2546 src: /192.168.0.193:39258 dest: /192.168.0.175:50010 2009-06-02 12:17:56,845 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.0.193:39258, dest: /192.168.0.175:50010, bytes: 117, op: HDFS_WRITE, cliID: DFSClient_632050003, srvID: DS-661194258-192.168.0.175-50010-1243890204046, blockid: blk_-5045036725303448421_2546 2009-06-02 12:17:56,845 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-5045036725303448421_2546 terminating 2009-06-02 12:18:32,037 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 32 blocks got processed in 21 msecs 2009-06-02 12:19:52,477 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.0.194:41274, dest: /192.168.0.175:50010, bytes: 12792946, op: HDFS_WRITE, cliID: DFSClient_1389518311, srvID: DS-661194258-192.168.0.175-50010-1243890204046, blockid: blk_8099366148547328126_2340 2009-06-02 12:19:52,478 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_8099366148547328126_2340 terminating 2009-06-02 12:19:52,730 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.0.175:50010, dest: /192.168.0.181:38194, bytes: 10978708, op: HDFS_READ, cliID: DFSClient_400382585, srvID: DS-661194258-192.168.0.175-50010-1243890204046, blockid: blk_2475895093255279384_2500 2009-06-02 12:19:58,051 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block 
blk_-2411001974305504436_2549 src: /192.168.0.193:39260 dest: /192.168.0.175:50010 2009-06-02 12:19:58,053 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-8510344733259320018_2549 src: /192.168.0.191:37451 dest: /192.168.0.175:50010 2009-06-02 12:19:58,053 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-8510344733259320018_2549 src: /192.168.0.191:37451 dest: /192.168.0.175:50010 2009-06-02 12:20:02,142 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_-5045036725303448421_2546 file /data/hadoop-datastore/current/blk_-5045036725303448421 2009-06-02 12:20:02,150 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_2475895093255279384_2500 file /data/hadoop-datastore/current/blk_2475895093255279384 2009-06-02 12:20:33,895 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Client calls recoverBlock(block=blk_-2411001974305504436_2549, targets=[192.168.0.191:50010, 192.168.0.180:50010, 192.168.0.194:50010, 192.168.0.179:50010, 192.168.0.193:50010, 192.168.0.175:50010]) 2009-06-02 12:20:34,574 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Client calls recoverBlock(block=blk_-8510344733259320018_2549, targets=[192.168.0.194:50010, 192.168.0.196:50010, 192.168.0.180:50010, 192.168.0.179:50010, 192.168.0.191:50010, 192.168.0.175:50010]) 2009-06-02 12:20:47,130 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-8510344733259320018_2549 java.net.SocketException: Connection reset 2009-06-02 12:20:47,131 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-8510344733259320018_2549 Interrupted. 2009-06-02 12:20:47,131 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-8510344733259320018_2549 terminating 2009-06-02 12:20:47,131 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-8510344733259320018_2549 received exception java.net.SocketException: Connection reset 2009-06-02 12:20:47,131 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.0.175:50010, storageID=DS-661194258-192.168.0.175-50010-1243890204046, infoPort=50075, ipcPort=50020):DataXceiver java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:168) at java.io.BufferedInputStream.read1(BufferedInputStream.java:256) at java.io.BufferedInputStream.read(BufferedInputStream.java:317) at java.io.DataInputStream.read(DataInputStream.java:132) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:261) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:308) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:372) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:524) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103) at java.lang.Thread.run(Thread.java:619) 2009-06-02 12:20:47,138 WARN org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to updateBlock (newblock=blk_-8510344733259320018_2551, datanode=192.168.0.179:50010) org.apache.hadoop.ipc.RemoteException: java.io.IOException: Block blk_-8510344733259320018_2551 length is 11177984 does not match block file length 11899392 at 
org.apache.hadoop.hdfs.server.datanode.FSDataset.validateBlockMetadata(FSDataset.java:1259) at org.apache.hadoop.hdfs.server.datanode.FSDataset.tryUpdateBlock(FSDataset.java:898) at org.apache.hadoop.hdfs.server.datanode.FSDataset.updateBlock(FSDataset.java:810) at org.apache.hadoop.hdfs.server.datanode.DataNode.updateBlock(DataNode.java:1381) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894) at org.apache.hadoop.ipc.Client.call(Client.java:697) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216) at $Proxy6.updateBlock(Unknown Source) at org.apache.hadoop.hdfs.server.datanode.DataNode.syncBlock(DataNode.java:1510) at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1479) at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1545) at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894) 2009-06-02 12:20:47,142 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: oldblock=blk_-8510344733259320018_2549(length=11184128), newblock=blk_-8510344733259320018_2551(length=11177984), datanode=192.168.0.175:50010 2009-06-02 12:20:47,193 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-8510344733259320018_2551 src: /192.168.0.191:37454 dest: /192.168.0.175:50010 2009-06-02 12:20:47,193 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reopen already-open Block for append blk_-8510344733259320018_2551 2009-06-02 12:20:47,237 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Changing block file offset of block blk_-8510344733259320018_2551 from 0 to 11119104 meta file offset to 86875 2009-06-02 12:20:50,326 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.0.191:37454, dest: /192.168.0.175:50010, bytes: 67108864, op: HDFS_WRITE, cliID: DFSClient_400382585, srvID: DS-661194258-192.168.0.175-50010-1243890204046, blockid: blk_-8510344733259320018_2551 2009-06-02 12:20:50,534 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-8510344733259320018_2551 terminating 2009-06-02 12:20:50,724 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-5221330074279244492_2551 src: /192.168.0.179:37758 dest: /192.168.0.175:50010 2009-06-02 12:22:15,009 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-3886449064876654030_2199 2009-06-02 12:24:01,494 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-2411001974305504436_2549 java.net.SocketException: Connection reset 2009-06-02 12:24:01,495 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-2411001974305504436_2549 Interrupted. 
2009-06-02 12:24:01,495 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-2411001974305504436_2549 terminating 2009-06-02 12:24:01,495 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-2411001974305504436_2549 received exception java.net.SocketException: Connection reset 2009-06-02 12:24:01,495 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.0.175:50010, storageID=DS-661194258-192.168.0.175-50010-1243890204046, infoPort=50075, ipcPort=50020):DataXceiver java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:168) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read1(BufferedInputStream.java:258) at java.io.BufferedInputStream.read(BufferedInputStream.java:317) at java.io.DataInputStream.read(DataInputStream.java:132) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:261) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:351) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:372) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:524) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103) at java.lang.Thread.run(Thread.java:619) 2009-06-02 12:24:01,501 WARN org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to updateBlock (newblock=blk_-2411001974305504436_2550, datanode=192.168.0.179:50010) org.apache.hadoop.ipc.RemoteException: java.io.IOException: Block blk_-2411001974305504436_2550 length is 3174400 does not match block file length 3511296 at org.apache.hadoop.hdfs.server.datanode.FSDataset.validateBlockMetadata(FSDataset.java:1259) at org.apache.hadoop.hdfs.server.datanode.FSDataset.tryUpdateBlock(FSDataset.java:898) at org.apache.hadoop.hdfs.server.datanode.FSDataset.updateBlock(FSDataset.java:810) at org.apache.hadoop.hdfs.server.datanode.DataNode.updateBlock(DataNode.java:1381) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894) at org.apache.hadoop.ipc.Client.call(Client.java:697) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216) at org.apache.hadoop.ipc.Client.call(Client.java:697) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216) at $Proxy6.updateBlock(Unknown Source) at org.apache.hadoop.hdfs.server.datanode.DataNode.syncBlock(DataNode.java:1510) at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1479) at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1545) at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894) 2009-06-02 12:24:01,505 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: oldblock=blk_-2411001974305504436_2549(length=3186176), 
newblock=blk_-2411001974305504436_2550(length=3174400), datanode=192.168.0.175:50010 2009-06-02 12:24:01,596 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-2411001974305504436_2550 src: /192.168.0.193:53232 dest: /192.168.0.175:50010 2009-06-02 12:24:01,596 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reopen already-open Block for append blk_-2411001974305504436_2550 2009-06-02 12:24:01,615 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Changing block file offset of block blk_-2411001974305504436_2550 from 0 to 3121152 meta file offset to 24391 2009-06-02 12:24:11,234 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.0.193:53232, dest: /192.168.0.175:50010, bytes: 18848146, op: HDFS_WRITE, cliID: DFSClient_400382585, srvID: DS-661194258-192.168.0.175-50010-1243890204046, blockid: blk_-2411001974305504436_2550 2009-06-02 12:24:11,234 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-2411001974305504436_2550 terminating 2009-06-02 12:24:11,502 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.0.179:37758, dest: /192.168.0.175:50010, bytes: 398267, op: HDFS_WRITE, cliID: DFSClient_400382585, srvID: DS-661194258-192.168.0.175-50010-1243890204046, blockid: blk_-5221330074279244492_2551 2009-06-02 12:24:11,503 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-5221330074279244492_2551 terminating 2009-06-02 12:24:14,656 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.0.175:50010, dest: /192.168.0.191:57293, bytes: 27618, op: HDFS_READ, cliID: DFSClient_679571661, srvID: DS-661194258-192.168.0.175-50010-1243890204046, blockid: blk_1104353063740638283_2506 2009-06-02 12:24:15,307 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_6459724563472879206_2553 src: /192.168.0.191:57304 dest: /192.168.0.175:50010 2009-06-02 12:24:22,967 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-4146349943716507582_2556 src: /192.168.0.191:57319 dest: /192.168.0.175:50010 2009-06-02 12:24:22,971 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.0.191:57319, dest: /192.168.0.175:50010, bytes: 9, op: HDFS_WRITE, cliID: DFSClient_679571661, srvID: DS-661194258-192.168.0.175-50010-1243890204046, blockid: blk_-4146349943716507582_2556 2009-06-02 12:24:22,971 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-4146349943716507582_2556 terminating 2009-06-02 12:24:24,141 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.0.191:57304, dest: /192.168.0.175:50010, bytes: 67108864, op: HDFS_WRITE, cliID: DFSClient_679571661, srvID: DS-661194258-192.168.0.175-50010-1243890204046, blockid: blk_6459724563472879206_2553 2009-06-02 12:24:24,141 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_6459724563472879206_2553 terminating 2009-06-02 12:24:30,999 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.0.175:50010, dest: /192.168.0.191:57292, bytes: 148608, op: HDFS_READ, cliID: DFSClient_679571661, srvID: DS-661194258-192.168.0.175-50010-1243890204046, blockid: blk_3906931575379674856_2506 2009-06-02 12:24:31,415 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-4614683533809303164_2559 src: /192.168.0.191:57345 dest: /192.168.0.175:50010 2009-06-02 12:24:38,554 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.0.175:50010, storageID=DS-661194258-192.168.0.175-50010-1243890204046, infoPort=50075, ipcPort=50020) Starting thread to transfer block blk_-2411001974305504436_2550 to 192.168.0.179:50010, 192.168.0.196:50010 2009-06-02 12:24:38,560 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.0.175:50010, storageID=DS-661194258-192.168.0.175-50010-1243890204046, infoPort=50075, ipcPort=50020):Failed to transfer blk_-2411001974305504436_2550 to 192.168.0.179:50010 got java.net.SocketException: Connection reset at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96) at java.net.SocketOutputStream.write(SocketOutputStream.java:136) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) at java.io.DataOutputStream.write(DataOutputStream.java:90) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:319) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400) at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1108) at java.lang.Thread.run(Thread.java:619) 2009-06-02 12:24:47,562 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_-8510344733259320018_2551 file /data/hadoop-datastore/current/blk_-8510344733259320018 2009-06-02 12:24:47,562 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_-8510344733259320018_2551 file /data/hadoop-datastore/current/blk_-8510344733259320018 2009-06-02 12:24:47,562 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_-5221330074279244492_2551 file /data/hadoop-datastore/current/blk_-5221330074279244492 2009-06-02 12:24:51,350 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Client calls recoverBlock(block=blk_-4614683533809303164_2559, targets=[192.168.0.175:50010]) 2009-06-02 12:24:51,353 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: oldblock=blk_-4614683533809303164_2559(length=65804288), newblock=blk_-4614683533809303164_2560(length=65804288), datanode=192.168.0.175:50010 2009-06-02 12:25:04,002 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-4614683533809303164_2559 java.nio.channels.ClosedByInterruptException 2009-06-02 12:25:04,003 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-4614683533809303164_2559 Interrupted. 
2009-06-02 12:25:04,003 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-4614683533809303164_2559 terminating 2009-06-02 12:25:04,003 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-4614683533809303164_2559 received exception java.nio.channels.ClosedByInterruptException 2009-06-02 12:25:04,003 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.0.175:50010, storageID=DS-661194258-192.168.0.175-50010-1243890204046, infoPort=50075, ipcPort=50020):DataXceiver java.nio.channels.ClosedByInterruptException at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184) at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:271) at org.apache.hadoop.hdfs.server.datanode.FSDataset.getChannelPosition(FSDataset.java:1090) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.setBlockPosition(BlockReceiver.java:604) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:396) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:524) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103) at java.lang.Thread.run(Thread.java:619) 2009-06-02 12:25:04,037 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-4614683533809303164_2560 src: /192.168.0.191:57348 dest: /192.168.0.175:50010 2009-06-02 12:25:04,037 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reopen already-open Block for append blk_-4614683533809303164_2560 2009-06-02 12:25:22,924 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Changing block file offset of block blk_-4614683533809303164_2560 from 0 to 65739264 meta file offset to 513595 2009-06-02 12:25:22,928 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-4614683533809303164_2560 java.net.SocketException: Connection reset 2009-06-02 12:25:22,928 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-4614683533809303164_2560 Interrupted. 
2009-06-02 12:25:22,928 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-4614683533809303164_2560 terminating 2009-06-02 12:25:22,928 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-4614683533809303164_2560 received exception java.net.SocketException: Connection reset 2009-06-02 12:25:22,928 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.0.175:50010, storageID=DS-661194258-192.168.0.175-50010-1243890204046, infoPort=50075, ipcPort=50020):DataXceiver java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:168) at java.io.BufferedInputStream.read1(BufferedInputStream.java:256) at java.io.BufferedInputStream.read(BufferedInputStream.java:317) at java.io.DataInputStream.read(DataInputStream.java:132) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:261) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:351) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:372) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:524) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103) at java.lang.Thread.run(Thread.java:619) 2009-06-02 12:32:20,289 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-4146349943716507582_2556 2009-06-02 12:35:42,249 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_6405527929673442526_2329 2009-06-02 12:41:01,502 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_8099366148547328126_2340
=====================================================

Andrew Purtell-2 wrote:
>
> Thanks for the system info. CPU and RAM resources look good.
>
> You are not able to insert enough data into HBase to trigger splits (or
> they are delayed due to intense write activity), because the HDFS layer
> underneath is not able to keep up. If you have already tried the HDFS
> related suggestions on the HBase troubleshooting page
> (http://wiki.apache.org/hadoop/Hbase/Troubleshooting) and are still having
> problems, adding additional HDFS data nodes to spread the load may help,
> but it depends where is the bottleneck. What network do you have
> connecting these boxes? Gigabit Ethernet? Fast (megabit) Ethernet?
>

Andrew Purtell-2 wrote:
>
> Something else to try here is forcing splits early in the upload process.
> You can use the hbase shell or the HBaseAdmin client API to signal the
> master to ask the regionserver(s) hosting a table to split it regardless
> of whether the stores have reached the split threshold or not. This type
> of split request is advisory, and will not happen if there is no data in a
> region or if it has just split already and a compaction is pending. The
> strategy here is to actually use all of the nodes you have deployed, or,
> if you add more, to use those as well. Force enough splits so there is at
> least one region of the table hosted by a region server on every node. It
> could be something like:
>
> HTable table = ...;
> HBaseAdmin admin = ...;
> int nodes = getAvailableNodes();
> while (table.getStartKeys().length < nodes) {
> // upload 100 rows
> // ...
> admin.split(table.getTableName());
> // give some time for things to settle down
> Thread.sleep(30 * 1000); // or 60 * 1000
> }
> // upload the rest
> // ...
>
> There is no HBase API to stand in for getAvailableNodes() -- yet... I will
> make one now, that seems useful -- but if you have co-deployed mapreduce
> with your region servers, you could use JobClient.getClusterStatus() to
> programmatically determine the size of the cluster. See
> http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/JobClient.html#getClusterStatus()
>

I would like to try this, but I'm not running map/reduce yet.

--
View this message in context: http://www.nabble.com/HBase-looses-regions.-tp23657983p23839676.html
Sent from the HBase User mailing list archive at Nabble.com.
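
For reference, the loop Andrew sketches above might be filled out roughly as follows against the HBase 0.19.x client API. This is only an illustrative sketch, not code from the thread: the table name, the link:value column, and the uploadBatch() helper are invented placeholders, the 100-row batch and the 30-second pause simply echo the figures in the quoted mail, and getAvailableNodes() is approximated with JobClient.getClusterStatus(), which assumes MapReduce task trackers are co-deployed with the region servers (otherwise the node count can just be hard-coded).

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ForcedSplitUpload {

  // Placeholder names for illustration only -- substitute the real table/column.
  private static final String TABLE_NAME = "tableA";
  private static final String COLUMN = "link:value";

  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable table = new HTable(conf, TABLE_NAME);
    HBaseAdmin admin = new HBaseAdmin(conf);

    int nodes = getAvailableNodes(conf);
    long row = 0;

    // Keep uploading small batches and asking for splits until the table
    // has at least one region per node.
    while (table.getStartKeys().length < nodes) {
      row = uploadBatch(table, row, 100);
      admin.split(table.getTableName()); // advisory split request, as in the mail above
      Thread.sleep(30 * 1000);           // give the split and any compaction time to settle
    }

    // ... continue with the rest of the upload from 'row' onward ...
  }

  // Stand-in for getAvailableNodes(): counts task trackers, which only works
  // if MapReduce is running alongside the region servers.
  private static int getAvailableNodes(HBaseConfiguration conf) throws IOException {
    JobClient jobClient = new JobClient(new JobConf(conf));
    return jobClient.getClusterStatus().getTaskTrackers();
  }

  // Illustrative batch upload using the 0.19-era BatchUpdate API.
  private static long uploadBatch(HTable table, long startRow, int count)
      throws IOException {
    for (long r = startRow; r < startRow + count; r++) {
      BatchUpdate update = new BatchUpdate("row_" + r);
      update.put(COLUMN, ("value_" + r).getBytes());
      table.commit(update);
    }
    return startRow + count;
  }
}

Since the split request is advisory, the loop re-checks table.getStartKeys() on each pass rather than assuming every request actually produces a new region.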
