Hello, I have been testing HBase for several weeks. My test cluster is made of 6 low-cost machines (Dell Studio Hybrid, Core 2 Duo 2 GHz, 4 GB RAM, 320 GB HDD).
My configuration files:

hadoop-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop-tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/hadoop-dfs/data</value>
    <description>Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hephaistos:54310</value>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>hephaistos:54311</value>
    <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>8388608</value>
    <description>The hbase standard size for new files.</description>
    <!--<value>67108864</value>-->
    <!--<description>The default block size for new files.</description>-->
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>8192</value>
    <description>Up xcievers (see HADOOP-3831)</description>
  </property>
  <property>
    <name>dfs.balance.bandwidthPerSec</name>
    <value>10485760</value>
    <description>Specifies the maximum bandwidth that each datanode can utilize for the balancing purpose in term of the number of bytes per second. Default is 1048576</description>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/home/hadoop/hadoop-mapred/local</value>
    <description>The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk i/o. Directories that do not exist are ignored.</description>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>home/hadoop/hadoop-mapred/system</value>
    <description>The shared directory where MapReduce stores control files.</description>
  </property>
  <property>
    <name>mapred.temp.dir</name>
    <value>home/hadoop/hadoop-mapred/temp</value>
    <description>A shared directory for temporary files.</description>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>20</value>
    <description>The default number of map tasks per job. Typically set to a prime several times greater than number of available hosts. Ignored when mapred.job.tracker is "local".</description>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>5</value>
    <description>The default number of reduce tasks per job. Typically set to a prime close to the number of available hosts. Ignored when mapred.job.tracker is "local".</description>
  </property>
</configuration>

hbase-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hephaistos:54310/hbase</value>
    <description>The directory shared by region servers.</description>
  </property>
  <property>
    <name>hbase.master</name>
    <value>hephaistos:60000</value>
    <description>The host and port that the HBase master runs at.</description>
  </property>
  <property>
    <name>hbase.hregion.memcache.flush.size</name>
    <value>67108864</value>
    <description>A HRegion memcache will be flushed to disk if size of the memcache exceeds this number of bytes. Value is checked by a thread that runs every hbase.server.thread.wakefrequency.</description>
  </property>
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>268435456</value>
    <description>Maximum HStoreFile size. If any one of a column families' HStoreFiles has grown to exceed this value, the hosting HRegion is split in two. Default: 256M.</description>
  </property>
  <property>
    <name>hbase.io.index.interval</name>
    <value>128</value>
    <description>The interval at which we record offsets in hbase store files/mapfiles. Default for stock mapfiles is 128. Index files are read into memory. If there are many of them, could prove a burden. If so play with the hadoop io.map.index.skip property and skip every nth index member when reading back the index into memory. Downside to high index interval is lowered access times.</description>
  </property>
  <property>
    <name>hbase.hstore.blockCache.blockSize</name>
    <value>65536</value>
    <description>The size of each block in the block cache. Enable blockcaching on a per column family basis; see the BLOCKCACHE setting in HColumnDescriptor. Blocks are kept in a java Soft Reference cache so are let go when high pressure on memory. Block caching is not enabled by default. Default is 16384.</description>
  </property>
  <property>
    <name>hbase.regionserver.lease.period</name>
    <value>240000</value>
    <description>HRegion server lease period in milliseconds. Default is 60 seconds. Clients must report in within this period else they are considered dead.</description>
  </property>
</configuration>

My main application of HBase is building access indexes for a web archive. My test archive contains about 160 million objects, which I insert into an HBase instance; each row contains about a thousand bytes.
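For context, the load is a plain client-side write loop. It looks roughly like the sketch below (assuming the 0.19-era BatchUpdate client API; the row keys and cell values are simplified placeholders, and only the table name and the "content" column family correspond to names that appear in the logs further down):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.util.Bytes;

// Simplified sketch of the batch insertion (HBase 0.19-style client API).
// Row keys and cell contents are placeholders; only "metadata_table" and the
// "content" family correspond to names visible in the logs below.
public class ArchiveIndexLoader {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration(); // picks up hbase-site.xml from the classpath
    HTable table = new HTable(conf, "metadata_table");
    for (long i = 0; i < 1000000; i++) {
      // one row per archived object: URL plus capture timestamp
      String rowKey = "r:http://org.example.www/page-" + i + "@20070505132942";
      BatchUpdate update = new BatchUpdate(rowKey);
      // roughly a thousand bytes of index metadata per row
      update.put("content:metadata", Bytes.toBytes("...~1 KB of metadata for object " + i + "..."));
      table.commit(update);
    }
  }
}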
During these batch insertions I see some exceptions related to DataXceiver:

Case 1:

On the HBase regionserver:

2009-02-27 04:23:52,185 INFO org.apache.hadoop.hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not replicated yet:/hbase/metadata_table/compaction.dir/1476318467/content/mapfiles/260278331337921598/data
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1256)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)
        at org.apache.hadoop.ipc.Client.call(Client.java:696)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at $Proxy1.addBlock(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy1.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2815)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2697)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)

On the Hadoop datanode:

2009-02-27 04:22:58,110 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.1.188.249:50010, storageID=DS-1180278657-127.0.0.1-50010-1235652659245, infoPort=50075, ipcPort=50020):Got exception while serving blk_5465578316105624003_26301 to /10.1.188.249:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.1.188.249:50010 remote=/10.1.188.249:48326]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:293)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:387)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:179)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:94)
        at java.lang.Thread.run(Thread.java:619)
2009-02-27 04:22:58,110 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.1.188.249:50010, storageID=DS-1180278657-127.0.0.1-50010-1235652659245, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.1.188.249:50010 remote=/10.1.188.249:48326]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:293)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:387)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:179)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:94)
        at java.lang.Thread.run(Thread.java:619)

Case 2:

On the HBase regionserver:

2009-03-02 09:55:11,929 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_-6496095407839777264_96895 java.io.IOException: Bad response 1 for block blk_-6496095407839777264_96895 from datanode 10.1.188.182:50010
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2342)
2009-03-02 09:55:11,930 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-6496095407839777264_96895 bad datanode[1] 10.1.188.182:50010
2009-03-02 09:55:11,930 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-6496095407839777264_96895 in pipeline 10.1.188.249:50010, 10.1.188.182:50010, 10.1.188.203:50010: bad datanode 10.1.188.182:50010
2009-03-02 09:55:14,362 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_-7585241287138805906_96914 java.io.IOException: Bad response 1 for block blk_-7585241287138805906_96914 from datanode 10.1.188.182:50010
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2342)
2009-03-02 09:55:14,362 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-7585241287138805906_96914 bad datanode[1] 10.1.188.182:50010
2009-03-02 09:55:14,363 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-7585241287138805906_96914 in pipeline 10.1.188.249:50010, 10.1.188.182:50010, 10.1.188.141:50010: bad datanode 10.1.188.182:50010
2009-03-02 09:55:14,445 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_8693483996243654850_96912 java.io.IOException: Bad response 1 for block blk_8693483996243654850_96912 from datanode 10.1.188.182:50010
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2342)
2009-03-02 09:55:14,446 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_8693483996243654850_96912 bad datanode[1] 10.1.188.182:50010
2009-03-02 09:55:14,446 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_8693483996243654850_96912 in pipeline 10.1.188.249:50010, 10.1.188.182:50010, 10.1.188.203:50010: bad datanode 10.1.188.182:50010
2009-03-02 09:55:14,923 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_-8939308025013258259_96931 java.io.IOException: Bad response 1 for block blk_-8939308025013258259_96931 from datanode 10.1.188.182:50010
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2342)
2009-03-02 09:55:14,935 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-8939308025013258259_96931 bad datanode[1] 10.1.188.182:50010
2009-03-02 09:55:14,935 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-8939308025013258259_96931 in pipeline 10.1.188.249:50010, 10.1.188.182:50010, 10.1.188.203:50010: bad datanode 10.1.188.182:50010
2009-03-02 09:55:15,344 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_7417692418733608681_96934 java.io.IOException: Bad response 1 for block blk_7417692418733608681_96934 from datanode 10.1.188.182:50010
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2342)
2009-03-02 09:55:15,344 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_7417692418733608681_96934 bad datanode[2] 10.1.188.182:50010
2009-03-02 09:55:15,344 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_7417692418733608681_96934 in pipeline 10.1.188.249:50010, 10.1.188.203:50010, 10.1.188.182:50010: bad datanode 10.1.188.182:50010
2009-03-02 09:55:15,579 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_6777180223564108728_96939 java.io.IOException: Bad response 1 for block blk_6777180223564108728_96939 from datanode 10.1.188.182:50010
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2342)
2009-03-02 09:55:15,579 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_6777180223564108728_96939 bad datanode[1] 10.1.188.182:50010
2009-03-02 09:55:15,579 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_6777180223564108728_96939 in pipeline 10.1.188.249:50010, 10.1.188.182:50010, 10.1.188.203:50010: bad datanode 10.1.188.182:50010
2009-03-02 09:55:15,930 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_-6352908575431276531_96948 java.io.IOException: Bad response 1 for block blk_-6352908575431276531_96948 from datanode 10.1.188.182:50010
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2342)
2009-03-02 09:55:15,930 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-6352908575431276531_96948 bad datanode[2] 10.1.188.182:50010
2009-03-02 09:55:15,930 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-6352908575431276531_96948 in pipeline 10.1.188.249:50010, 10.1.188.30:50010, 10.1.188.182:50010: bad datanode 10.1.188.182:50010
2009-03-02 09:55:15,988 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_SPLIT: metadata_table,r:http://com.over-blog.www/_cdata/img/footer_mid....@20070505132942-20070505132942,1235761772185
2009-03-02 09:55:16,008 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_-1071965721931053111_96956 java.io.IOException: Bad response 1 for block blk_-1071965721931053111_96956 from datanode 10.1.188.182:50010
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2342)
2009-03-02 09:55:16,008 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-1071965721931053111_96956 bad datanode[2] 10.1.188.182:50010
2009-03-02 09:55:16,009 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-1071965721931053111_96956 in pipeline 10.1.188.249:50010, 10.1.188.203:50010, 10.1.188.182:50010: bad datanode 10.1.188.182:50010
2009-03-02 09:55:16,073 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_1004039574836775403_96959 java.io.IOException: Bad response 1 for block blk_1004039574836775403_96959 from datanode 10.1.188.182:50010
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2342)
2009-03-02 09:55:16,073 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_1004039574836775403_96959 bad datanode[1] 10.1.188.182:50010

On the Hadoop datanode:

2009-03-02 09:55:10,201 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_-5472632607337755080_96875 1 Exception java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at java.io.DataInputStream.readLong(DataInputStream.java:399)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:833)
        at java.lang.Thread.run(Thread.java:619)
2009-03-02 09:55:10,407 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 1 for block blk_-5472632607337755080_96875 terminating
2009-03-02 09:55:10,516 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.1.188.249:50010, storageID=DS-1180278657-127.0.0.1-50010-1235652659245, infoPort=50075, ipcPort=50020):Exception writing block blk_-5472632607337755080_96875 to mirror 10.1.188.182:50010
java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcher.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
        at sun.nio.ch.IOUtil.write(IOUtil.java:75)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:140)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:391)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:514)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:356)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:102)
        at java.lang.Thread.run(Thread.java:619)
2009-03-02 09:55:10,517 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-5472632607337755080_96875 java.io.IOException: Broken pipe
2009-03-02 09:55:10,517 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-5472632607337755080_96875 received exception java.io.IOException: Broken pipe
2009-03-02 09:55:10,517 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.1.188.249:50010, storageID=DS-1180278657-127.0.0.1-50010-1235652659245, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcher.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
        at sun.nio.ch.IOUtil.write(IOUtil.java:75)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:140)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:391)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:514)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:356)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:102)
        at java.lang.Thread.run(Thread.java:619)
2009-03-02 09:55:11,174 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.1.188.249:49063, dest: /10.1.188.249:50010, bytes: 312, op: HDFS_WRITE, cliID: DFSClient_1091437257, srvID: DS-1180278657-127.0.0.1-50010-1235652659245, blockid: blk_5027345212081735473_96878
2009-03-02 09:55:11,177 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block blk_5027345212081735473_96878 terminating
2009-03-02 09:55:11,185 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-3992843464553216223_96885 src: /10.1.188.249:49069 dest: /10.1.188.249:50010
2009-03-02 09:55:11,186 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-3132070329589136987_96885 src: /10.1.188.30:33316 dest: /10.1.188.249:50010
2009-03-02 09:55:11,187 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_8782629414415941143_96845 java.io.IOException: Connection reset by peer
2009-03-02 09:55:11,187 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_8782629414415941143_96845 Interrupted.
2009-03-02 09:55:11,187 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_8782629414415941143_96845 terminating
2009-03-02 09:55:11,187 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_8782629414415941143_96845 received exception java.io.IOException: Connection reset by peer
2009-03-02 09:55:11,187 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.1.188.249:50010, storageID=DS-1180278657-127.0.0.1-50010-1235652659245, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
        at sun.nio.ch.IOUtil.read(IOUtil.java:206)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
        at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:140)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
        at java.io.DataInputStream.read(DataInputStream.java:132)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:251)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:298)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:362)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:514)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:356)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:102)
        at java.lang.Thread.run(Thread.java:619)

etc.

I have other exceptions related to DataXceiver problems. These errors do not bring the region server down, but I can see that I lose some records (about 3 million out of 160 million). As you can see in my configuration files, I raised dfs.datanode.max.xcievers to 8192, as suggested in several mails, and my ulimit -n is 32768.
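One way to arrive at such a figure is to compare the number of objects submitted with a count of the rows that are actually readable afterwards, e.g. with a full scan along the lines of this rough sketch (0.19-style scanner API; the "content:" family name is taken from the HDFS paths above):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scanner;
import org.apache.hadoop.hbase.io.RowResult;

// Rough sketch: count how many rows are actually readable after the batch load,
// to compare against the number of objects submitted (~160 million here).
// Assumes the 0.19-style scanner API and the "content" family seen above.
public class CountRows {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "metadata_table");
    Scanner scanner = table.getScanner(new String[] { "content:" });
    long count = 0;
    try {
      for (RowResult row : scanner) {
        count++;
      }
    } finally {
      scanner.close();
    }
    System.out.println("rows found: " + count);
  }
}

At this scale, HBase's bundled RowCounter MapReduce job, if available in this version, would be the more practical way to get the same number.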
Do these problems come from my configuration, or from my hardware?

Jérôme Thièvre
