Do you know how much data that accounts for? Do you think it would make sense to enable compression on the family holding the HTML? If so, please read http://wiki.apache.org/hadoop/UsingLzoCompression; it may help you a lot.
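For what it's worth, the change itself is just an alter on the family from the HBase shell. A rough sketch, assuming the HTML sits in the CF_CONTENT family I see in your region info below (it currently shows COMPRESSION => 'NONE') and that the LZO libraries described on that wiki page are already installed on every region server:

  disable 'webpage'
  alter 'webpage', {NAME => 'CF_CONTENT', COMPRESSION => 'LZO'}
  enable 'webpage'

Compression only applies to store files written after the change, so triggering a major compaction on the table afterwards (major_compact 'webpage' in the shell) should rewrite the existing data compressed.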
J-D

On Thu, Aug 6, 2009 at 1:57 AM, Zheng Lv<[email protected]> wrote:
> Hello,
>   I adjusted the option "zookeeper.session.timeout" to 120000, and then
> restarted the hbase cluster and the test program. After running normally
> for 14 hours, one of the datanodes shut down. When I restarted hadoop and
> hbase and checked the row count of table 'webpage', I got the result 6625,
> while the test program log tells me there should be at least 885000. Too
> much data was lost. Following is the end part of the datanode log on that
> server.
>
> 2009-08-06 04:28:32,214 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.33.9:45465, dest: /192.168.33.6:50010, bytes: 1214, op: HDFS_WRITE, cliID: DFSClient_1777493426, srvID: DS-1028185837-192.168.33.6-50010-1249268609430, blockid: blk_-402434507207277902_27468
> 2009-08-06 04:28:32,214 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 1 for block blk_-402434507207277902_27468 terminating
> 2009-08-06 04:28:32,606 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.33.6:50010, dest: /192.168.33.5:44924, bytes: 446, op: HDFS_READ, cliID: DFSClient_-255011821, srvID: DS-1028185837-192.168.33.6-50010-1249268609430, blockid: blk_-2647720945992878390_27447
> 2009-08-06 04:28:32,612 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.33.6:50010, dest: /192.168.33.5:44925, bytes: 277022, op: HDFS_READ, cliID: DFSClient_-255011821, srvID: DS-1028185837-192.168.33.6-50010-1249268609430, blockid: blk_-2647720945992878390_27447
> 2009-08-06 04:28:32,770 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-5186903983646527212_27469 src: /192.168.33.5:44941 dest: /192.168.33.6:50010
> 2009-08-06 04:29:35,672 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_1888582734643135148_27447 1 Exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.33.6:35418 remote=/192.168.33.5:50010]
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>         at java.io.DataInputStream.readFully(DataInputStream.java:178)
>         at java.io.DataInputStream.readLong(DataInputStream.java:399)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:853)
>         at java.lang.Thread.run(Thread.java:619)
>
> 2009-08-06 04:29:35,673 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 1 for block blk_1888582734643135148_27447 terminating
> 2009-08-06 04:29:35,683 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_1888582734643135148_27447 java.io.EOFException: while trying to read 65557 bytes
> 2009-08-06 04:29:35,689 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_1888582734643135148_27447 received exception java.io.EOFException: while trying to read 65557 bytes
> 2009-08-06 04:29:35,689 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.33.6:50010, storageID=DS-1028185837-192.168.33.6-50010-1249268609430, infoPort=50075, ipcPort=50020):DataXceiver java.io.EOFException: while trying to read 65557 bytes
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:264)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:308)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:372)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:524)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
>         at java.lang.Thread.run(Thread.java:619)
>
> *************************************
>
>   And following is part of the content of the test program log.
>
> insertting 880000 webpages need 51920792 ms.
> insertting 881000 webpages need 51972741 ms.
> insertting 882000 webpages need 52024775 ms.
> 09/08/06 04:32:20 WARN zookeeper.ClientCnxn: Exception closing session 0x222e91bb6b90002 to sun.nio.ch.selectionkeyi...@527809c6
> java.io.IOException: TIMED OUT
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
> 09/08/06 04:32:21 INFO zookeeper.ClientCnxn: Attempting connection to server ubuntu3/192.168.33.8:2222
> 09/08/06 04:32:21 INFO zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/192.168.33.7:52496 remote=ubuntu3/192.168.33.8:2222]
> 09/08/06 04:32:21 INFO zookeeper.ClientCnxn: Server connection successful
> insertting 883000 webpages need 52246380 ms.
> insertting 884000 webpages need 52298370 ms.
> insertting 885000 webpages need 52380479 ms.
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server Some server, retryOnlyOne=true, index=0, islastrow=true, tries=9, numtries=10, i=0, listsize=1, location=address: 192.168.33.5:60020, regioninfo: REGION => {NAME => 'webpage,http:\x2F\x2Fnews.163.com\x2F09\x2F0803\x2F01\x2F5FOO155J0001124J.html1249504151762_879696,1249504267420', STARTKEY => 'http:\x2F\x2Fnews.163.com\x2F09\x2F0803\x2F01\x2F5FOO155J0001124J.html1249504151762_879696', ENDKEY => '', ENCODED => 1607113409, TABLE => {{NAME => 'webpage', FAMILIES => [{NAME => 'CF_CONTENT', COMPRESSION => 'NONE', VERSIONS => '2', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'CF_INFORMATION', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}, region=webpage,http:\x2F\x2Fnews.163.com\x2F09\x2F0803\x2F01\x2F5FOO155J0001124J.html1249504151762_879696,1249504267420 for region webpage,http:\x2F\x2Fnews.163.com\x2F09\x2F0803\x2F01\x2F5FOO155J0001124J.html1249504151762_879696,1249504267420, row 'http:\x2F\x2Fnews.163.com\x2F09\x2F0803\x2F01\x2F5FOO155J0001124J.html1249504668723_885781', but failed after 10 attempts.
> Exceptions:
>
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1041)
>         at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:584)
>         at org.apache.hadoop.hbase.client.HTable.put(HTable.java:450)
>         at hbasetest.HBaseWebpage.insert(HBaseWebpage.java:82)
>         at hbasetest.InsertThread.run(InsertThread.java:26)
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server Some server, retryOnlyOne=true, index=0, islastrow=true, tries=9, numtries=10, i=0, listsize=1, location=address: 192.168.33.5:60020, regioninfo: REGION => {NAME => 'webpage,http:\x2F\x2Fnews.163.com\x2F09\x2F0803\x2F01\x2F5FOO155J0001124J.html1249504151762_879696,1249504267420', STARTKEY => 'http:\x2F\x2Fnews.163.com\x2F09\x2F0803\x2F01\x2F5FOO155J0001124J.html1249504151762_879696', ENDKEY => '', ENCODED => 1607113409, TABLE => {{NAME => 'webpage', FAMILIES => [{NAME => 'CF_CONTENT', COMPRESSION => 'NONE', VERSIONS => '2', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'CF_INFORMATION', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}, region=webpage,http:\x2F\x2Fnews.163.com\x2F09\x2F0803\x2F01\x2F5FOO155J0001124J.html1249504151762_879696,1249504267420 for region webpage,http:\x2F\x2Fnews.163.com\x2F09\x2F0803\x2F01\x2F5FOO155J0001124J.html1249504151762_879696,1249504267420, row 'http:\x2F\x2Fnews.163.com\x2F09\x2F0803\x2F01\x2F5FOO155J0001124J.html1249504754735_885782', but failed after 10 attempts.
> Exceptions:
>
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1041)
>         at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:584)
>         at org.apache.hadoop.hbase.client.HTable.put(HTable.java:450)
>         at hbasetest.HBaseWebpage.insert(HBaseWebpage.java:82)
>         at hbasetest.InsertThread.run(InsertThread.java:26)
> ...
>
>   Any suggestion?
>   Thanks a lot,
>   LvZheng
>
> 2009/8/5 Zheng Lv <[email protected]>
>
>> Hi Stack,
>>   Thank you very much for your explanation.
>>   We just adjusted the value of the property "zookeeper.session.timeout"
>> to 120000, and we are observing the system now.
>>   "Are nodes running on same nodes as hbase?" --Do you mean we should
>> have several servers running exclusively for the zk cluster? But I'm afraid
>> we cannot have that many servers. Any suggestion?
>>   We don't configure zk in zoo.cfg, but in hbase-site.xml. Following is
>> the zk-related content of hbase-site.xml.
>>
>> <property>
>>   <name>hbase.zookeeper.property.clientPort</name>
>>   <value>2222</value>
>> </property>
>>
>> <property>
>>   <name>hbase.zookeeper.quorum</name>
>>   <value>ubuntu2,ubuntu3,ubuntu7,ubuntu9,ubuntu6</value>
>> </property>
>>
>> <property>
>>   <name>zookeeper.session.timeout</name>
>>   <value>120000</value>
>> </property>
>>
>>   Thanks a lot,
>>   LvZheng
