Thanks Andy! Doing this right now. I'll test and let you know the outcome.
Lucas

On Thu, Jul 2, 2009 at 8:01 PM, Andrew Purtell <[email protected]> wrote:
> Yes. This:
>
>   storageID=DS-1037027782$
>   java.io.FileNotFoundException:
>   /usr/local/hadoop_data/hadoop-root/dfs/data/current/subdir13/blk_684021770465535076
>   (Too many open files)
>
> means you need to increase the file descriptor limit for the user under
> which you run your DataNode processes. For example, one common method is
> to set 'nofile' limits in /etc/security/limits.conf to a larger multiple
> of 2, perhaps 10240. Both hard and soft limits need to be set for the
> setting to take effect. Mine is:
>
>   hadoop soft nofile 10240
>   hadoop hard nofile 10240
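>
> To confirm the new limits took effect, log in again as that user and
> check with ulimit; it can also be useful to count what the DataNode
> process actually holds open. A rough sketch (the 'hadoop' user name and
> the jps-based pid lookup are examples, adjust them to your setup):
>
>   $ su - hadoop -c 'ulimit -Hn; ulimit -Sn'   # hard limit, then soft limit
>   10240
>   10240
>   # open file descriptors currently held by the DataNode process
>   $ lsof -p $(jps | awk '/DataNode/ {print $1}') | wc -l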
>
> - Andy
>
>
> ________________________________
> From: Lucas Nazário dos Santos <[email protected]>
> To: [email protected]
> Sent: Thursday, July 2, 2009 10:25:57 AM
> Subject: Re: HBase hangs
>
> Thanks for everything, Andy.
>
> I've had a look at the other log files and found some strange messages.
> Hadoop's DataNode log produced the errors below around the time HBase
> became unavailable.
>
> Does this help?
>
> I'll have a look at your suggestions and give them a shot.
>
> Thanks,
> Lucas
>
>
> NODE 192.168.1.2
>
> 2009-07-02 05:52:34,999 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.1.2:50010, storageID=DS-395520527-$
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[$
>     at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185)
>     at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>     at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>     at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
>     at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>     at java.lang.Thread.run(Thread.java:619)
>
> 2009-07-02 05:52:34,999 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.1.2:50010, storageID=DS-395520527$
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[$
>     at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185)
>     at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>     at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>     at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
>     at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>     at java.lang.Thread.run(Thread.java:619)
>
>
> NODE 192.168.1.3
>
> 2009-07-02 04:27:06,643 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.1.3:50010, storageID=DS-1037027782$
> java.io.FileNotFoundException: /usr/local/hadoop_data/hadoop-root/dfs/data/current/subdir13/blk_684021770465535076 (Too many open files)
>     at java.io.RandomAccessFile.open(Native Method)
>     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
>     at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockInputStream(FSDataset.java:738)
>     at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:166)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>     at java.lang.Thread.run(Thread.java:619)
>
> 2009-07-02 04:27:06,644 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.1.3:50010, storageID=DS-103702778$
> java.io.FileNotFoundException: /usr/local/hadoop_data/hadoop-root/dfs/data/current/subdir13/blk_684021770465535076 (Too many open files)
>     at java.io.RandomAccessFile.open(Native Method)
>     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
>     at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockInputStream(FSDataset.java:738)
>     at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:166)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>     at java.lang.Thread.run(Thread.java:619)
>
>
> On Thu, Jul 2, 2009 at 1:06 PM, Andrew Purtell <[email protected]> wrote:
> > Hi,
> >
> > Are there related exceptions in your DataNode logs?
> >
> > There are some HDFS-related troubleshooting steps up on the wiki:
> > http://wiki.apache.org/hadoop/Hbase/Troubleshooting
> >
> > Have you increased the number of file descriptors available to the
> > DataNodes? For example, one common method is to set 'nofile' limits in
> > /etc/security/limits.conf to a larger multiple of 2, perhaps 10240.
> >
> > Have you added a setting of dfs.datanode.max.xcievers (in hadoop-site.xml)
> > to a larger value than the default (256)? For example, 1024 or 2048?
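> >
> > A property stanza along these lines in hadoop-site.xml should do it
> > (the value here is only a suggested starting point, tune it to your
> > load):
> >
> >   <property>
> >     <name>dfs.datanode.max.xcievers</name>
> >     <value>2048</value>
> >   </property>
> >
> > Note that the DataNodes have to be restarted to pick up the change.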
> >
> > - Andy
> >
> >
> > On Thu, Jul 2, 2009 at 11:40 AM, Lucas Nazário dos Santos wrote:
> > > Hi,
> > >
> > > This is the second time it has happened. I have a Hadoop job that
> > > reads and inserts data into HBase. It works perfectly for a couple
> > > of hours and then HBase hangs.
> > >
> > > I'm using HBase 0.19.3 and Hadoop 0.19.1.
> > >
> > > Interestingly, the list command shows the table, but a count returns
> > > an exception.
> > >
> > > Below is the error log.
> > >
> > > Does anybody know what is happening?
> > >
> > > Lucas
> > >
> > >
> > > 2009-07-02 05:38:02,337 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_FLUSH: document,,1246496132379: safeMode=false
> > > 2009-07-02 05:38:02,337 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_FLUSH: document,,1246496132379: safeMode=false
> > > 2009-07-02 05:40:11,518 WARN org.apache.hadoop.hdfs.DFSClient: Exception while reading from blk_4097294633794140351_1008 of /hbase/-ROOT-/70236052/info/mapfiles/6567566389528605238/index from 192.168.1.3:50010: java.io.IOException: Premeture EOF from inputStream
> > >     at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:102)
> > >     at org.apache.hadoop.hdfs.DFSClient$BlockReader.readChunk(DFSClient.java:1207)
> > >     at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:238)
> > >     at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:177)
> > >     at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:194)
> > >     at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
> > >     at org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1060)
> > >     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1615)
> > >     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1665)
> > >     at java.io.DataInputStream.readFully(DataInputStream.java:178)
> > >     at java.io.DataInputStream.readFully(DataInputStream.java:152)
> > >     at org.apache.hadoop.hbase.io.SequenceFile$Reader.init(SequenceFile.java:1464)
> > >     at org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1442)
> > >     at org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1431)
> > >     at org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1426)
> > >     at org.apache.hadoop.hbase.io.MapFile$Reader.open(MapFile.java:318)
> > >     at org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.<init>(HBaseMapFile.java:78)
> > >     at org.apache.hadoop.hbase.io.BloomFilterMapFile$Reader.<init>(BloomFilterMapFile.java:68)
> > >     at org.apache.hadoop.hbase.regionserver.HStoreFile.getReader(HStoreFile.java:443)
> > >     at org.apache.hadoop.hbase.regionserver.StoreFileScanner.openReaders(StoreFileScanner.java:127)
> > >     at org.apache.hadoop.hbase.regionserver.StoreFileScanner.<init>(StoreFileScanner.java:65)
> > >     at org.apache.hadoop.hbase.regionserver.HStoreScanner.<init>(HStoreScanner.java:92)
> > >     at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2134)
> > >     at org.apache.hadoop.hbase.regionserver.HRegion$HScanner.<init>(HRegion.java:2000)
> > >     at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1187)
> > >     at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1714)
> > >     at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >     at java.lang.reflect.Method.invoke(Method.java:597)
> > >     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
> > >     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
> > >
> > > 2009-07-02 05:41:11,517 WARN org.apache.hadoop.hdfs.DFSClient: Exception while reading from blk_4097294633794140351_1008 of /hbase/-ROOT-/70236052/info/mapfiles/6567566389528605238/index from 192.168.1.3:50010: java.io.IOException: Premeture EOF from inputStream
> > >     [stack trace identical to the 05:40:11,518 entry above]
