2010/2/4 Michał Podsiadłowski <[email protected]>:
> Hi all,
> I wrote yesterday evening (of my time :)) about missing file and today i did
> a restart of whole hbase and it looks like problem disappeared. According to
> my taste it looks like either client or region server "forgets" about table
> split and still tries to retrieve data from old region while there are
> already 2 new daughters. My misconfiguration or some bug - maybe some
> threading issue. This is not the first time I've seen this. 2 days ago I've
> ended up testing on the same error but then I thought that it was due to
> datanodes having problems with persisting files due to disks out of space.
> This time there was plenty of space on all nodes.
Next time, when you see something like this:

>> java.io.IOException: java.io.IOException: Cannot open filename
>> /hbase/filmContributors/1670715971/content/3783592739034234831

...try getting it with a new client, as in:

  $ ./bin/hadoop fs -get /hbase/filmContributors/1670715971/content/3783592739034234831 .

Does it work? If so, then the dfsclient hbase is using has 'rotted'. This is
usually one, or a combination, of: ulimit not > default, xceivers not >
default, or (unlikely) you do not have a patched hadoop in your hbase
CLASSPATH (hbase needs hdfs-127; the hadoop that is in the hbase/lib dir has
this patch applied).

The below is really bad, usually indicative of a stressed hdfs (or one not
configured for the load it is taking on):

> IOException: Could not complete write to file

I tried to follow your pastebin link but it is empty for me. Does it work for
you?

St.Ack

>
> Thanks,
> Michal
>
> On 3 February 2010 at 17:14, Michał Podsiadłowski <
> [email protected]> wrote:
>
>> Hi,
>> it's me again having a problem - hope this is not another misconfiguration
>> problem (or maybe it would be better if it was one).
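For readers hitting the same wall: the ulimit/xceivers advice above usually comes down to two knobs. One is the open-file limit for the user running the datanodes and regionservers (check with `ulimit -n`; the stock default of 1024 is too low for hbase). The other is the datanode transceiver cap in hdfs-site.xml. A minimal sketch, not taken from this thread — the property name carries Hadoop's historical misspelling ("xcievers"), and the value shown is only a commonly suggested starting point, not something the thread prescribes:

```xml
<!-- hdfs-site.xml on each datanode; restart datanodes after changing.
     Value is illustrative - tune to your cluster's load. -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>2047</value>
</property>
```

If this cap is hit, datanodes start refusing connections and hbase sees exactly this kind of "Cannot open filename" / "Could not complete write" noise under load.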
>> After loading some moderate amount of data - around 3GB - some rows are not
>> available due to strange exceptions:
>>
>> java.io.IOException: java.io.IOException: Cannot open filename
>> /hbase/filmContributors/1670715971/content/3783592739034234831
>>
>> When trying to scan the table the region server pukes like this:
>>
>> 2010-02-03 16:03:39,060 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60020, call next(2813423168169765496, 1) from 10.0.100.50:41364: error: java.io.IOException: java.lang.RuntimeException: java.io.IOException: Cannot open filename /hbase/filmContributors/1670715971/content/3783592739034234831
>> java.io.IOException: java.lang.RuntimeException: java.io.IOException: Cannot open filename /hbase/filmContributors/1670715971/content/3783592739034234831
>>   at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:872)
>>   at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:862)
>>   at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1918)
>>   at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>   at java.lang.reflect.Method.invoke(Method.java:597)
>>   at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
>>   at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
>> Caused by: java.lang.RuntimeException: java.io.IOException: Cannot open filename /hbase/filmContributors/1670715971/content/3783592739034234831
>>   at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:61)
>>   at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:79)
>>   at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:164)
>>   at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:106)
>>   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.nextInternal(HRegion.java:1807)
>>   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.next(HRegion.java:1771)
>>   at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1894)
>>   ... 5 more
>> Caused by: java.io.IOException: Cannot open filename /hbase/filmContributors/1670715971/content/3783592739034234831
>>   at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1474)
>>   at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1800)
>>   at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1616)
>>   at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1743)
>>   at java.io.DataInputStream.read(DataInputStream.java:132)
>>   at org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:99)
>>   at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
>>   at org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:1020)
>>   at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readBlock(HFile.java:971)
>>   at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.next(HFile.java:1163)
>>   at org.apache.hadoop.hbase.io.HalfHFileReader$1.next(HalfHFileReader.java:125)
>>   at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:58)
>>   ... 11 more
>>
>> Grepping the regionserver log for dir name *1670715971* shows this:
>>
>> 2010-02-03 15:32:37,082 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/filmContributors/314440477/content/3783592739034234831.1670715971, isReference=true, sequence id=7541774, length=33390929, majorCompaction=false
>> 2010-02-03 15:32:37,088 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/filmContributors/314440477/content/6518523095287027530.1670715971, isReference=true, sequence id=7542003, length=7890, majorCompaction=false
>> 2010-02-03 15:32:37,095 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/filmContributors/314440477/description/2305635563712489918.1670715971, isReference=true, sequence id=7542003, length=2256, majorCompaction=false
>> 2010-02-03 15:32:37,101 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/filmContributors/314440477/description/6970032752270852156.1670715971, isReference=true, sequence id=7541774, length=6664268, majorCompaction=false
>> 2010-02-03 15:32:37,129 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/filmContributors/1836766931/content/3783592739034234831.1670715971, isReference=true, sequence id=7541773, length=33390929, majorCompaction=false
>> 2010-02-03 15:32:37,152 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/filmContributors/1836766931/content/6518523095287027530.1670715971, isReference=true, sequence id=7542002, length=7890, majorCompaction=false
>> 2010-02-03 15:32:37,165 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/filmContributors/1836766931/description/2305635563712489918.1670715971, isReference=true, sequence id=7542002, length=2256, majorCompaction=false
>> 2010-02-03 15:32:37,170 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded /hbase/filmContributors/1836766931/description/6970032752270852156.1670715971, isReference=true, sequence id=7541773, length=6664268, majorCompaction=false
>> 2010-02-03 15:33:49,943 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read: java.io.IOException: Cannot open filename /hbase/filmContributors/1670715971/content/3783592739034234831
>>
>> and many many times:
>> java.io.IOException: Cannot open filename /hbase/filmContributors/*1670715971*/content/3783592739034234831
>>
>> On a different one I found this:
>>
>> 2010-02-03 15:32:35,512 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Cleaned up /hbase/filmContributors/*1670715971*/splits true
>> 2010-02-03 15:32:35,515 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: region split, META updated, and report to master all successful. Old region=REGION => {NAME => 'filmContributors,,1265203126633', STARTKEY => '', ENDKEY => '31587', ENCODED => *1670715971*, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 'filmContributors', MAX_FILESIZE => '268435456', FAMILIES => [{NAME => 'content', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'description', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}, new regions: filmContributors,,1265207555247, filmContributors,117416,1265207555247. Split took 0s
>>
>> more details here - http://pastebin.com/d7c52f27a
>>
>> Also sometimes in namenode logs I can see messages like this:
>>
>> 2010-02-03 15:32:38,416 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 54310, call complete(/hbase/filmContributors/compaction.dir/1836766931/2633146516707160051, DFSClient_-902184734) from 10.0.100.51:49692: error: java.io.IOException: Could not complete write to file /hbase/filmContributors/compaction.dir/1836766931/2633146516707160051 by DFSClient_-902184734
>> java.io.IOException: Could not complete write to file /hbase/filmContributors/compaction.dir/1836766931/2633146516707160051 by DFSClient_-902184734
>>
>> Please help.
>>
>> Cheers,
>> Michal
>
