Hi!

First of all, thank you guys for your hard work!

I am testing this new RC.

1. I see some unprintable characters in the logs. I don't think it was there before. I looked and found in org.apache.hadoop.hbase.master.ProcessRegionOpen in line 82, LOG.info() outputs "Bytes.toString(this.startCode)", which gives those non-printables.

Also, isn't it a bit misleading ProcessRegionOpen does not have it's own Log instance and therefore it appears as org.apache.hadoop.hbase.master.RegionServerOperation in the logs?

2. Now, a more serious thing. Everything worked fine for a day. For instance, it nicely survived a "kill -9" on a region with both ROOT and META. After the timeout the regions were reassigned and the cluster resumed normal operation. However, shortly after another regionserver was killed and the cluster rebounded I found my tables could not be read because, for some regions, HBase thinks the data is located in wrong files on DFS! Here is what the client says now when trying to get data from one of the tables:

09/06/05 12:56:58 ERROR hbase.CHBaseTest: I/O error occured.
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 192.168.37.26:60020 for region log_trace,,1243520531393, row '', but failed after 10 attempts.
Exceptions:
java.io.IOException: java.io.IOException: HStoreScanner failed construction
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.<init>(StoreFileScanner.java:68) at org.apache.hadoop.hbase.regionserver.HStoreScanner.<init>(HStoreScanner.java:92)
        at 
org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2134)
at org.apache.hadoop.hbase.regionserver.HRegion$HScanner.<init>(HRegion.java:2000) at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1187) at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1714)
        at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912) Caused by: java.io.FileNotFoundException: File does not exist: hdfs://namenode:9000/hbase/log_trace/1009406857/f/mapfiles/7948177727169613520/data at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
        at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:679)
at org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1431) at org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1426) at org.apache.hadoop.hbase.io.MapFile$Reader.createDataFileReader(MapFile.java:327) at org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.createDataFileReader(HBaseMapFile.java:95)
        at org.apache.hadoop.hbase.io.MapFile$Reader.open(MapFile.java:309)
at org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.<init>(HBaseMapFile.java:78) at org.apache.hadoop.hbase.io.BloomFilterMapFile$Reader.<init>(BloomFilterMapFile.java:68) at org.apache.hadoop.hbase.regionserver.HStoreFile.getReader(HStoreFile.java:443) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.openReaders(StoreFileScanner.java:127) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.<init>(StoreFileScanner.java:65)
        ... 10 more

At this time /hbase/log_trace/1009406857/f/mapfiles contains "7416235210013649080" subdir with seemingly normal data, but no "7948177727169613520" subdir.

Also, I see endless errors on the regionservers from org.apache.hadoop.hbase.regionserver.CompactSplitThread complaining that this or that file does not exist.

It all seems to start about this moment:

2009-06-04 11:10:30,816 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region log_activity,20090527221746113test746698248,1243481739735 2009-06-04 11:10:46,517 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region log_activity,20090527221746113test746698248,1243481739735 in 15sec 2009-06-04 11:10:46,517 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region log_data_exchange,20090528220725927test2142083448,1243570162509 2009-06-04 11:11:12,821 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region log_data_exchange,20090528220725927test2142083448,1243570162509 in 26sec 2009-06-04 11:11:12,822 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region log_data_exchange-recipient_name,196732233120090528190946653test807696160,1243578229425 2009-06-04 11:11:33,899 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: ..META.,,1: safeMode=false 2009-06-04 11:11:33,899 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: .META.,,1: safeMode=false 2009-06-04 11:11:34,839 INFO org.apache.hadoop.hbase.regionserver.HRegion: region ..META.,,1/1028785192 available 2009-06-04 11:11:44,874 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region log_data_exchange-recipient_name,196732233120090528190946653test807696160,1243578229425 in 32sec 2009-06-04 11:11:44,875 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region log_data_exchange,20090529115336764test675370772,1243644061166 2009-06-04 11:11:44,881 WARN org.apache.hadoop.hbase.regionserver.HStore: Failed with java.io.FileNotFoundException: File does not exist: hdfs://namenode:9000/hbase/log_data_exchange/2095948567/f/mapfiles/554805660147274743/data: 2095948567/f/554805660147274743 2009-06-04 11:11:44,885 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split failed for region log_data_exchange,20090529115336764test675370772,1243644061166 java.io.FileNotFoundException: File does not exist: hdfs://namenode:9000/hbase/log_data_exchange/2095948567/f/mapfiles/554805660147274743/data at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
        at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:679)
at org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1431) at org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1426) at org.apache.hadoop.hbase.io.MapFile$Reader.createDataFileReader(MapFile.java:327) at org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.createDataFileReader(HBaseMapFile.java:95)
        at org.apache.hadoop.hbase.io.MapFile$Reader.open(MapFile.java:309)
at org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.<init>(HBaseMapFile.java:78) at org.apache.hadoop.hbase.io.BloomFilterMapFile$Reader.<init>(BloomFilterMapFile.java:68) at org.apache.hadoop.hbase.regionserver.HStoreFile.getReader(HStoreFile.java:443) at org.apache.hadoop.hbase.regionserver.HStore.createReaders(HStore.java:1049)
        at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:922)
at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:732) at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:689) at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:105) 2009-06-04 11:11:44,887 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region log_activity,20090527160652393test2101793466,1243459595085 2009-06-04 11:12:07,218 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region log_activity,20090527160652393test2101793466,1243459595085 in 22sec


--Kirill

Reply via email to