Hi!
First of all, thank you guys for your hard work!
I am testing this new RC.
1. I see some unprintable characters in the logs. I don't think it was
there before. I looked and found in
org.apache.hadoop.hbase.master.ProcessRegionOpen in line 82, LOG.info()
outputs "Bytes.toString(this.startCode)", which gives those non-printables.
Also, isn't it a bit misleading ProcessRegionOpen does not have it's own
Log instance and therefore it appears as
org.apache.hadoop.hbase.master.RegionServerOperation in the logs?
2. Now, a more serious thing. Everything worked fine for a day. For
instance, it nicely survived a "kill -9" on a region with both ROOT and
META. After the timeout the regions were reassigned and the cluster
resumed normal operation. However, shortly after another regionserver
was killed and the cluster rebounded I found my tables could not be read
because, for some regions, HBase thinks the data is located in wrong
files on DFS! Here is what the client says now when trying to get data
from one of the tables:
09/06/05 12:56:58 ERROR hbase.CHBaseTest: I/O error occured.
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
contact region server 192.168.37.26:60020 for region
log_trace,,1243520531393, row '', but failed after 10 attempts.
Exceptions:
java.io.IOException: java.io.IOException: HStoreScanner failed construction
at
org.apache.hadoop.hbase.regionserver.StoreFileScanner.<init>(StoreFileScanner.java:68)
at
org.apache.hadoop.hbase.regionserver.HStoreScanner.<init>(HStoreScanner.java:92)
at
org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2134)
at
org.apache.hadoop.hbase.regionserver.HRegion$HScanner.<init>(HRegion.java:2000)
at
org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1187)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1714)
at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
Caused by: java.io.FileNotFoundException: File does not exist:
hdfs://namenode:9000/hbase/log_trace/1009406857/f/mapfiles/7948177727169613520/data
at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:679)
at
org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1431)
at
org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1426)
at
org.apache.hadoop.hbase.io.MapFile$Reader.createDataFileReader(MapFile.java:327)
at
org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.createDataFileReader(HBaseMapFile.java:95)
at org.apache.hadoop.hbase.io.MapFile$Reader.open(MapFile.java:309)
at
org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.<init>(HBaseMapFile.java:78)
at
org.apache.hadoop.hbase.io.BloomFilterMapFile$Reader.<init>(BloomFilterMapFile.java:68)
at
org.apache.hadoop.hbase.regionserver.HStoreFile.getReader(HStoreFile.java:443)
at
org.apache.hadoop.hbase.regionserver.StoreFileScanner.openReaders(StoreFileScanner.java:127)
at
org.apache.hadoop.hbase.regionserver.StoreFileScanner.<init>(StoreFileScanner.java:65)
... 10 more
At this time /hbase/log_trace/1009406857/f/mapfiles contains
"7416235210013649080" subdir with seemingly normal data, but no
"7948177727169613520" subdir.
Also, I see endless errors on the regionservers from
org.apache.hadoop.hbase.regionserver.CompactSplitThread complaining that
this or that file does not exist.
It all seems to start about this moment:
2009-06-04 11:10:30,816 INFO
org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on
region log_activity,20090527221746113test746698248,1243481739735
2009-06-04 11:10:46,517 INFO
org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on
region log_activity,20090527221746113test746698248,1243481739735 in 15sec
2009-06-04 11:10:46,517 INFO
org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on
region log_data_exchange,20090528220725927test2142083448,1243570162509
2009-06-04 11:11:12,821 INFO
org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on
region log_data_exchange,20090528220725927test2142083448,1243570162509
in 26sec
2009-06-04 11:11:12,822 INFO
org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on
region
log_data_exchange-recipient_name,196732233120090528190946653test807696160,1243578229425
2009-06-04 11:11:33,899 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
..META.,,1: safeMode=false
2009-06-04 11:11:33,899 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
MSG_REGION_OPEN: .META.,,1: safeMode=false
2009-06-04 11:11:34,839 INFO
org.apache.hadoop.hbase.regionserver.HRegion: region
..META.,,1/1028785192 available
2009-06-04 11:11:44,874 INFO
org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on
region
log_data_exchange-recipient_name,196732233120090528190946653test807696160,1243578229425
in 32sec
2009-06-04 11:11:44,875 INFO
org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on
region log_data_exchange,20090529115336764test675370772,1243644061166
2009-06-04 11:11:44,881 WARN
org.apache.hadoop.hbase.regionserver.HStore: Failed with
java.io.FileNotFoundException: File does not exist:
hdfs://namenode:9000/hbase/log_data_exchange/2095948567/f/mapfiles/554805660147274743/data:
2095948567/f/554805660147274743
2009-06-04 11:11:44,885 ERROR
org.apache.hadoop.hbase.regionserver.CompactSplitThread:
Compaction/Split failed for region
log_data_exchange,20090529115336764test675370772,1243644061166
java.io.FileNotFoundException: File does not exist:
hdfs://namenode:9000/hbase/log_data_exchange/2095948567/f/mapfiles/554805660147274743/data
at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:679)
at
org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1431)
at
org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1426)
at
org.apache.hadoop.hbase.io.MapFile$Reader.createDataFileReader(MapFile.java:327)
at
org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.createDataFileReader(HBaseMapFile.java:95)
at org.apache.hadoop.hbase.io.MapFile$Reader.open(MapFile.java:309)
at
org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.<init>(HBaseMapFile.java:78)
at
org.apache.hadoop.hbase.io.BloomFilterMapFile$Reader.<init>(BloomFilterMapFile.java:68)
at
org.apache.hadoop.hbase.regionserver.HStoreFile.getReader(HStoreFile.java:443)
at
org.apache.hadoop.hbase.regionserver.HStore.createReaders(HStore.java:1049)
at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:922)
at
org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:732)
at
org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:689)
at
org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:105)
2009-06-04 11:11:44,887 INFO
org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on
region log_activity,20090527160652393test2101793466,1243459595085
2009-06-04 11:12:07,218 INFO
org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on
region log_activity,20090527160652393test2101793466,1243459595085 in 22sec
--Kirill