[
https://issues.apache.org/jira/browse/HBASE-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289105#comment-13289105
]
Devaraj Das commented on HBASE-6153:
------------------------------------
Yes, I do see from the master logs lots of line like these for the RegionServer
in question:
{noformat}
2012-05-31 18:18:55,387 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Handling transition=RS_ZK_REGION_CLOSED,
server=ip-10-68-7-146.ec2.internal,60020,1338343120038,
region=853c3285c9e90e532a37fcf36e8dee27
2012-05-31 18:18:57,378 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Handling transition=RS_ZK_REGION_CLOSED,
server=ip-10-68-7-146.ec2.internal,60020,1338343120038,
region=f1742456972610dd1b47d69849135e90
2012-05-31 18:18:57,404 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Handling transition=RS_ZK_REGION_CLOSED,
server=ip-10-68-7-146.ec2.internal,60020,1338343120038,
region=801e6b62dd8bdbe1c0139b6d351af22d
2012-05-31 18:19:00,067 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Handling transition=RS_ZK_REGION_CLOSED,
server=ip-10-68-7-146.ec2.internal,60020,1338343120038,
region=f42aea93c92323de04925e8cab7d81b4
2012-05-31 18:19:01,968 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Handling transition=RS_ZK_REGION_CLOSED,
server=ip-10-68-7-146.ec2.internal,60020,1338343120038,
region=07c5db3cfb48cf644c2ac39687a40af2
2012-05-31 18:19:04,554 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Handling transition=RS_ZK_REGION_CLOSED,
server=ip-10-68-7-146.ec2.internal,60020,1338343120038,
region=a481cc63b07070ae4984d80f32a78c08
{noformat}
but it doesn't have a line for the region 8974506aa04c5a04e5cc23c11de0039d ..
This line is there:
2012-05-31 18:19:36,707 DEBUG
org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Deleting region
TestLoadAndVerify_1338488017181,\x15\xD9\x01\x00\x00\x00\x00\x00/000087_0,1338491364569.8974506aa04c5a04e5cc23c11de0039d.
from META and FS
In the RegionServer logs, I do see lines like this:
{noformat}
2012-05-31 18:18:57,205 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Updates disabled for region
TestLoadAndVerify_1338488017181,\x81\xA5\x01\x00\x00\x00\x00\x00/000074_0,1338491056480.f1742456972610dd1b47d69849135e90.
2012-05-31 18:18:57,205 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Finished snapshotting
TestLoadAndVerify_1338488017181,\x7F\xFF\xFF\xFF\xFF\xFF\xFF\xEE,1338491056480.801e6b62dd8bdbe1c0139b6d351af22d.,
commencing wait for mvcc, flushsize=7439360
2012-05-31 18:18:57,215 DEBUG org.apache.hadoop.hbase.io.hfile.HFileWriterV2:
Initialized with CacheConfig:enabled [cacheDataOnRead=true]
[cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false]
[cacheEvictOnClose=false] [cacheCompressed=false]
2012-05-31 18:18:57,218 DEBUG org.apache.hadoop.hbase.regionserver.Store:
closed f1
2012-05-31 18:18:57,218 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed
TestLoadAndVerify_1338488017181,\x81\xA5\x01\x00\x00\x00\x00\x00/000074_0,1338491056480.f1742456972610dd1b47d69849135e90.
2012-05-31 18:18:57,218 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x3379b7614800004-0x3379b7614800004-0x3379b7614800004
Attempting to transition node f1742456972610dd1b47d69849135e90 from
M_ZK_REGION_CLOSING to RS_ZK_REGION_CLOSED
{noformat}
Compaction started and failed:
{noformat}
2012-05-31 18:25:20,968 DEBUG
org.apache.hadoop.hbase.regionserver.CompactSplitThread: Small Compaction
requested:
regionName=TestLoadAndVerify_1338488017181,\x15\xD9\x01\x00\x00\x00\x00\x00/000087_0,1338491364569.8974506aa04c5a04e5cc23c11de0039d.,
storeName=f1, fileCount=5, fileSize=182.6m (43.0m, 45.5m, 31.4m, 31.4m,
31.4m), priority=0, time=161781718961048; Because:
regionserver60020.compactionChecker requests compaction;
compaction_queue=(2:2), split_queue=433
{noformat}
{noformat}
2012-05-31 18:25:33,383 ERROR
org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest: Compaction
failed
regionName=TestLoadAndVerify_1338488017181,\x15\xD9\x01\x00\x00\x00\x00\x00/000087_0,1338491364569.8974506aa04c5a04e5cc23c11de0039d.,
storeName=f1, fileCount=5, fileSize=182.6m (43.0m, 45.5m, 31.4m, 31.4m,
31.4m), priority=0, time=161781718961048
{noformat}
Some more logs related to the region 8974506aa04c5a04e5cc23c11de0039d before
the failure to rename
{noformat}
2012-05-31 18:34:18,157 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too
many hlogs: logs=33, maxlogs=32; forcing flush of 2 regions(s):
51c11f4eafdb5b5f0ff11ca998a07b43, 8974506aa04c5a04e5cc23c11de0039d
{noformat}
{noformat}
2012-05-31 18:34:40,268 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Started memstore flush for
TestLoadAndVerify_1338488017181,\x15\xD9\x01\x00\x00\x00\x00\x00/000087_0,1338491364569.8974506aa04c5a04e5cc23c11de0039d.,
current region memstore size 58.2m
2012-05-31 18:34:40,331 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Finished snapshotting
TestLoadAndVerify_1338488017181,\x15\xD9\x01\x00\x00\x00\x00\x00/000087_0,1338491364569.8974506aa04c5a04e5cc23c11de0039d.,
commencing wait for mvcc, flushsize=61056152
{noformat}
> RS aborted due to rename problem (maybe a race)
> -----------------------------------------------
>
> Key: HBASE-6153
> URL: https://issues.apache.org/jira/browse/HBASE-6153
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.0
> Reporter: Devaraj Das
> Assignee: Devaraj Das
>
> I had a RS crash with the following:
> 2012-05-31 18:34:42,534 DEBUG org.apache.hadoop.hbase.regionserver.Store:
> Renaming flushed file at
> hdfs://ip-10-140-14-134.ec2.internal:8020/apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/.tmp/294a7a31f04949b8bf07682a43157b35
> to
> hdfs://ip-10-140-14-134.ec2.internal:8020/apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/f1/294a7a31f04949b8bf07682a43157b35
> 2012-05-31 18:34:42,536 WARN org.apache.hadoop.hbase.regionserver.Store:
> Unable to rename
> hdfs://ip-10-140-14-134.ec2.internal:8020/apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/.tmp/294a7a31f04949b8bf07682a43157b35
> to
> hdfs://ip-10-140-14-134.ec2.internal:8020/apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/f1/294a7a31f04949b8bf07682a43157b35
> 2012-05-31 18:34:42,541 FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
> ip-10-68-7-146.ec2.internal,60020,1338343120038: Replay of HLog required.
> Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region:
> TestLoadAndVerify_1338488017181,\x15\xD9\x01\x00\x00\x00\x00\x00/000087_0,1338491364569.8974506aa04c5a04e5cc23c11de0039d.
> at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1288)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1172)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1114)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:400)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:374)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:243)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.FileNotFoundException: File does not exist:
> /apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/f1/294a7a31f04949b8bf07682a43157b35
> at
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1901)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1892)
> at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:636)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
> at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:387)
> at
> org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1008)
> at
> org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:470)
> at
> org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:548)
> at
> org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:595)
> On the NameNode logs:
> 2012-05-31 18:34:42,588 WARN org.apache.hadoop.hdfs.StateChange: DIR*
> FSDirectory.unprotectedRenameTo: failed to rename
> /apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/.tmp/294a7a31f04949b8bf07682a43157b35
> to
> /apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/f1/294a7a31f04949b8bf07682a43157b35
> because destination's parent does not exist
> I haven't looked deeply yet but I guess it is a race of some sort.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira