[ https://issues.apache.org/jira/browse/HBASE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696720#action_12696720 ]
Nitay Joffe commented on HBASE-1314:
------------------------------------
Good catch. Yes, I agree we need some better handling for these cases.
> master sees HRS znode expire and splits log while the HRS is still running
> and accepting edits
> ----------------------------------------------------------------------------------------------
>
> Key: HBASE-1314
> URL: https://issues.apache.org/jira/browse/HBASE-1314
> Project: Hadoop HBase
> Issue Type: Bug
> Affects Versions: 0.20.0
> Reporter: Andrew Purtell
>
> ZK session expiration related problem. HRS loses its ephemeral node while it
> is still up and running and accepting edits. Master sees it go away and
> starts splitting its logs while edits are still being written. After this,
> all reconstruction logs have to be manually removed from the region
> directories or the regions will never deploy (CRC errors). I think on HDFS,
> edits would be lost, not corrupted. (I am using an HBase root on the local
> file system.)
> HRS ZK session expires, causing its znode to go away:
> 2009-04-07 03:50:39,953 INFO org.apache.hadoop.hbase.master.ServerManager:
> localhost.localdomain_1239068648333_60020 znode expired
> 2009-04-07 03:50:40,565 DEBUG org.apache.hadoop.hbase.master.HMaster:
> Processing todo: ProcessServerShutdown of
> localhost.localdomain_1239068648333_60020
> 2009-04-07 03:50:40,637 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of
> server localhost.localdomain_1239068648333_60020: logSplit: false,
> rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
> But here we have the HRS still reporting in, triggering a
> LeaseStillHeldException:
> 2009-04-07 03:50:40,826 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 2 on 60000, call regionServerReport(address: 127.0.0.1:60020,
> startcode: 1239068648333, load: (requests=14, regions=7, usedHeap=582,
> maxHeap=888), [Lorg.apache.hadoop.hbase.HMsg;@6da21389,
> [Lorg.apache.hadoop.hbase.HRegionInfo;@2bb0bf9a) from 127.0.0.1:39238: error:
> org.apache.hadoop.hbase.Leases$LeaseStillHeldException
> org.apache.hadoop.hbase.Leases$LeaseStillHeldException
> at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:198)
> at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:601)
> at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
> at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:909)
> And log splitting starts anyway:
> 2009-04-07 03:50:41,139 INFO org.apache.hadoop.hbase.regionserver.HLog:
> Splitting 3 log(s) in
> file:/data/hbase/log_localhost.localdomain_1239068648333_60020
> 2009-04-07 03:50:41,139 DEBUG org.apache.hadoop.hbase.regionserver.HLog:
> Splitting 1 of 3:
> file:/data/hbase/log_localhost.localdomain_1239068648333_60020/hlog.dat.1239075060711
> [...]
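
To make the race in the quoted description concrete, here is a minimal sketch of one possible master-side guard. It is not HBase code and not a proposed patch: LogSplitGuard, its methods, and the grace period are all hypothetical names invented for illustration. The idea is only that a region server report arriving after the znode expiry is evidence the server is still alive, so log splitting should be held back.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from HBase or ZooKeeper.
final class LogSplitGuard {
    // How long a server must stay silent after its znode expires (assumed value).
    private final long graceMillis;
    // Time of the most recent report from each server, keyed by server name.
    private final Map<String, Long> lastReport = new HashMap<>();
    // Time at which each server's znode was seen to expire.
    private final Map<String, Long> znodeExpired = new HashMap<>();

    LogSplitGuard(long graceMillis) {
        this.graceMillis = graceMillis;
    }

    void recordReport(String server, long nowMillis) {
        lastReport.put(server, nowMillis);
    }

    void recordZnodeExpired(String server, long nowMillis) {
        znodeExpired.put(server, nowMillis);
    }

    // True only if the znode has expired, the server has not reported since,
    // and the grace period has fully elapsed.
    boolean safeToSplit(String server, long nowMillis) {
        Long expiredAt = znodeExpired.get(server);
        if (expiredAt == null) {
            return false; // znode still present, nothing to split
        }
        Long reportedAt = lastReport.get(server);
        if (reportedAt != null && reportedAt >= expiredAt) {
            return false; // server reported after expiry, so it is still running
        }
        return nowMillis - expiredAt >= graceMillis;
    }
}
```

Using offsets taken from the timestamps in the logs above (znode expired at 0 ms, regionServerReport at 873 ms, split started at 1186 ms), safeToSplit would return false at 1186 ms because the report arrived after the expiry. This covers only the master side; the region server would still need to stop accepting edits once it notices its session has expired.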