[ https://issues.apache.org/jira/browse/HBASE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696403#action_12696403 ]

Andrew Purtell commented on HBASE-1314:
---------------------------------------

Ephemeral znode expiration should not immediately trigger recovery action.

If the process is not dead but only its ZK session has expired, we can expect it to reinitialize its ZK wrapper and replace its ephemeral nodes.
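
For illustration, a minimal sketch of that region-server side against the plain ZooKeeper client API, with made-up class and member names (this is not HBase's actual ZK wrapper, and the quorum address and znode path are assumed inputs): on Expired the old handle is dead, so the server opens a new session and puts its ephemeral node back once the new session connects.

{code:java}
import java.io.IOException;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical names throughout; not HBase's actual ZK wrapper.
public class EphemeralRegistration implements Watcher {
  private final String quorum;   // e.g. "localhost:2181" (assumed)
  private final String znode;    // the HRS's server znode (assumed path)
  private final byte[] payload;
  private volatile ZooKeeper zk;

  public EphemeralRegistration(String quorum, String znode, byte[] payload)
      throws IOException {
    this.quorum = quorum;
    this.znode = znode;
    this.payload = payload;
    this.zk = new ZooKeeper(quorum, 30000, this);   // 30s session timeout
  }

  @Override
  public void process(WatchedEvent event) {
    if (event.getState() == Event.KeeperState.SyncConnected) {
      register();   // (re)create the ephemeral node for this session
    } else if (event.getState() == Event.KeeperState.Expired) {
      try {
        // The expired handle is dead; open a new session. register() runs
        // again when the new session reports SyncConnected.
        zk = new ZooKeeper(quorum, 30000, this);
      } catch (IOException e) {
        // a real implementation would retry with backoff
      }
    }
  }

  private void register() {
    try {
      zk.create(znode, payload, ZooDefs.Ids.OPEN_ACL_UNSAFE,
          CreateMode.EPHEMERAL);
    } catch (KeeperException.NodeExistsException alreadyRegistered) {
      // node survives reconnects within the same session
    } catch (KeeperException | InterruptedException e) {
      // a real implementation would retry
    }
  }
}
{code}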

Any process (such as the master in this issue) watching the ephemeral node should wait for some configurable interval to see whether the node reappears, rather than taking recovery action immediately. Only once the interval has elapsed without the node returning should the process be considered dead.
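
A rough sketch of that master-side grace period, again straight against the ZooKeeper client API and with hypothetical names (GraceWatcher, onServerDead), rather than the master's real watcher plumbing: on NodeDeleted, schedule a re-check after the configurable interval instead of queueing shutdown processing right away.

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical names throughout; not the master's actual watcher classes.
public class GraceWatcher implements Watcher {
  private final ZooKeeper zk;
  private final String znode;        // the HRS ephemeral node to watch
  private final long graceMillis;    // the configurable interval
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public GraceWatcher(ZooKeeper zk, String znode, long graceMillis)
      throws KeeperException, InterruptedException {
    this.zk = zk;
    this.znode = znode;
    this.graceMillis = graceMillis;
    zk.exists(znode, this);          // arm the initial one-shot watch
  }

  @Override
  public void process(WatchedEvent event) {
    if (event.getType() == Event.EventType.NodeDeleted
        && znode.equals(event.getPath())) {
      // Znode gone: do NOT split logs yet. Give the HRS a chance to get a
      // new session and replace its ephemeral node.
      scheduler.schedule(this::checkStillGone, graceMillis,
          TimeUnit.MILLISECONDS);
    } else {
      rearm();                       // other events: just re-set the watch
    }
  }

  private void checkStillGone() {
    try {
      if (zk.exists(znode, this) == null) {
        onServerDead();              // interval elapsed, node still gone
      }
      // else: HRS came back within the grace period; watch re-armed above
    } catch (KeeperException | InterruptedException e) {
      rearm();                       // a real implementation would retry
    }
  }

  private void rearm() {
    try {
      zk.exists(znode, this);
    } catch (KeeperException | InterruptedException ignored) {
      // best effort for the sketch
    }
  }

  private void onServerDead() {
    // only now queue the equivalent of ProcessServerShutdown / log splitting
  }
}
{code}

If the HRS re-registers within the interval, the delayed exists() call finds the node and simply re-arms the watch; only a node that is still missing after the interval escalates to shutdown processing.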

> master sees HRS znode expire and splits log while the HRS is still running and accepting edits
> ----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-1314
>                 URL: https://issues.apache.org/jira/browse/HBASE-1314
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Andrew Purtell
>
> A ZK session expiration related problem. The HRS loses its ephemeral node while it is still up, running, and accepting edits. The master sees it go away and starts splitting its logs. After this, all reconstruction logs have to be manually removed from the region directories or the regions will never deploy (CRC errors). I think on HDFS the edits would be lost, not corrupted. (I am using an HBase root on the local file system.)
>
> The HRS ZK session expires, causing its znode to go away:
> 2009-04-07 03:50:39,953 INFO org.apache.hadoop.hbase.master.ServerManager: localhost.localdomain_1239068648333_60020 znode expired
> 2009-04-07 03:50:40,565 DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessServerShutdown of localhost.localdomain_1239068648333_60020
> 2009-04-07 03:50:40,637 INFO org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of server localhost.localdomain_1239068648333_60020: logSplit: false, rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
>
> But here we have the HRS still reporting in, triggering a LeaseStillHeldException:
> 2009-04-07 03:50:40,826 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60000, call regionServerReport(address: 127.0.0.1:60020, startcode: 1239068648333, load: (requests=14, regions=7, usedHeap=582, maxHeap=888), [Lorg.apache.hadoop.hbase.HMsg;@6da21389, [Lorg.apache.hadoop.hbase.HRegionInfo;@2bb0bf9a) from 127.0.0.1:39238: error: org.apache.hadoop.hbase.Leases$LeaseStillHeldException
> org.apache.hadoop.hbase.Leases$LeaseStillHeldException
>         at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:198)
>         at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:601)
>         at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:909)
>
> And log splitting starts anyway:
> 2009-04-07 03:50:41,139 INFO org.apache.hadoop.hbase.regionserver.HLog: Splitting 3 log(s) in file:/data/hbase/log_localhost.localdomain_1239068648333_60020
> 2009-04-07 03:50:41,139 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 1 of 3: file:/data/hbase/log_localhost.localdomain_1239068648333_60020/hlog.dat.1239075060711
> [...]

