[ https://issues.apache.org/jira/browse/HBASE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707556#action_12707556 ]
Jim Kellerman commented on HBASE-1314:
--------------------------------------
Or increase the timeout interval. In 0.19, the master waits a full 2 minutes
before declaring a region server dead, and there is a code comment noting that
the lease timeout interval may have to be increased on heavily loaded clusters
(though that is only because lease renewal piggybacks on regionServerReport).
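For example, something like this in hbase-site.xml (the property name is my
assumption, taken from the 0.19-era hbase-default.xml; the value is in
milliseconds):

<property>
  <!-- Assumed 0.19 property: lengthen the master's region server lease
       from the 2 minute default to 3 minutes. -->
  <name>hbase.master.lease.period</name>
  <value>180000</value>
</property>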
I'm not really up-to-date on how the ZK stuff works, but it seems to me that
renewing the lease should happen in a separate thread with a small sleep
interval to keep other work from interfering with lease renewal.
How does the region server renew its lease anyway?
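For illustration, here is a minimal sketch of the kind of dedicated renewal
thread I have in mind. Everything below (LeaseRenewalChannel, renewLease) is
made up for the sketch, since I don't know what the actual RPC looks like:

import java.io.IOException;

/** Hypothetical stub standing in for whatever master-facing RPC
 *  interface the region server uses. Not a real HBase API. */
interface LeaseRenewalChannel {
  void renewLease(String serverName) throws IOException;
}

/** Sketch: a lightweight heartbeat on its own daemon thread, so that a
 *  slow regionServerReport() (or any other work on the main loop) can
 *  never delay lease renewal. */
class LeaseRenewer extends Thread {
  private final LeaseRenewalChannel master;
  private final String serverName;
  private final long intervalMs; // keep well under the lease period, e.g. period / 3
  private volatile boolean stopped = false;

  LeaseRenewer(LeaseRenewalChannel master, String serverName, long intervalMs) {
    super("LeaseRenewer");
    setDaemon(true);
    this.master = master;
    this.serverName = serverName;
    this.intervalMs = intervalMs;
  }

  @Override
  public void run() {
    while (!stopped) {
      try {
        master.renewLease(serverName); // heartbeat only, no load report attached
      } catch (IOException e) {
        // Log and retry on the next tick; the lease is only lost if
        // renewal keeps failing for the whole lease period.
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        // Re-check stopped and exit if shutdown() was called.
      }
    }
  }

  void shutdown() {
    stopped = true;
    interrupt();
  }
}

The point is that the renewal call carries nothing but the heartbeat, and the
sleep interval just has to sit comfortably under the lease period.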
> master sees HRS znode expire and splits log while the HRS is still running
> and accepting edits
> ----------------------------------------------------------------------------------------------
>
> Key: HBASE-1314
> URL: https://issues.apache.org/jira/browse/HBASE-1314
> Project: Hadoop HBase
> Issue Type: Bug
> Affects Versions: 0.20.0
> Reporter: Andrew Purtell
> Fix For: 0.20.0
>
>
> ZK session expiration related problem. The HRS loses its ephemeral node while
> it is still up and running and accepting edits. The master sees it go away and
> starts splitting its logs while edits are still being written. After this,
> all reconstruction logs have to be manually removed from the region
> directories or the regions will never deploy (CRC errors). I think on HDFS
> edits would be lost, not corrupted. (I am using an HBase root on the local
> file system.)
> HRS ZK session expires, causing its znode to go away:
> 2009-04-07 03:50:39,953 INFO org.apache.hadoop.hbase.master.ServerManager:
> localhost.localdomain_1239068648333_60020 znode expired
> 2009-04-07 03:50:40,565 DEBUG org.apache.hadoop.hbase.master.HMaster:
> Processing todo: ProcessServerShutdown of
> localhost.localdomain_1239068648333_60020
> 2009-04-07 03:50:40,637 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of
> server localhost.localdomain_1239068648333_60020: logSplit: false,
> rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
> But here we have the HRS still reporting in, triggering a
> LeaseStillHeldException:
> 2009-04-07 03:50:40,826 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 2 on 60000, call regionServerReport(address: 127.0.0.1:60020,
> startcode: 1239068648333, load: (requests=14, regions=7, usedHeap=582,
> maxHeap=888), [Lorg.apache.hadoop.hbase.HMsg;@6da21389,
> [Lorg.apache.hadoop.hbase.HRegionInfo;@2bb0bf9a) from 127.0.0.1:39238: error:
> org.apache.hadoop.hbase.Leases$LeaseStillHeldException
> org.apache.hadoop.hbase.Leases$LeaseStillHeldException
> at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:198)
> at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:601)
> at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
> at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:909)
> And log splitting starts anyway:
> 2009-04-07 03:50:41,139 INFO org.apache.hadoop.hbase.regionserver.HLog:
> Splitting 3 log(s) in
> file:/data/hbase/log_localhost.localdomain_1239068648333_60020
> 2009-04-07 03:50:41,139 DEBUG org.apache.hadoop.hbase.regionserver.HLog:
> Splitting 1 of 3:
> file:/data/hbase/log_localhost.localdomain_1239068648333_60020/hlog.dat.1239075060711
> [...]