[ https://issues.apache.org/jira/browse/HBASE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707556#action_12707556 ]
Jim Kellerman commented on HBASE-1314:
--------------------------------------
Or increase the timeout interval. In 0.19, the master waits a full 2 minutes
before declaring a region server dead, and there is a code comment noting that
the lease timeout interval may have to be increased on heavily loaded clusters
(though that is only because lease renewal piggybacks on regionServerReport).
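For example, something like this in hbase-site.xml (the property name is my
assumption, taken from the 0.19-era hbase-default.xml; the value is in
milliseconds):

<property>
  <!-- Assumed 0.19 property: lengthen the master's region server lease
       from the 2 minute default to 3 minutes. -->
  <name>hbase.master.lease.period</name>
  <value>180000</value>
</property>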
I'm not really up-to-date on how the ZK stuff works, but it seems to me that
renewing the lease should happen in a separate thread with a small sleep
interval to keep other work from interfering with lease renewal.
How does the region server renew its lease anyway?
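For illustration, here is a minimal sketch of the kind of dedicated renewal
thread I have in mind. Everything below (LeaseRenewalChannel, renewLease) is
made up for the sketch, since I don't know what the actual RPC looks like:

import java.io.IOException;

/** Hypothetical stub standing in for whatever master-facing RPC
 *  interface the region server uses. Not a real HBase API. */
interface LeaseRenewalChannel {
  void renewLease(String serverName) throws IOException;
}

/** Sketch: a lightweight heartbeat on its own daemon thread, so that a
 *  slow regionServerReport() (or any other work on the main loop) can
 *  never delay lease renewal. */
class LeaseRenewer extends Thread {
  private final LeaseRenewalChannel master;
  private final String serverName;
  private final long intervalMs; // keep well under the lease period, e.g. period / 3
  private volatile boolean stopped = false;

  LeaseRenewer(LeaseRenewalChannel master, String serverName, long intervalMs) {
    super("LeaseRenewer");
    setDaemon(true);
    this.master = master;
    this.serverName = serverName;
    this.intervalMs = intervalMs;
  }

  @Override
  public void run() {
    while (!stopped) {
      try {
        master.renewLease(serverName); // heartbeat only, no load report attached
      } catch (IOException e) {
        // Log and retry on the next tick; the lease is only lost if
        // renewal keeps failing for the whole lease period.
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        // Re-check stopped and exit if shutdown() was called.
      }
    }
  }

  void shutdown() {
    stopped = true;
    interrupt();
  }
}

The point is that the renewal call carries nothing but the heartbeat, and the
sleep interval just has to sit comfortably under the lease period.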
> master sees HRS znode expire and splits log while the HRS is still running
> and accepting edits
> ----------------------------------------------------------------------------------------------
>
> Key: HBASE-1314
> URL: https://issues.apache.org/jira/browse/HBASE-1314
> Project: Hadoop HBase
> Issue Type: Bug
> Affects Versions: 0.20.0
> Reporter: Andrew Purtell
> Fix For: 0.20.0
>
>
> ZK session expiration related problem. The HRS loses its ephemeral node while
> it is still up and running and accepting edits. The master sees it go away and
> starts splitting its logs while edits are still being written. After this,
> all reconstruction logs have to be manually removed from the region
> directories or the regions will never deploy (CRC errors). I think on HDFS
> edits would be lost, not corrupted. (I am using an HBase root on the local
> file system.)
> HRS ZK session expires, causing its znode to go away:
> 2009-04-07 03:50:39,953 INFO org.apache.hadoop.hbase.master.ServerManager:
> localhost.localdomain_1239068648333_60020 znode expired
> 2009-04-07 03:50:40,565 DEBUG org.apache.hadoop.hbase.master.HMaster:
> Processing todo: ProcessServerShutdown of
> localhost.localdomain_1239068648333_60020
> 2009-04-07 03:50:40,637 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of
> server localhost.localdomain_1239068648333_60020: logSplit: false,
> rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
> But here we have the HRS still reporting in, triggering a
> LeaseStillHeldException:
> 2009-04-07 03:50:40,826 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 2 on 60000, call regionServerReport(address: 127.0.0.1:60020,
> startcode: 1239068648333, load: (requests=14, regions=7, usedHeap=582,
> maxHeap=888), [Lorg.apache.hadoop.hbase.HMsg;@6da21389,
> [Lorg.apache.hadoop.hbase.HRegionInfo;@2bb0bf9a) from 127.0.0.1:39238: error:
> org.apache.hadoop.hbase.Leases$LeaseStillHeldException
> org.apache.hadoop.hbase.Leases$LeaseStillHeldException
> at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:198)
> at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:601)
> at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
> at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:909)
> And log splitting starts anyway:
> 2009-04-07 03:50:41,139 INFO org.apache.hadoop.hbase.regionserver.HLog:
> Splitting 3 log(s) in
> file:/data/hbase/log_localhost.localdomain_1239068648333_60020
> 2009-04-07 03:50:41,139 DEBUG org.apache.hadoop.hbase.regionserver.HLog:
> Splitting 1 of 3:
> file:/data/hbase/log_localhost.localdomain_1239068648333_60020/hlog.dat.1239075060711
> [...]