[jira] Resolved: (HBASE-25) [hbase] Stuck regionserver?

stack (JIRA) Thu, 13 Mar 2008 20:47:03 -0700

     [ 
https://issues.apache.org/jira/browse/HBASE-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


stack resolved HBASE-25.
------------------------

       Resolution: Invalid
    Fix Version/s: 0.1.0

On a cluster that was running lots of other heavy-duty processes concurrently, 
we were seeing lots of regionservers going down because they could not connect 
to the master within the lease interval.  At Jim Firby's suggestion, I added 
logging of how long we were actually sleeping when we'd asked to sleep for only 
3 seconds.  Last night during an upload I caught a message that said we'd slept 
more than 30 seconds, longer than the default sleep period (see HBASE-501).  
I'm guessing this phenomenon of threads oversleeping is what we've been calling 
a 'hung server' up to now.  Closing as invalid.  Can reopen if the added 
logging does NOT account for region servers failing to check in with the 
master within the lease period.
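
For readers who want to reproduce this kind of check, here is a minimal 
sketch of logging how long a thread really slept compared with what it asked 
for.  It is not the code added to HBase for this issue: the class name 
SleepMonitor, its method names, and the 10x warning threshold are all 
hypothetical and chosen only for illustration.

```java
import java.util.concurrent.TimeUnit;

public class SleepMonitor {
    // Hypothetical threshold (an assumption, not an HBase setting):
    // warn when the actual sleep is at least 10x the requested sleep.
    static final int WARN_MULTIPLIER = 10;

    // How far past the requested duration the thread slept, never negative.
    static long overshootMillis(long requestedMs, long actualMs) {
        return Math.max(0L, actualMs - requestedMs);
    }

    // True when the actual sleep is long enough to be worth logging.
    static boolean shouldWarn(long requestedMs, long actualMs) {
        return actualMs >= requestedMs * WARN_MULTIPLIER;
    }

    // Sleep for requestedMs, measure the elapsed time with a monotonic
    // clock, and log if the thread overslept badly.  Returns elapsed ms.
    static long timedSleep(long requestedMs) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(requestedMs);
        long actualMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        if (shouldWarn(requestedMs, actualMs)) {
            System.err.println("Asked to sleep " + requestedMs + " ms but slept "
                + actualMs + " ms (" + overshootMillis(requestedMs, actualMs)
                + " ms over)");
        }
        return actualMs;
    }

    public static void main(String[] args) throws InterruptedException {
        long actual = timedSleep(50);
        System.out.println("asked for 50 ms, slept " + actual + " ms");
    }
}
```

System.nanoTime() is used rather than wall-clock time so that clock 
adjustments do not show up as false oversleeps.  With the numbers from this 
issue (3 seconds requested, more than 30 seconds slept), shouldWarn would 
return true.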

> [hbase] Stuck regionserver?
> ---------------------------
>
>                 Key: HBASE-25
>                 URL: https://issues.apache.org/jira/browse/HBASE-25
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: stack
>            Assignee: stack
>            Priority: Trivial
>             Fix For: 0.1.0
>
>
> Looking in logs, a regionserver went down because it could not contact the 
> master after 60 seconds.  Watching logging, the HRS is repeatedly checking 
> all 150 loaded regions over and over again w/ a pause of about 5 seconds 
> between runs... then there is a suspicious 60+ second gap with no logging as 
> though the regionserver had hung up on something:
> {code}
> 2007-12-03 13:14:54,178 DEBUG hbase.HRegionServer - flushing region 
> postlog,img151/60/plakatlepperduzy1hh7.jpg,1196614355635
> 2007-12-03 13:14:54,178 DEBUG hbase.HRegion - Not flushing cache for region 
> postlog,img151/60/plakatlepperduzy1hh7.jpg,1196614355635: snapshotMemcaches() 
> determined that there was nothing to do
> 2007-12-03 13:14:54,205 DEBUG hbase.HRegionServer - flushing region 
> postlog,img247/230/seanpaul4li.jpg,1196615889965
> 2007-12-03 13:14:54,205 DEBUG hbase.HRegion - Not flushing cache for region 
> postlog,img247/230/seanpaul4li.jpg,1196615889965: snapshotMemcaches() 
> determined that there was nothing to do
> 2007-12-03 13:16:04,305 FATAL hbase.HRegionServer - unable to report to 
> master for 67467 milliseconds - aborting server
> 2007-12-03 13:16:04,455 INFO  hbase.Leases - 
> regionserver/0:0:0:0:0:0:0:0:60020 closing leases
> 2007-12-03 13:16:04,455 INFO  hbase.Leases$LeaseMonitor - 
> regionserver/0:0:0:0:0:0:0:0:60020.leaseChecker exiting
> {code}
> Master seems to be running fine scanning its ~700 regions.  Then you see this 
> in log, before the HRS shuts itself down.
> {code}
> 2007-12-03 13:14:31,416 INFO  hbase.Leases - HMaster.leaseChecker lease 
> expired 153260899/153260899
> 2007-12-03 13:14:31,417 INFO  hbase.HMaster - XX.XX.XX.102:60020 lease 
> expired
> {code}
> ... and we go on to process shutdown.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
