[ 
https://issues.apache.org/jira/browse/HADOOP-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-1820:
----------------------------------

    Issue Type: Bug  (was: Improvement)

> [hbase] regionserver creates hlogs without bound
> ------------------------------------------------
>
>                 Key: HADOOP-1820
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1820
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: stack
>            Assignee: Jim Kellerman
>         Attachments: excerpt.log
>
>
> Regionserver keeps log of all edits for all the regions its carrying.  Its 
> used recoverying state if a regionserver crashes: edits that have not been 
> persisted to an HStoreFile are rerun to populate memcache which in turn is 
> converted to an on-filesytem HStoreFile.  On a period, the log is rotated and 
> a new one is opened.  While the region server is up, the logs grow in number 
> without bound.  Only the most recent contain unpersisted edits.  If the 
> region server goes down clean, then its logs are cleaned up. If a region 
> server crashes, as part of recovery, the logs of edits are sorted and split  
> per region.  Recovery would run faster if it did not have to plough through 
> reams of stale edits.
> Just now, I had a host crash w/ 112 log files each of 30k plus edits each.
> We could rename the log rolling thread the log maintainer.  As well as 
> rolling logs, it could check for edit logs to clean.  When rolled, logs could 
> be marked with the sequence id of their last contained edit.  The thread 
> could on a period ask each hosted region for the "lowest highest" sequence id 
> of all regions deployed.  Once this number had crossed out that on a 
> particular log, the log could be cleaned up safely.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to