[ 
https://issues.apache.org/jira/browse/HBASE-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833211#action_12833211
 ] 

ryan rawson commented on HBASE-2070:
------------------------------------

If a replication stream is delayed, we should never delete log files unless the 
disk space situation is critical.  Clusters that send replication should have 
plenty of disk space to buffer through any foreseeable disconnection.  
This might mean buffering 5-10TB of edits...

The alternative is to reset the slave cluster and rebuild it from scratch once 
you lose sync.  Otherwise you end up with duplicate edits that cannot be 
removed.

> Collect HLogs and delete them after a period of time
> ----------------------------------------------------
>
>                 Key: HBASE-2070
>                 URL: https://issues.apache.org/jira/browse/HBASE-2070
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2070-v2.patch, HBASE-2070-v3.patch, 
> HBASE-2070-v4.patch, HBASE-2070.patch
>
>
> For replication we need to be able to service clusters that are a few hours 
> behind in edits. For example, after distcp'ing a snapshot of the DB to 
> another cluster, we need to make sure we get the edits that came in after the 
> snapshot was taken.
> I plan the following changes:
> - Instead of deleting HLogs during a log roll or after a log split, move them 
> to another folder where all logs should be aggregated.
> - Add a new configuration for how old a log can be. For a normal cluster I 
> am thinking of a default of 2 hours. For replication you may want to set it 
> much higher.
> - Create a new thread in the master that checks for logs older than the 
> configured time and deletes them.
> I would also like the deletion time to be configurable while the cluster is 
> running. I'm also thinking of adding a way to tell the cluster to replay 
> edits on itself.
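
The plan quoted above (archive logs instead of deleting them, then delete
archived logs once they pass a configurable age) could look roughly like the
sketch below. This is not the attached patch and not an HBase API: the class
OldLogCleaner and all of its methods are hypothetical, and it uses the local
filesystem through java.nio.file rather than HDFS so that it stays
self-contained. It also ignores replication progress entirely, so it does not
address the concern raised in the comment about deleting logs while a
replication stream is behind.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; names are illustrative, not taken from HBase. */
public class OldLogCleaner {
    // Folder where rolled or split logs are aggregated instead of being deleted.
    private final Path archiveDir;
    // volatile so the limit can be changed while the cleaner is in use.
    private volatile long maxAgeMillis;

    public OldLogCleaner(Path archiveDir, long maxAgeMillis) {
        this.archiveDir = archiveDir;
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Change the maximum log age at runtime. */
    public void setMaxAgeMillis(long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Move a log into the archive folder instead of deleting it. */
    public Path archive(Path log) throws IOException {
        Files.createDirectories(archiveDir);
        return Files.move(log, archiveDir.resolve(log.getFileName()));
    }

    /** Delete archived logs last modified more than maxAgeMillis before nowMillis. */
    public List<Path> deleteExpired(long nowMillis) throws IOException {
        List<Path> expired = new ArrayList<>();
        if (!Files.isDirectory(archiveDir)) {
            return expired;
        }
        // First pass: collect the expired logs.
        try (DirectoryStream<Path> logs = Files.newDirectoryStream(archiveDir)) {
            for (Path log : logs) {
                long age = nowMillis - Files.getLastModifiedTime(log).toMillis();
                if (age > maxAgeMillis) {
                    expired.add(log);
                }
            }
        }
        // Second pass: delete them once the directory stream is closed.
        for (Path log : expired) {
            Files.delete(log);
        }
        return expired;
    }
}
```

The "new thread in the master" would then amount to calling
deleteExpired(System.currentTimeMillis()) on a fixed schedule, for example
from a java.util.concurrent.ScheduledExecutorService. Using last-modified time
as the log's age is an assumption of this sketch.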

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
