[ 
https://issues.apache.org/jira/browse/HBASE-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834419#action_12834419
 ] 

stack commented on HBASE-2070:
------------------------------

bq. this seems like a good first cut, but we should probably be tracking 
logfiles in ZK

HT does this.  It makes some kind of sense in that who owns the logs is clear 
in case of HRS failure.  The master could take ownership before it starts 
splitting them.  We should make a new issue for this (hopefully we won't have 
as many WALs going forward with working flush).

Yeah, and the logs shouldn't be cleared if replication is down.  Can we put 
up a gate in ZK?

On the patch:

Can we change this?

{code}
   static final String HREGION_OLDLOGFILE_NAME = "oldlogfile.log";
{code}

It must be the dumbest name ever given a file since the epoch began.  (We 
should do that in another patch, under another issue.)

Do you want to use a regex to verify the expected file name rather than: 

{code}
+        String[] parts = filePath.getName().split("\\.");
{code}
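
Something along these lines is what I have in mind; treat it as a rough sketch only.  The class name, the method name and the assumed name layout ("<prefix>.<timestamp in millis>") are made up for illustration and are not taken from the patch:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the patch.
final class ArchivedLogName {
    // ASSUMPTION: an archived log is named "<prefix>.<digits>", where the
    // trailing digits are a timestamp in milliseconds.
    private static final Pattern NAME = Pattern.compile("^(.+)\\.(\\d+)$");

    /** Returns the trailing timestamp, or -1 if the name is not in the expected form. */
    static long timestampOf(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return -1L; // unexpected name: reject it instead of guessing
        }
        try {
            return Long.parseLong(m.group(2));
        } catch (NumberFormatException e) {
            return -1L; // digit run too long to fit in a long
        }
    }
}
```

Unlike a bare split on ".", a name that does not fit the pattern is rejected up front rather than producing an array of unexpected length.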

Do you have to put a timestamp on it?  Doesn't HDFS tell you its last-modified 
time? (There may be caveats to this but IIRC, for something this basic it 
should be fine).
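
Roughly, the age check could then read the modification time from the filesystem instead of parsing it out of the name.  On HDFS that would be FileSystem.getFileStatus(path).getModificationTime(); the sketch below uses java.nio.file as a local stand-in so it runs without Hadoop on the classpath.  The class and method names and the 2 hour figure in the comment are illustrative assumptions, not what the patch does:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the patch.
final class LogAge {
    /** True if the file was last modified more than maxAgeMs before nowMs. */
    static boolean isOlderThan(long lastModifiedMs, long nowMs, long maxAgeMs) {
        return nowMs - lastModifiedMs > maxAgeMs;
    }

    /** Same check against a real file, e.g. with maxAgeMs = 2 * 60 * 60 * 1000L for 2 hours. */
    static boolean isExpired(Path log, long maxAgeMs) throws IOException {
        // Local stand-in for the HDFS modification time lookup.
        long lastModifiedMs = Files.getLastModifiedTime(log).toMillis();
        return isOlderThan(lastModifiedMs, System.currentTimeMillis(), maxAgeMs);
    }
}
```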

Otherwise patch looks good to me.





> Collect HLogs and delete them after a period of time
> ----------------------------------------------------
>
>                 Key: HBASE-2070
>                 URL: https://issues.apache.org/jira/browse/HBASE-2070
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2070-v2.patch, HBASE-2070-v3.patch, 
> HBASE-2070-v4.patch, HBASE-2070.patch
>
>
> For replication we need to be able to service clusters that are a few hours 
> behind in edits. For example, after distcp'ing a snapshot of the DB to 
> another cluster, we need to make sure we get the edits that came in after the 
> snapshot was taken.
> I plan the following changes:
> - Instead of deleting HLogs during a log roll or after a log split, move them 
> to another folder where all logs should be aggregated.
> - Add a new configuration for how old a log can be. For a normal cluster I 
> think of a default of 2 hours. For replication you may want to set it much 
> higher.
> - Create a new thread in the master that checks for logs older than 
> configured time and that deletes them.
> I also fancy having the deletion time to be configurable while the cluster is 
> running. I'm also thinking of adding a way to tell the cluster to replay 
> edits on itself.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
