[ https://issues.apache.org/jira/browse/HBASE-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834506#action_12834506 ]
Jean-Daniel Cryans commented on HBASE-2070:
-------------------------------------------

bq. We should make a new issue for this
bq. Yeah too to the logs shouldn't be cleared if replication is down. Can we put up a gate in zk?

I was planning on doing that in the scope of HBASE-2223.

bq. It must be dumbest name ever given a file since the epoch began? (We should do that in another patch.....another issue)

Yeah, another issue.

bq. Want to make a regex to verify expected file name rather than:

Will do

bq. Do you have to put a timestamp on it? Doesn't HDFS tell you its last-modified time? (There may be caveats to this but IIRC, for something this basic should be fine).

I wanted to avoid 2 logs created at the same time having the same name. It can still happen, but the chance is very very low.

Thanks for the review!

> Collect HLogs and delete them after a period of time
> ----------------------------------------------------
>
>                 Key: HBASE-2070
>                 URL: https://issues.apache.org/jira/browse/HBASE-2070
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2070-v2.patch, HBASE-2070-v3.patch,
> HBASE-2070-v4.patch, HBASE-2070.patch
>
>
> For replication we need to be able to service clusters that are a few hours
> behind in edits. For example, after distcp'ing a snapshot of the DB to
> another cluster, we need to make sure we get the edits that came in after the
> snapshot was taken.
> I plan the following changes:
> - Instead of deleting HLogs during a log roll or after a log split, move them
> to another folder where all logs should be aggregated.
> - Add a new configuration for how old a log can be. For a normal cluster I
> think of a default of 2 hours. For replication you may want to set it much
> higher.
> - Create a new thread in the master that checks for logs older than
> configured time and that deletes them.
> I also fancy having the deletion time to be configurable while the cluster is
> running. I'm also thinking of adding a way to tell the cluster to replay
> edits on itself.
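
For readers following the plan in the description, here is a rough Java sketch of the three proposed changes: archive logs instead of deleting them, a configurable maximum age, and a periodic sweep that deletes anything older. It is not taken from any of the attached patches. The class name `OldLogCleaner`, its methods, and the use of the local filesystem through `java.nio.file` in place of HDFS are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch, not the HBASE-2070 patch: uses the local filesystem
// rather than HDFS, and every name here is invented for illustration.
final class OldLogCleaner {
    private final Path archiveDir;
    // volatile so the age limit can be changed while the cleaner is running
    private volatile Duration maxAge;

    OldLogCleaner(Path archiveDir, Duration maxAge) {
        this.archiveDir = archiveDir;
        this.maxAge = maxAge;
    }

    void setMaxAge(Duration maxAge) {
        this.maxAge = maxAge;
    }

    // Instead of deleting a rolled or split log, move it to the archive folder.
    // Files.move fails if a file with the same name is already there, which is
    // the name-collision case discussed in the comment above.
    Path archive(Path log) throws IOException {
        Files.createDirectories(archiveDir);
        return Files.move(log, archiveDir.resolve(log.getFileName()));
    }

    // One sweep: delete archived logs whose last-modified time is older than
    // maxAge relative to 'now'. Returns the number of files deleted.
    int cleanOnce(Instant now) throws IOException {
        Instant cutoff = now.minus(maxAge);
        int deleted = 0;
        try (DirectoryStream<Path> logs = Files.newDirectoryStream(archiveDir)) {
            for (Path log : logs) {
                if (Files.getLastModifiedTime(log).toInstant().isBefore(cutoff)) {
                    Files.delete(log);
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```

A background thread could call `cleanOnce(Instant.now())` at a fixed interval, for instance through `ScheduledExecutorService.scheduleWithFixedDelay`. Passing `now` in as a parameter is a choice made here so that the age check is easy to exercise without waiting for real time to pass.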