Hi folks! Sometimes while diagnosing an HBase failure in production I need to make sure WALs stick around so that I can examine or possibly replay them. For difficult problems on clusters with plenty of HDFS space relative to the HBase write workload, that can mean keeping them for days or even a week.
The way I've always done this is by setting up placeholder replication information for a peer that's disabled. It makes the log cleaner chore pass over the WALs, doesn't require restarting anything, and has a relatively straightforward way to go back to normal. Lately I've been thinking that I do this often enough that a dedicated command would be better (kind of like how we can turn the balancer on and off). How do other folks handle this operational need? Am I just missing an easier way? If a new command is needed, what do folks think the minimally useful version is? Keep all WALs until told otherwise? Limit to the most recent/oldest X bytes? Limit to files that include edits to a certain namespace/table/region?
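For anyone who hasn't used the workaround, here is a rough sketch of what I mean, written as a small Python helper that generates the HBase shell commands. The peer id `99`, the cluster key `placeholder.invalid:2181:/hbase`, and the function names are made up for illustration. `add_peer`, `disable_peer`, `list_peers`, `remove_peer`, and `hbase shell -n` are the standard shell pieces, but whether a given HBase version accepts an unreachable placeholder cluster key is something to verify on your own cluster first.

```python
import subprocess


def pin_wals_script(peer_id: str, cluster_key: str) -> str:
    """Build HBase shell commands that add a placeholder peer and disable it.

    A disabled peer ships nothing, but WALs queued for it are skipped by the
    log cleaner, so they stay in HDFS until the peer is removed.
    """
    lines = [
        # cluster_key is a placeholder; it does not need to be a real cluster
        # (assumption: your HBase version does not reject it at add time).
        f"add_peer '{peer_id}', CLUSTER_KEY => \"{cluster_key}\"",
        f"disable_peer '{peer_id}'",
        "list_peers",  # sanity check: the peer should show as disabled
    ]
    return "\n".join(lines) + "\n"


def unpin_wals_script(peer_id: str) -> str:
    """Build the command that goes back to normal by dropping the peer."""
    return f"remove_peer '{peer_id}'\n"


def run_in_hbase_shell(script: str) -> None:
    """Feed the script to a non-interactive HBase shell (needs `hbase` on PATH)."""
    subprocess.run(["hbase", "shell", "-n"], input=script, text=True, check=True)


if __name__ == "__main__":
    # Print rather than execute, so the commands can be reviewed first.
    print(pin_wals_script("99", "placeholder.invalid:2181:/hbase"), end="")
    print(unpin_wals_script("99"), end="")
```

Printing the commands instead of running them is deliberate: on a production cluster I'd rather eyeball them and paste them into the shell myself, and only use `run_in_hbase_shell` once I trust the setup. Note the peer is briefly enabled between `add_peer` and `disable_peer`.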
