Hi folks!

Sometimes while diagnosing an HBase failure in a production setting I need
to ensure WALs stick around so that I can examine or possibly replay them.
For difficult problems on clusters with plenty of HDFS space relative to
the HBase write workload, that can mean keeping them for days or even a
week.

The way I've always done this is by setting up placeholder replication
information for a peer that's disabled. It nicely makes the cleaner chore
pass over the WALs, doesn't require a restart of anything, and has a
relatively straightforward way to go back to normal.

Lately I've been thinking that I do this often enough that a command for it
would be better (kind of like how we can turn the balancer on and off).

How do other folks handle this operational need? Am I just missing an
easier way?

If a new command is needed, what do folks think the minimally useful
version is? Keep all WALs until told otherwise? Limit to most recent/oldest
X bytes? Limit to files that include edits to certain
namespace/table/region?
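
To make the "most recent X bytes" option concrete, here is a rough sketch
of the selection rule it implies. It is purely illustrative: WalFile and
wals_to_keep are hypothetical names, not anything that exists in HBase, and
a real version would live inside a cleaner delegate rather than a
standalone function.

```python
from typing import List, NamedTuple


class WalFile(NamedTuple):
    """Hypothetical stand-in for an archived WAL file (not an HBase class)."""
    name: str
    size_bytes: int
    mtime: int  # last-modified time, e.g. epoch millis


def wals_to_keep(wals: List[WalFile], max_bytes: int) -> List[WalFile]:
    """Keep the newest WALs whose combined size stays within max_bytes."""
    kept = []
    total = 0
    # Walk from newest to oldest and stop at the first file that would
    # push the running total over the budget.
    for wal in sorted(wals, key=lambda w: w.mtime, reverse=True):
        if total + wal.size_bytes > max_bytes:
            break
        kept.append(wal)
        total += wal.size_bytes
    return kept
```

Anything not returned would be left to the normal cleaner. The "oldest X
bytes" variant is the same walk with the sort order reversed.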
