Running the whole file through a standard compressor makes handling more straightforward, e.g. copying to a local filesystem and extracting. We could wait to compress until all references to the WAL file are gone, so as to not complicate things like replication.
> On Mar 16, 2019, at 10:17 AM, Sean Busbey <[email protected]> wrote:
>
> Yeah I like the idea of compressing them. you thinking of rewriting
> them with the wal compression feature enabled, or just something
> simple like running the whole file through a compressor? Maybe I
> should poke at what difference in resultant file size looks like.
>
> IIRC things already get moved out to archive before being deleted.
> There's a default TTL of something like 10 minutes before a WAL can be
> deleted from the archive area.
>
> disadvantage to always compressing archived WALs would be overhead to
> the Replication process? anything else?
>
> On Sat, Mar 16, 2019 at 10:51 AM Andrew Purtell
> <[email protected]> wrote:
>>
>> How about an option that tells the cleaner to archive them, with
>> compression? There’s a lot of wastage in WAL files due to repeated
>> information, and reasons to not enable WAL compression for live files,
>> but I think little reason not to rewrite an archived WAL file with a
>> typical and standard archival compression format like BZIP if retaining
>> it for only possible debugging purposes. (Or maybe a home grown
>> incremental backup solution built on snapshots and log replay. Or...)
>>
>> So, a switch that tells the cleaner to archive rather than delete, and
>> maybe another toggle that starts a background task to find archived WALs
>> that are uncompressed and compress them, only removing them once the
>> compressed version is in place. Compress, optionally, in a temporary
>> location with final atomic rename like compaction.
>>
>> ?
>>
>>
>>> On Mar 16, 2019, at 7:01 AM, Sean Busbey <[email protected]> wrote:
>>>
>>> Hi folks!
>>>
>>> Sometimes while working to diagnose an HBase failure in production
>>> settings I need to ensure WALs stick around so that I can examine or
>>> possibly replay them. For difficult problems on clusters with plenty
>>> of HDFS space relative to the HBase write workload sometimes that
>>> might mean for days or a week.
>>>
>>> The way I've always done this is by setting up placeholder replication
>>> information for a peer that's disabled. It nicely makes the cleaner
>>> chore pass over things, doesn't require a restart of anything, and has
>>> a relatively straight forward way to go back to normal.
>>>
>>> Lately I've been thinking that I do this often enough that a command
>>> for it would be better (kind of like how we can turn the balancer on
>>> and off).
>>>
>>> How do other folks handle this operational need? Am I just missing an
>>> easier way?
>>>
>>> If a new command is needed, what do folks think the minimally useful
>>> version is? Keep all WALs until told otherwise? Limit to most
>>> recent/oldest X bytes? Limit to files that include edits to certain
>>> namespace/table/region?
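
For reference, the placeholder-peer workaround Sean describes can be sketched in the HBase shell roughly as follows. The peer id 'wal_keep' and the ZooKeeper cluster key are made-up placeholders, and exact syntax varies by HBase version; the effect relied on is that the log cleaner skips WALs still queued for a peer, even a disabled one:

```
# add a replication peer pointing at a placeholder cluster, then disable
# it so no edits are actually shipped anywhere
add_peer 'wal_keep', CLUSTER_KEY => 'fake-zk-host:2181:/hbase'
disable_peer 'wal_keep'

# ... examine or replay WALs at leisure ...

# back to normal: remove the peer so the cleaner can delete WALs again
remove_peer 'wal_keep'
```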
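
The compress-in-a-temporary-location-then-atomically-rename idea from Andrew's mail, sketched against a local filesystem (the file name is illustrative; a real background chore would do the equivalent against the HDFS archive directory via the FileSystem API, and might use bzip2 rather than gzip):

```shell
# illustrative stand-in for an archived WAL
printf 'fake WAL contents\n' > archived.wal

# compress into a temp name, rename into place (rename within one
# directory is atomic on POSIX filesystems, like a compaction commit),
# and delete the uncompressed original only once the compressed copy exists
gzip -c archived.wal > archived.wal.gz.tmp \
  && mv archived.wal.gz.tmp archived.wal.gz \
  && rm archived.wal
```

Readers of the archive would then need to tolerate both compressed and uncompressed files during the transition window.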
