Hi Sean,

Is there a JIRA ticket for this to follow?

On Sat, Mar 16, 2019 at 2:00 PM Andrew Purtell <[email protected]> wrote:
> Running the file through a standard compressor. Makes handling more
> straightforward, e.g. copy to local filesystem and extraction. We could
> wait to do it until all references to the WAL file are gone so as to not
> complicate things like replication.
>
> > On Mar 16, 2019, at 10:17 AM, Sean Busbey <[email protected]> wrote:
> >
> > Yeah, I like the idea of compressing them. Are you thinking of
> > rewriting them with the WAL compression feature enabled, or just
> > something simple like running the whole file through a compressor?
> > Maybe I should poke at what the difference in resultant file size
> > looks like.
> >
> > IIRC things already get moved out to archive before being deleted.
> > There's a default TTL of something like 10 minutes before a WAL can be
> > deleted from the archive area.
> >
> > The disadvantage to always compressing archived WALs would be overhead
> > for the replication process? Anything else?
> >
> > On Sat, Mar 16, 2019 at 10:51 AM Andrew Purtell
> > <[email protected]> wrote:
> >>
> >> How about an option that tells the cleaner to archive them, with
> >> compression? There's a lot of wastage in WAL files due to repeated
> >> information, and reasons to not enable WAL compression for live
> >> files, but I think little reason not to rewrite an archived WAL file
> >> with a typical and standard archival compression format like BZIP if
> >> retaining it only for possible debugging purposes. (Or maybe a
> >> home-grown incremental backup solution built on snapshots and log
> >> replay. Or...)
> >>
> >> So, a switch that tells the cleaner to archive rather than delete,
> >> and maybe another toggle that starts a background task to find
> >> archived WALs that are uncompressed and compress them, only removing
> >> them once the compressed version is in place. Compress, optionally,
> >> in a temporary location with a final atomic rename, like compaction.
> >>
> >> ?
> >>
> >>
> >>> On Mar 16, 2019, at 7:01 AM, Sean Busbey <[email protected]> wrote:
> >>>
> >>> Hi folks!
> >>>
> >>> Sometimes while working to diagnose an HBase failure in production
> >>> settings I need to ensure WALs stick around so that I can examine or
> >>> possibly replay them. For difficult problems on clusters with plenty
> >>> of HDFS space relative to the HBase write workload, sometimes that
> >>> might mean for days or a week.
> >>>
> >>> The way I've always done this is by setting up placeholder
> >>> replication information for a peer that's disabled. It nicely makes
> >>> the cleaner chore pass over things, doesn't require a restart of
> >>> anything, and has a relatively straightforward way to go back to
> >>> normal.
> >>>
> >>> Lately I've been thinking that I do this often enough that a command
> >>> for it would be better (kind of like how we can turn the balancer on
> >>> and off).
> >>>
> >>> How do other folks handle this operational need? Am I just missing
> >>> an easier way?
> >>>
> >>> If a new command is needed, what do folks think the minimally useful
> >>> version is? Keep all WALs until told otherwise? Limit to the most
> >>> recent/oldest X bytes? Limit to files that include edits to a
> >>> certain namespace/table/region?
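For anyone following along, the disabled-peer workaround Sean describes can be sketched roughly like this in the HBase shell. The peer id and ZooKeeper cluster key below are illustrative placeholders, not values from this thread; a cluster needs to be running for these commands to succeed:

```shell
# Add a replication peer pointing at a placeholder cluster, then disable it.
# While the (disabled) peer exists, the log cleaner chore will not delete
# WALs queued for it, so they stick around for inspection/replay.
hbase shell <<'EOF'
add_peer 'wal_hold', CLUSTER_KEY => 'placeholder-zk:2181:/hbase'
disable_peer 'wal_hold'
EOF

# Once debugging is finished, remove the peer so normal cleanup resumes:
hbase shell <<'EOF'
remove_peer 'wal_hold'
EOF
```

Note the trade-off mentioned upthread: while the placeholder peer exists, WALs accumulate in the archive area, so this only works on clusters with HDFS headroom relative to the write workload.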
