Nope, didn't get far enough in specifying an approach to file a JIRA. If you're up for making a go of it, feel free to start a new one.

On Fri, May 3, 2019, 14:44 Biju N <[email protected]> wrote:

> Hi Sean, Is there a JIRA ticket for this to follow?
>
> On Sat, Mar 16, 2019 at 2:00 PM Andrew Purtell <[email protected]>
> wrote:
>
> > Running the file through a standard compressor. Makes handling more
> > straightforward eg copy to local filesystem and extraction. We could wait
> > to do it until all references to the WAL file are gone so as to not
> > complicate things like replication.
> >
> > > On Mar 16, 2019, at 10:17 AM, Sean Busbey <[email protected]> wrote:
> > >
> > > Yeah I like the idea of compressing them. you thinking of rewriting
> > > them with the wal compression feature enabled, or just something
> > > simple like running the whole file through a compressor? Maybe I
> > > should poke at what difference in resultant file size looks like.
> > >
> > > IIRC things already get moved out to archive before being deleted.
> > > There's a default TTL of something like 10 minutes before a WAL can be
> > > deleted from the archive area.
> > >
> > > disadvantage to always compressing archived WALs would be overhead to
> > > the Replication process? anything else?
> > >
> > > On Sat, Mar 16, 2019 at 10:51 AM Andrew Purtell
> > > <[email protected]> wrote:
> > >>
> > >> How about an option that tells the cleaner to archive them, with
> > >> compression? There’s a lot of wastage in WAL files due to repeated
> > >> information, and reasons to not enable WAL compression for live
> > >> files, but I think little reason not to rewrite an archived WAL file
> > >> with a typical and standard archival compression format like BZIP if
> > >> retaining it for only possible debugging purposes. (Or maybe a home
> > >> grown incremental backup solution built on snapshots and log replay.
> > >> Or...)
> > >>
> > >> So, a switch that tells the cleaner to archive rather than delete,
> > >> and maybe another toggle that starts a background task to find
> > >> archived WALs that are uncompressed and compress them, only removing
> > >> them once the compressed version is in place. Compress, optionally,
> > >> in a temporary location with final atomic rename like compaction.
> > >>
> > >> ?
> > >>
> > >>> On Mar 16, 2019, at 7:01 AM, Sean Busbey <[email protected]> wrote:
> > >>>
> > >>> Hi folks!
> > >>>
> > >>> Sometimes while working to diagnose an HBase failure in production
> > >>> settings I need to ensure WALs stick around so that I can examine or
> > >>> possibly replay them. For difficult problems on clusters with plenty
> > >>> of HDFS space relative to the HBase write workload sometimes that
> > >>> might mean for days or a week.
> > >>>
> > >>> The way I've always done this is by setting up placeholder
> > >>> replication information for a peer that's disabled. It nicely makes
> > >>> the cleaner chore pass over things, doesn't require a restart of
> > >>> anything, and has a relatively straight forward way to go back to
> > >>> normal.
> > >>>
> > >>> Lately I've been thinking that I do this often enough that a command
> > >>> for it would be better (kind of like how we can turn the balancer on
> > >>> and off).
> > >>>
> > >>> How do other folks handle this operational need? Am I just missing an
> > >>> easier way?
> > >>>
> > >>> If a new command is needed, what do folks think the minimally useful
> > >>> version is? Keep all WALs until told otherwise? Limit to most
> > >>> recent/oldest X bytes? Limit to files that include edits to certain
> > >>> namespace/table/region?
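
For anyone picking this up: the "compress in a temporary location, then atomic rename, and only then remove the original" idea from the quoted thread might look roughly like the sketch below. It is only an illustration under loud assumptions: it works on a local filesystem with Python's standard library rather than HDFS, it uses bzip2 because that format was mentioned above, and the function name `compress_archived_file` and the `.bz2`/`.tmp` suffixes are made up for this example and are not part of HBase.

```python
import bz2
import os
import shutil
import tempfile


def compress_archived_file(path: str) -> str:
    """Compress `path` to `path + '.bz2'`, then delete the original.

    Hypothetical helper for illustration only; not an HBase API.
    """
    final_path = path + ".bz2"
    directory = os.path.dirname(os.path.abspath(path))

    # Create the temporary file in the same directory so the final
    # rename stays on one filesystem and can be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        with open(path, "rb") as src, bz2.open(tmp_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # The compressed file becomes visible under its final name in one step.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Leave the original untouched if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

    # Remove the uncompressed original only once the compressed copy is in place.
    os.remove(path)
    return final_path
```

A background task along the lines discussed would presumably call something like this only for archived files that nothing (for example replication) still references; that check is deliberately left out here.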
