Hi Sean,

Is there a JIRA ticket for this to follow?

On Sat, Mar 16, 2019 at 2:00 PM Andrew Purtell <[email protected]> wrote:
> Running the file through a standard compressor. Makes handling more
> straightforward, e.g. copy to local filesystem and extraction. We could
> wait to do it until all references to the WAL file are gone so as to not
> complicate things like replication.
>
> > On Mar 16, 2019, at 10:17 AM, Sean Busbey <[email protected]> wrote:
> >
> > Yeah, I like the idea of compressing them. Are you thinking of
> > rewriting them with the WAL compression feature enabled, or just
> > something simple like running the whole file through a compressor?
> > Maybe I should poke at what the difference in resultant file size
> > looks like.
> >
> > IIRC things already get moved out to archive before being deleted.
> > There's a default TTL of something like 10 minutes before a WAL can be
> > deleted from the archive area.
> >
> > The disadvantage to always compressing archived WALs would be overhead
> > for the replication process? Anything else?
> >
> > On Sat, Mar 16, 2019 at 10:51 AM Andrew Purtell
> > <[email protected]> wrote:
> >>
> >> How about an option that tells the cleaner to archive them, with
> >> compression? There's a lot of wastage in WAL files due to repeated
> >> information, and reasons to not enable WAL compression for live
> >> files, but I think little reason not to rewrite an archived WAL file
> >> with a typical and standard archival compression format like BZIP if
> >> retaining it only for possible debugging purposes. (Or maybe a
> >> home-grown incremental backup solution built on snapshots and log
> >> replay. Or...)
> >>
> >> So, a switch that tells the cleaner to archive rather than delete,
> >> and maybe another toggle that starts a background task to find
> >> archived WALs that are uncompressed and compress them, only removing
> >> them once the compressed version is in place. Compress, optionally,
> >> in a temporary location with a final atomic rename, like compaction.
> >>
> >> ?
> >>
> >>
> >>> On Mar 16, 2019, at 7:01 AM, Sean Busbey <[email protected]> wrote:
> >>>
> >>> Hi folks!
> >>>
> >>> Sometimes while working to diagnose an HBase failure in production
> >>> settings I need to ensure WALs stick around so that I can examine or
> >>> possibly replay them. For difficult problems on clusters with plenty
> >>> of HDFS space relative to the HBase write workload, sometimes that
> >>> might mean for days or a week.
> >>>
> >>> The way I've always done this is by setting up placeholder
> >>> replication information for a peer that's disabled. It nicely makes
> >>> the cleaner chore pass over things, doesn't require a restart of
> >>> anything, and has a relatively straightforward way to go back to
> >>> normal.
> >>>
> >>> Lately I've been thinking that I do this often enough that a command
> >>> for it would be better (kind of like how we can turn the balancer on
> >>> and off).
> >>>
> >>> How do other folks handle this operational need? Am I just missing
> >>> an easier way?
> >>>
> >>> If a new command is needed, what do folks think the minimally useful
> >>> version is? Keep all WALs until told otherwise? Limit to the most
> >>> recent/oldest X bytes? Limit to files that include edits to a
> >>> certain namespace/table/region?
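For anyone following along, the disabled-peer workaround Sean describes can be sketched roughly like this in the HBase shell. The peer id and ZooKeeper cluster key below are illustrative placeholders, not values from this thread; a cluster needs to be running for these commands to succeed:

```shell
# Add a replication peer pointing at a placeholder cluster, then disable it.
# While the (disabled) peer exists, the log cleaner chore will not delete
# WALs queued for it, so they stick around for inspection/replay.
hbase shell <<'EOF'
add_peer 'wal_hold', CLUSTER_KEY => 'placeholder-zk:2181:/hbase'
disable_peer 'wal_hold'
EOF

# Once debugging is finished, remove the peer so normal cleanup resumes:
hbase shell <<'EOF'
remove_peer 'wal_hold'
EOF
```

Note the trade-off mentioned upthread: while the placeholder peer exists, WALs accumulate in the archive area, so this only works on clusters with HDFS headroom relative to the write workload.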
