[ https://issues.apache.org/jira/browse/HBASE-25869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Kyle Purtell updated HBASE-25869:
----------------------------------------
Description:
WAL storage can be expensive, especially if the cell values represented in the
edits are large, consisting of blobs or significant lengths of text. Such WALs
might need to be kept around for a fairly long time to satisfy replication
constraints on a space-limited (or space-contended) filesystem.
We have a custom dictionary compression scheme for cell metadata that is
engaged when WAL compression is enabled in site configuration. This is fine for
that application, where we can expect the universe of values and their lengths
in the custom dictionaries to be constrained. For arbitrary cell values it is
better to use one of the available compression codecs, which are suitable for
arbitrary albeit compressible data.
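
As a rough illustration of running an arbitrary cell value through a general-purpose codec, the sketch below round-trips a value through Deflate using the JDK's java.util.zip classes. Deflate is only one possible codec, chosen here because it ships with the Java runtime. The class and method names (WalValueDeflateSketch, compress, decompress) are hypothetical and are not HBase's WAL code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, for illustration only; not part of HBase.
final class WalValueDeflateSketch {

  /** Compresses a cell value with Deflate and returns the compressed bytes. */
  static byte[] compress(byte[] value) {
    Deflater deflater = new Deflater(Deflater.BEST_SPEED);
    try {
      deflater.setInput(value);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      while (!deflater.finished()) {
        int n = deflater.deflate(buf);
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } finally {
      deflater.end(); // release native zlib resources
    }
  }

  /** Restores the original cell value from Deflate-compressed bytes. */
  static byte[] decompress(byte[] compressed) {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(compressed);
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        out.write(buf, 0, n);
        // No progress and not finished: the input is truncated or needs a preset dictionary.
        if (n == 0 && !inflater.finished()
            && (inflater.needsInput() || inflater.needsDictionary())) {
          throw new IllegalStateException("truncated or unsupported Deflate stream");
        }
      }
      return out.toByteArray();
    } catch (DataFormatException e) {
      throw new IllegalStateException("corrupt Deflate stream", e);
    } finally {
      inflater.end();
    }
  }

  public static void main(String[] args) {
    // Build a large, repetitive text value, similar in spirit to a long text cell.
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 200; i++) {
      sb.append("significant lengths of text ");
    }
    byte[] value = sb.toString().getBytes(StandardCharsets.UTF_8);
    byte[] compressed = compress(value);
    System.out.println("original bytes:   " + value.length);
    System.out.println("compressed bytes: " + compressed.length);
    System.out.println("round trip ok:    " + Arrays.equals(value, decompress(compressed)));
  }
}
```

The savings depend entirely on how compressible the values are; already-compressed blobs would see little or no benefit.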
was:
WAL storage can be expensive, especially if the cell values represented in the
edits are large, consisting of blobs or significant lengths of text. Such WALs
might need to be kept around for a fairly long time to satisfy replication
constraints on a space limited (or space -contended) filesystem.
We have a custom dictionary compression scheme for cell metadata that is
engaged when WAL compression is enabled in site configuration. This is fine for
that application, where we can expect the universe of values (and their
lengths) in the custom dictionaries to be constrained. For arbitrary values it
is better to use Deflate compression, which is a complete LZ-class algorithm
suitable for arbitrary albeit compressible data, is reasonably fast, certainly
fast enough for WALs, compresses well, and is universally available as part of
the Java runtime.
With a trick that encodes whether or not the cell value is compressed in the
high order bit of the type byte, this can be done in a backwards compatible
manner.
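
The flag-bit idea in the earlier description can be sketched as follows. This is a minimal illustration, assuming every legitimate type code fits in the low seven bits. The class name, method names, and the constant COMPRESSED_FLAG are hypothetical and do not reflect the actual patch.

```java
// Hypothetical helper, for illustration only; not part of HBase.
final class CompressedFlagSketch {

  /** High-order bit of the type byte, used here to mean "value is compressed". */
  static final int COMPRESSED_FLAG = 0x80;

  /** Returns the type byte with the compressed flag set. */
  static byte markCompressed(byte type) {
    if ((type & COMPRESSED_FLAG) != 0) {
      // Assumption of this sketch: real type codes never use the high bit.
      throw new IllegalArgumentException("type code already uses the high-order bit");
    }
    return (byte) (type | COMPRESSED_FLAG);
  }

  /** True if the encoded type byte says the cell value is compressed. */
  static boolean isCompressed(byte encoded) {
    return (encoded & COMPRESSED_FLAG) != 0;
  }

  /** Strips the flag and returns the original type code. */
  static byte originalType(byte encoded) {
    return (byte) (encoded & ~COMPRESSED_FLAG);
  }

  public static void main(String[] args) {
    byte type = 4; // illustrative type code, assumed to fit in seven bits
    byte encoded = markCompressed(type);
    System.out.println("compressed flag set: " + isCompressed(encoded));
    System.out.println("original type code:  " + originalType(encoded));
    System.out.println("old-style byte flag: " + isCompressed(type));
  }
}
```

Under this assumption, a type byte written before the change always has the flag clear, so a newer reader treats the value as uncompressed and handles it as before.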
> WAL value compression
> ---------------------
>
> Key: HBASE-25869
> URL: https://issues.apache.org/jira/browse/HBASE-25869
> Project: HBase
> Issue Type: Bug
> Components: Operability, wal
> Reporter: Andrew Kyle Purtell
> Assignee: Andrew Kyle Purtell
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0