Bryan Beaudreault created HBASE-28216:
-----------------------------------------
Summary: HDFS erasure coding support for table data dirs
Key: HBASE-28216
URL: https://issues.apache.org/jira/browse/HBASE-28216
Project: HBase
Issue Type: New Feature
Reporter: Bryan Beaudreault
[Erasure coding|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html]
(EC) is a Hadoop 3 feature that can drastically reduce storage requirements,
at the expense of data locality. At my company we have a few HBase clusters that
are extremely data-dense and take mostly write traffic with few reads (cold
data). We'd like to reduce the cost of these clusters, and EC is a great way to
do that: compared with 3x replication, a Reed-Solomon policy such as RS-6-3
stores 1.5x the raw data instead of 3x, cutting replication-related storage
costs by 50%.
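The 50% figure is easy to check. Below is a minimal sketch, assuming HDFS's built-in RS-6-3 scheme (6 data cells + 3 parity cells per stripe) against the default 3x replication and a hypothetical 100 TB table. The class and method names are illustrative only.

```java
/**
 * Back-of-the-envelope raw-storage comparison: n-way replication versus a
 * Reed-Solomon EC layout with d data cells and p parity cells per stripe.
 */
public class EcStorageEstimate {

    /** Raw bytes stored per logical byte with n-way replication. */
    static double replicationOverhead(int replicas) {
        return replicas;
    }

    /** Raw bytes stored per logical byte for RS(d, p): (d + p) / d. */
    static double ecOverhead(int dataCells, int parityCells) {
        return (double) (dataCells + parityCells) / dataCells;
    }

    public static void main(String[] args) {
        double logicalTb = 100.0; // hypothetical table size
        double replicated = logicalTb * replicationOverhead(3); // 3x replication
        double erasureCoded = logicalTb * ecOverhead(6, 3);     // RS-6-3
        double savings = 1.0 - erasureCoded / replicated;

        System.out.printf("3x replication: %.1f TB raw%n", replicated);   // 300.0
        System.out.printf("RS-6-3:         %.1f TB raw%n", erasureCoded); // 150.0
        System.out.printf("savings:        %.0f%%%n", savings * 100);     // 50%
    }
}
```

Other schemes shift the number. RS-3-2 stores (3+2)/3, or about 1.67x raw, saving about 44%. RS-10-4 stores 1.4x raw, saving about 53%.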
It's possible to enable EC policies on individual HDFS subdirectories. One can
set one manually with
{{hdfs ec -setPolicy -path /hbase/data/default/usertable -policy xxxx}}.
This works without any HBase support.
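Covering a table by hand means running that command for each directory that holds its HFiles, and that includes the archive dir as well as the data dir. The sketch below assumes the HBase 2.x layout, where a table's files live under <rootdir>/data/<namespace>/<table> and archived files under <rootdir>/archive/data/<namespace>/<table>. TableDirs is a hypothetical helper, not an HBase API. RS-6-3-1024k is only an example policy name.

```java
import java.util.List;

/**
 * Hypothetical helper (not an HBase API) mirroring the HBase 2.x directory
 * layout under hbase.rootdir, to list the dirs a per-table EC policy covers.
 */
public class TableDirs {

    /** Splits "ns:table" into {ns, table}; a bare name lives in the "default" namespace. */
    static String[] parse(String tableName) {
        int colon = tableName.indexOf(':');
        return colon < 0
                ? new String[] {"default", tableName}
                : new String[] {tableName.substring(0, colon), tableName.substring(colon + 1)};
    }

    /** Live HFiles: <rootDir>/data/<namespace>/<table> */
    static String dataDir(String rootDir, String tableName) {
        String[] nt = parse(tableName);
        return rootDir + "/data/" + nt[0] + "/" + nt[1];
    }

    /** Archived HFiles: <rootDir>/archive/data/<namespace>/<table> */
    static String archiveDir(String rootDir, String tableName) {
        String[] nt = parse(tableName);
        return rootDir + "/archive/data/" + nt[0] + "/" + nt[1];
    }

    public static void main(String[] args) {
        String policy = "RS-6-3-1024k"; // example built-in policy; use one enabled on your cluster
        for (String dir : List.of(dataDir("/hbase", "usertable"), archiveDir("/hbase", "usertable"))) {
            System.out.println("hdfs ec -setPolicy -path " + dir + " -policy " + policy);
        }
    }
}
```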
One problem with that approach is that operators have no visibility into which
tables have EC enabled. I think this is where HBase can help. Here's my proposal:
* Add a new TableDescriptor and ColumnFamilyDescriptor field, ERASURE_CODING_POLICY.
* In ModifyTableProcedure preflightChecks, if ERASURE_CODING_POLICY is set,
verify that the requested policy is available and enabled via
{{DistributedFileSystem.getErasureCodingPolicies()}}.
* Add a new ModifyTableProcedure state, MODIFY_TABLE_SYNC_ERASURE_CODING_POLICY.
** When adding or changing a policy, use
{{DistributedFileSystem.setErasureCodingPolicy}} to sync it to the data and
archive dirs of that table (or of the column family within the table).
** When the property is removed or set to empty, use
{{DistributedFileSystem.unsetErasureCodingPolicy}} to remove it from the data
and archive dirs.
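The preflight check and the sync step could look roughly like the sketch below. EcFs is a hypothetical interface standing in for the three DistributedFileSystem methods named in the proposal, so the logic runs without a cluster. FakeFs is an in-memory fake. Which dirs to pass (table-level data and archive dirs, or per-family dirs) is left to the caller. Error handling and procedure rollback are omitted.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Sketch of the proposed preflight check and MODIFY_TABLE_SYNC_ERASURE_CODING_POLICY step. */
public class EcPolicySync {

    /** Hypothetical narrow view of the DistributedFileSystem EC calls. */
    interface EcFs {
        // Names of policies whose state is ENABLED (cf. getErasureCodingPolicies()).
        Set<String> enabledPolicies();
        // cf. setErasureCodingPolicy(Path, String)
        void setPolicy(String dir, String policy);
        // cf. unsetErasureCodingPolicy(Path)
        void unsetPolicy(String dir);
    }

    /** Treats null and "" the same: no policy. */
    static String normalize(String policy) {
        return (policy == null || policy.isEmpty()) ? null : policy;
    }

    /** Preflight: reject a requested policy that is not enabled on the cluster. */
    static void preflight(EcFs fs, String requested) {
        String p = normalize(requested);
        if (p != null && !fs.enabledPolicies().contains(p)) {
            throw new IllegalArgumentException("Erasure coding policy not enabled: " + p);
        }
    }

    /** Sync: set or replace the policy on every dir, or unset it when it was removed. */
    static void sync(EcFs fs, String oldPolicy, String newPolicy, List<String> dirs) {
        String before = normalize(oldPolicy);
        String after = normalize(newPolicy);
        if (Objects.equals(before, after)) {
            return; // unchanged: nothing to sync
        }
        for (String dir : dirs) {
            if (after != null) {
                fs.setPolicy(dir, after);
            } else {
                fs.unsetPolicy(dir);
            }
        }
    }

    /** In-memory fake so the logic can be exercised without HDFS. */
    static class FakeFs implements EcFs {
        final Map<String, String> policies = new HashMap<>();

        @Override public Set<String> enabledPolicies() { return Set.of("RS-6-3-1024k", "XOR-2-1-1024k"); }
        @Override public void setPolicy(String dir, String policy) { policies.put(dir, policy); }
        @Override public void unsetPolicy(String dir) { policies.remove(dir); }
    }

    public static void main(String[] args) {
        FakeFs fs = new FakeFs();
        List<String> dirs = List.of("/hbase/data/default/usertable",
                "/hbase/archive/data/default/usertable");

        preflight(fs, "RS-6-3-1024k");          // passes: policy is enabled
        sync(fs, null, "RS-6-3-1024k", dirs);   // policy added
        System.out.println(fs.policies);
        sync(fs, "RS-6-3-1024k", "", dirs);     // property set to empty: policy removed
        System.out.println(fs.policies);        // {}
    }
}
```

Skipping the sync when the normalized old and new values match avoids calling unset on a dir that never had a policy. It also avoids rewriting an unchanged policy.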
Since these APIs exist only in Hadoop 3, we'll need to add a reflection wrapper
class that manages the calls and verifies that the API is available. We'll
also perform that availability check in preflightChecks.
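A reflection wrapper along those lines might start from a lookup like the one below. It resolves the class and method by name and treats a missing class or method as "API not available." ReflectiveApi is a hypothetical name. The DistributedFileSystem signature checked in main is Hadoop 3's setErasureCodingPolicy(Path, String). Primitive parameter types are not handled, to keep the sketch short.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Hypothetical reflection helper for calling APIs that only exist in newer Hadoop versions. */
public class ReflectiveApi {

    /**
     * Returns the public method, or null if the class or method is missing.
     * Parameter types are given as class names; primitives are not supported here.
     */
    static Method find(String className, String methodName, String... paramTypeNames) {
        try {
            Class<?> cls = Class.forName(className);
            Class<?>[] params = new Class<?>[paramTypeNames.length];
            for (int i = 0; i < params.length; i++) {
                params[i] = Class.forName(paramTypeNames[i]);
            }
            return cls.getMethod(methodName, params);
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return null; // e.g. Hadoop 2 (or no Hadoop) on the classpath: API not available
        }
    }

    static boolean isAvailable(String className, String methodName, String... paramTypeNames) {
        return find(className, methodName, paramTypeNames) != null;
    }

    /** Invokes the method, rethrowing the target's own exception instead of the reflective wrapper. */
    static Object invoke(Method m, Object target, Object... args) throws Exception {
        try {
            return m.invoke(target, args);
        } catch (InvocationTargetException e) {
            if (e.getCause() instanceof Exception) {
                throw (Exception) e.getCause();
            }
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        // Hadoop 3: DistributedFileSystem.setErasureCodingPolicy(Path, String)
        boolean ecApi = isAvailable("org.apache.hadoop.hdfs.DistributedFileSystem",
                "setErasureCodingPolicy", "org.apache.hadoop.fs.Path", "java.lang.String");
        System.out.println("EC API available: " + ecApi); // false without Hadoop 3 jars

        // The same mechanism against a JDK class, so the demo runs anywhere.
        Method upper = find("java.lang.String", "toUpperCase");
        System.out.println(invoke(upper, "hbase")); // HBASE
    }
}
```

Resolving each Method once (for example in a static initializer) and reusing it keeps the lookup cost out of the per-call path. Only Method.invoke then runs on each call.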
--
This message was sent by Atlassian Jira
(v8.20.10#820010)