[
https://issues.apache.org/jira/browse/HBASE-28216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17913180#comment-17913180
]
vishal.rajan commented on HBASE-28216:
--------------------------------------
[~bbeaudreault] Could please share the status of running hbase 2.6.x with
erasure coding. Could please share the hdfs version used and issues faced
during migration to erasure coding.
> HDFS erasure coding support for table data dirs
> -----------------------------------------------
>
> Key: HBASE-28216
> URL: https://issues.apache.org/jira/browse/HBASE-28216
> Project: HBase
> Issue Type: New Feature
> Reporter: Bryan Beaudreault
> Assignee: Bryan Beaudreault
> Priority: Major
> Labels: patch-available, pull-request-available
> Fix For: 2.6.0, 3.0.0-beta-2
>
>
> [Erasure
> coding|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html]
> (EC) is a hadoop-3 feature which can drastically reduce storage
> requirements, at the expense of locality. At my company we have a few hbase
> clusters which are extremely data dense and take mostly write traffic, fewer
> reads (cold data). We'd like to reduce the cost of these clusters, and EC is
> a great way to do that since it can reduce replication related storage costs
> by 50%.
> It's possible to enable EC policies on sub directories of HDFS. One can
> manually set this with {{{}hdfs ec -setPolicy -path
> /hbase/data/default/usertable -policy xxxx{}}}. This can work without any
> hbase support.
> One problem with that is a lack of visibility by operators into which tables
> might have EC enabled. I think this is where HBase can help. Here's my
> proposal:
> * Add a new TableDescriptor and ColumnDescriptor field ERASURE_CODING_POLICY
> * In ModifyTableProcedure preflightChecks, if ERASURE_CODING_POLICY is set,
> verify that the requested policy is available and enabled via
> DistributedFileSystem.
> getErasureCodingPolicies().
> * During ModifyTableProcedure, add a new state for
> MODIFY_TABLE_SYNC_ERASURE_CODING_POLICY.
> ** When adding or changing a policy, use DistributedFileSystem.
> setErasureCodingPolicy to sync it for the data and archive dir of that table
> (or column in table)
> ** When removing the property or setting it to empty, use
> DistributedFileSystem.
> unsetErasureCodingPolicy to remove it from the data and archive dir.
> Since this new API is in hadoop-3 only, we'll need to add a reflection
> wrapper class for managing the calls and verifying that the API is available.
> We'll similarly do that API check in preflightChecks.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)