[ https://issues.apache.org/jira/browse/HBASE-28216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799175#comment-17799175 ]
Nihal Jain edited comment on HBASE-28216 at 12/20/23 9:18 PM: -------------------------------------------------------------- Hey [~bbeaudreault] I think there are failures in hbase-shell after this commit. I see tests failing locally due to undefined constant. Also checked commit build is also failing. Mind having a look? was (Author: nihaljain.cs): Hey [~bbeaudreault] I think there are failures in hbase-shell after this commit. I see tests failing locally due to undefined constant. Also checked commit build os also failing. Mind having a look? > HDFS erasure coding support for table data dirs > ----------------------------------------------- > > Key: HBASE-28216 > URL: https://issues.apache.org/jira/browse/HBASE-28216 > Project: HBase > Issue Type: New Feature > Reporter: Bryan Beaudreault > Assignee: Bryan Beaudreault > Priority: Major > Labels: patch-available > > [Erasure > coding|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html] > (EC) is a hadoop-3 feature which can drastically reduce storage > requirements, at the expense of locality. At my company we have a few hbase > clusters which are extremely data dense and take mostly write traffic, fewer > reads (cold data). We'd like to reduce the cost of these clusters, and EC is > a great way to do that since it can reduce replication related storage costs > by 50%. > It's possible to enable EC policies on sub directories of HDFS. One can > manually set this with {{{}hdfs ec -setPolicy -path > /hbase/data/default/usertable -policy xxxx{}}}. This can work without any > hbase support. > One problem with that is a lack of visibility by operators into which tables > might have EC enabled. I think this is where HBase can help. Here's my > proposal: > * Add a new TableDescriptor and ColumnDescriptor field ERASURE_CODING_POLICY > * In ModifyTableProcedure preflightChecks, if ERASURE_CODING_POLICY is set, > verify that the requested policy is available and enabled via > DistributedFileSystem. > getErasureCodingPolicies(). > * During ModifyTableProcedure, add a new state for > MODIFY_TABLE_SYNC_ERASURE_CODING_POLICY. > ** When adding or changing a policy, use DistributedFileSystem. > setErasureCodingPolicy to sync it for the data and archive dir of that table > (or column in table) > ** When removing the property or setting it to empty, use > DistributedFileSystem. > unsetErasureCodingPolicy to remove it from the data and archive dir. > Since this new API is in hadoop-3 only, we'll need to add a reflection > wrapper class for managing the calls and verifying that the API is available. > We'll similarly do that API check in preflightChecks. -- This message was sent by Atlassian Jira (v8.20.10#820010)