[ 
https://issues.apache.org/jira/browse/HBASE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15728144#comment-15728144
 ] 

Daniel Vimont edited comment on HBASE-17257 at 12/15/16 5:24 AM:
-----------------------------------------------------------------

Here are some specifications of what I’ve currently designed and coded for 
column-aliasing (and will soon be submitting as a patch)...


*COLUMN-ALIASING FOR THE END-USER*:
From the end-user perspective, column-aliasing entails the following two 
things...


(1) _Environmental configuration to enable aliasing_:
Aliasing makes use of the already-existing {{hbase.client.connection.impl}} 
configuration parameter. The following entry should be added to 
{{hbase-site.xml}}:
{code}
  <property>
    <name>hbase.client.connection.impl</name>
    <value>org.apache.hadoop.hbase.client.AliasEnabledConnection</value>
  </property>
{code}
Setting this parameter to this value results in 
{{ConnectionFactory#createConnection}} returning Connections of the new 
{{AliasEnabledConnection}} class (subclass of the {{ConnectionImplementation}} 
class).


(2) _Alias-enabling individual column families_:
Aliasing is enabled at the column-family level. When adding a column-descriptor 
(i.e. family) to a Table, the new method {{HColumnDescriptor#setAliasSize}} may 
be invoked to immutably set the fixed size (in bytes) of column-qualifier 
aliases for the column family. The default value of 0 (aliasing disabled) may 
be changed to either 1, 2, or 4.
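
As a rough illustration of what the permitted alias sizes imply, the sketch below validates an alias size and computes how many distinct aliases fit in it. The class {{AliasSizes}} and its methods are hypothetical and are not part of the patch; the capacities (255, 65,535, and 4,294,967,295) are the ones quoted in the issue description, which suggests that one value per size is never handed out.

```java
/** Hypothetical helper, not part of the patch: models the permitted alias sizes. */
final class AliasSizes {
    private AliasSizes() {}

    /** True for 0 (aliasing disabled) and for alias sizes of 1, 2, or 4 bytes. */
    static boolean isValidAliasSize(int aliasSize) {
        return aliasSize == 0 || aliasSize == 1 || aliasSize == 2 || aliasSize == 4;
    }

    /**
     * Number of distinct aliases available for the given alias size.
     * Assumption: the all-zero value is never used as an alias, which
     * yields 2^(8n) - 1 values, matching the figures in the issue text.
     */
    static long maxAliases(int aliasSize) {
        if (!isValidAliasSize(aliasSize)) {
            throw new IllegalArgumentException("aliasSize must be 0, 1, 2, or 4: " + aliasSize);
        }
        if (aliasSize == 0) {
            return 0L; // aliasing disabled
        }
        return (1L << (8 * aliasSize)) - 1L;
    }
}
```

A family expected to hold a few hundred distinct qualifiers would therefore need an alias size of at least 2 under these assumptions.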


Other than the above, the end-user-application code should neither require nor 
contain any “awareness” whatsoever that column-aliasing is being utilized for a 
column-family. An end-user-application continues to interact only with the 
standard interfaces of the client API ({{Connection}}, {{Table}}, 
{{BufferedMutator}}, and {{HTableMultiplexer}}).


*COLUMN-ALIASING INTERNALS*:
One of the overriding goals in designing the column-aliasing infrastructure is 
to minimize alterations and insertions into already-existing hbase-client code, 
and to have very-close-to-zero impact on already-existing functionality, 
particularly in those situations in which aliasing will NOT be used. The 
following is a comprehensive list of all new and modified modules, along with 
an explanation as to the role that the new or modified module plays in aliasing.


_Modified classes_:
*HColumnDescriptor*: as described above, the new method {{#setAliasSize}} has 
been added; also corresponding methods, {{#getAliasSize}} and 
{{#isAliasEnabled}} (returns “true” if aliasSize is nonzero).
*HTableDescriptor*: new method {{#hasAliasEnabledFamily}} has been added 
(returns “true” if one or more of the table’s families are aliasEnabled).
(Corresponding modifications were also made to {{TestHColumnDescriptor}} and 
{{TestHTableDescriptor}}, to test the new methods appropriately.)


_New class_:
*AliasEnabledConnection* (subclass of {{ConnectionImplementation}}): provides 
overrides of {{#getTable}} and {{#getBufferedMutator}} to return objects of the 
{{AliasEnabledTable}} class and the {{AliasEnabledBufferedMutator}} class, 
respectively.


_Modified class_:
*HTableMultiplexer*: new static method added -- 
{{#getAliasEnabledTableMultiplexer}}, returns an {{HTableMultiplexer}} object 
that is actually an instance of the new subclass, 
{{AliasEnabledTableMultiplexer}}.


_New classes_:
*AliasEnabledTable* (subclass of {{HTable}})
*AliasEnabledBufferedMutator* (subclass of {{BufferedMutatorImpl}})
*AliasEnabledTableMultiplexer* (subclass of {{HTableMultiplexer}})
-- all the above contain overrides which allow for {{AliasManager}} methods to 
be invoked when needed -- to perform qualifier-to-alias conversions (for 
{{Get}}, {{Scan}}, and {{Mutation}} objects), and alias-to-qualifier 
conversions (for {{Result}} objects) -- for any Table for which 
{{HTableDescriptor#hasAliasEnabledFamily}} is “true”.


_New class_:
*AliasManager*: performs all alias-oriented conversions -- qualifier-to-alias 
conversions for queries and mutations, and alias-to-qualifier conversions for 
results. It fully encapsulates all CRUD transactions against the 
{{aliasMappingTable}} (the HBase table in which qualifier-to-alias mappings are 
persisted for each alias-enabled column family). When a {{Mutation}} object 
contains a column-qualifier for which an alias entry does not yet exist, a new 
alias is generated and stored in a qualifier-to-alias mapping entry in the 
{{aliasMappingTable}}. The first time an {{AliasManager}} is instantiated 
against an HBase cluster, the {{aliasMappingTable}} will be created if it does 
not already exist.
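
To make the two conversion directions concrete, here is a minimal in-memory sketch for a single column family. It is not the {{AliasManager}} implementation: the class {{InMemoryAliasMapper}}, its method names, the use of a simple counter starting at 1, and the big-endian alias encoding are all assumptions made for illustration, and nothing is persisted to HBase.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for one family's qualifier-to-alias mappings. */
final class InMemoryAliasMapper {
    private final int aliasSize;   // 1, 2, or 4 bytes
    private final long maxAlias;   // largest alias value that fits in aliasSize bytes
    private long counter = 0;      // models the per-family increment value
    private final Map<ByteBuffer, byte[]> qualifierToAlias = new HashMap<>();
    private final Map<ByteBuffer, byte[]> aliasToQualifier = new HashMap<>();

    InMemoryAliasMapper(int aliasSize) {
        if (aliasSize != 1 && aliasSize != 2 && aliasSize != 4) {
            throw new IllegalArgumentException("aliasSize must be 1, 2, or 4");
        }
        this.aliasSize = aliasSize;
        this.maxAlias = (1L << (8 * aliasSize)) - 1L;
    }

    /** Mutation path: returns the existing alias, or generates and records a new one. */
    byte[] getOrCreateAlias(byte[] qualifier) {
        ByteBuffer key = ByteBuffer.wrap(qualifier.clone());
        byte[] alias = qualifierToAlias.get(key);
        if (alias == null) {
            if (counter == maxAlias) {
                throw new IllegalStateException("alias space exhausted");
            }
            alias = encode(++counter);
            qualifierToAlias.put(key, alias);
            aliasToQualifier.put(ByteBuffer.wrap(alias), qualifier.clone());
        }
        return alias.clone();
    }

    /** Result path: returns the original qualifier, or null if the alias is unknown. */
    byte[] getQualifier(byte[] alias) {
        byte[] qualifier = aliasToQualifier.get(ByteBuffer.wrap(alias));
        return qualifier == null ? null : qualifier.clone();
    }

    /** Encodes the value as a fixed-size, big-endian byte array (an assumed encoding). */
    private byte[] encode(long value) {
        byte[] out = new byte[aliasSize];
        for (int i = aliasSize - 1; i >= 0; i--) {
            out[i] = (byte) (value & 0xFF);
            value >>>= 8;
        }
        return out;
    }
}
```

In this sketch a query for a qualifier that has never been written would have no alias to convert to; how the actual patch treats that case is not covered here.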


_New reserved HBase table_:
*aliasMappingTable*: Each row on the {{aliasMappingTable}} corresponds to an 
aliasEnabled column-family on a user-table. The rowId of each 
{{aliasMappingTable}} row is in the format: {{[fully-qualified-user-table-name 
+ ":" + aliasEnabled-Family]}}. On each {{aliasMappingTable}} row, the column 
with an EMPTY_BYTE_ARRAY column-qualifier is reserved for an Increment value 
used to generate new unique alias values within the range stipulated by the 
aliasSize (1, 2, or 4 bytes) of the column-family. All other columns on an 
{{aliasMappingTable}} row are key:value pairings which map a 
user-column-qualifier to its corresponding alias.
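
The row layout described above can be pictured with a small model. This is only a sketch: the class {{AliasMappingRowSketch}} is hypothetical, strings and longs stand in for the byte arrays HBase actually stores, and the empty string stands in for the EMPTY_BYTE_ARRAY qualifier. How the patch handles a user column whose own qualifier is empty is not stated here, so the sketch simply rejects it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of a single aliasMappingTable row (not the patch's code). */
final class AliasMappingRowSketch {
    /** Stands in for the EMPTY_BYTE_ARRAY qualifier that holds the counter. */
    static final String COUNTER_QUALIFIER = "";

    /** Row id in the form fully-qualified-user-table-name + ":" + family. */
    final String rowId;

    /** Column qualifier to value: the counter column plus one column per mapping. */
    final Map<String, Long> columns = new LinkedHashMap<>();

    AliasMappingRowSketch(String fullyQualifiedTableName, String family) {
        this.rowId = fullyQualifiedTableName + ":" + family;
        columns.put(COUNTER_QUALIFIER, 0L);
    }

    /** Returns the alias for the qualifier, incrementing the counter for a new one. */
    long addMapping(String userQualifier) {
        if (userQualifier.isEmpty()) {
            // Assumption of this sketch only: the empty qualifier is reserved.
            throw new IllegalArgumentException("empty qualifier collides with the counter column");
        }
        Long existing = columns.get(userQualifier);
        if (existing != null) {
            return existing;
        }
        long next = columns.merge(COUNTER_QUALIFIER, 1L, Long::sum);
        columns.put(userQualifier, next);
        return next;
    }
}
```

For a table {{ns1:orders}} with family {{cf}}, the row id in this model would be {{ns1:orders:cf}}.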


_Modified interface and class_:
*Admin*
*HBaseAdmin*
-- new method added, {{#deleteColumnFamilyAliases}}. While usage is not 
mandatory, this method may be invoked to remove from the {{aliasMappingTable}} 
the row associated with a specific column-family. It may only be successfully 
invoked after the column-family has been fully deleted from its table, or after 
the table itself has been deleted.


_Modified class_:
*Get*: new package-private method {{#setFamilyMap}} -- added to make {{Get}} 
consistent with {{Scan}}, {{Mutation}}, etc. (which already have such a 
method); this was required to allow {{AliasManager}} to cleanly produce 
alias-converted {{Get}} objects.


_Modified class_:
*ConnectionImplementation*: Inadvertent usage of standard {{HTable}} and 
{{BufferedMutatorImpl}} objects against a Table with alias-enabled 
column-families would result in corruption of the Table’s data. To prevent 
this, the methods {{#getTable}} and {{#getBufferedMutator}} were modified with 
the addition of a call to the static method 
{{AliasManager#verifyConnectionForAliasEnabledTable}}, which throws an 
{{IllegalStateException}} if a Table is alias-enabled and the 
{{AliasEnabledConnection.class}} is not assignable from the class of the 
current {{Connection}}.
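
The guard described above amounts to a {{Class#isAssignableFrom}} check. The following sketch shows the shape of such a check using stand-in classes; {{ConnectionGuardSketch}}, {{StandardConnection}}, {{AliasAwareConnection}}, and the method signature are hypothetical and do not reproduce the signature of {{AliasManager#verifyConnectionForAliasEnabledTable}}.

```java
/** Hypothetical sketch of the connection check, using stand-in classes. */
final class ConnectionGuardSketch {
    private ConnectionGuardSketch() {}

    /** Stand-in for a standard connection implementation. */
    static class StandardConnection {}

    /** Stand-in for the alias-enabled connection subclass. */
    static class AliasAwareConnection extends StandardConnection {}

    /**
     * Throws if the table is alias-enabled but the connection is not an
     * instance of the alias-aware class (or of one of its subclasses).
     */
    static void verifyConnection(boolean tableIsAliasEnabled, Object connection) {
        if (tableIsAliasEnabled
                && !AliasAwareConnection.class.isAssignableFrom(connection.getClass())) {
            throw new IllegalStateException(
                "Alias-enabled table requires an alias-aware connection, but got "
                    + connection.getClass().getName());
        }
    }
}
```

Tables without alias-enabled families pass the check with any connection, so existing code paths are unaffected in this model.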


_Modified classes_:
*ConnectionImplementation*
*HTable*
-- the method {{ConnectionImplementation#getBufferedMutator}} was refactored 
into two separate methods, with the original {{#getBufferedMutator}} method now 
calling a new package-private method called {{#getBufferedMutatorImpl}}. The 
{{HTable}} class internally uses a {{BufferedMutatorImpl}} object to accomplish 
some of its processing, and its invocation of 
{{ConnectionImplementation#getBufferedMutator}} needed to be changed to the new 
{{#getBufferedMutatorImpl}} method in order to ensure proper functioning of 
both {{HTable}} and its new subclass, {{AliasEnabledTable}}. (Without this 
change, {{AliasEnabledTable}} would incorrectly instantiate an internal 
{{AliasEnabledBufferedMutator}} instead of the required standard 
{{BufferedMutatorImpl}}.)
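
The reason for the split can be seen in a reduced model. All class names below are stand-ins invented for this sketch, not the HBase classes themselves: the overridable method returns the alias-aware mutator in the subclass, while the second method is left alone so that a table's internal mutator is always the standard one.

```java
/** Hypothetical reduced model of the getBufferedMutator refactoring. */
final class MutatorRefactorSketch {
    private MutatorRefactorSketch() {}

    static class PlainMutator {}                      // stand-in for BufferedMutatorImpl
    static class AliasMutator extends PlainMutator {} // stand-in for the alias-enabled mutator

    /** Stand-in for the standard connection implementation. */
    static class BaseConnection {
        /** Public entry point; subclasses may override it. */
        PlainMutator getBufferedMutator() {
            return getBufferedMutatorImpl();
        }

        /** Not overridden in this sketch: always yields the standard mutator. */
        PlainMutator getBufferedMutatorImpl() {
            return new PlainMutator();
        }
    }

    /** Stand-in for the alias-enabled connection. */
    static class AliasConnection extends BaseConnection {
        @Override
        PlainMutator getBufferedMutator() {
            return new AliasMutator();
        }
    }

    /** Stand-in for HTable: its internal mutator bypasses the overridable method. */
    static class PlainTable {
        final PlainMutator internalMutator;

        PlainTable(BaseConnection connection) {
            this.internalMutator = connection.getBufferedMutatorImpl();
        }
    }

    /** Stand-in for the alias-enabled table subclass. */
    static class AliasTable extends PlainTable {
        AliasTable(BaseConnection connection) {
            super(connection);
        }
    }
}
```

Had {{PlainTable}} called {{getBufferedMutator}} instead, an {{AliasTable}} built on an {{AliasConnection}} would have received an {{AliasMutator}} internally, which is the situation the refactoring is meant to avoid.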


_Modified class_:
*TableName*: constant added for {{ALIAS_TABLE_NAME}}.


_Added TEST classes_:
-- Test classes were added (in the hbase-server subproject, which allows access 
to the {{HBaseTestingUtility}}) for all the new Alias* prefixed classes.
The most elaborate testing takes place in the {{TestAliasEnabledTable}} module, 
in which identical sets of mutations and queries are submitted against both a 
standard (non-alias-enabled, “baseline”) table and four other tables defined 
with various combinations of alias-enabled column-families. Results from all 
alias-enabled families are exhaustively compared with the “baseline” results to 
verify that they are identical.



> Add column-aliasing capability to hbase-client
> ----------------------------------------------
>
>                 Key: HBASE-17257
>                 URL: https://issues.apache.org/jira/browse/HBASE-17257
>             Project: HBase
>          Issue Type: New Feature
>          Components: Client
>    Affects Versions: 2.0.0
>            Reporter: Daniel Vimont
>            Assignee: Daniel Vimont
>              Labels: features
>         Attachments: HBASE-17257-v2.patch, HBASE-17257-v3.patch, 
> HBASE-17257.patch
>
>
> Review Board link: https://reviews.apache.org/r/54635/
> Column aliasing will provide the option for a 1, 2, or 4 byte alias value to 
> be stored in each cell of an "alias enabled" column-family, in place of the 
> full-length column-qualifier. Aliasing is intended to operate completely 
> invisibly to the end-user developer, with absolutely no "awareness" of 
> aliasing required to be coded into a front-end application. No new public 
> hbase-client interfaces are to be introduced, and only a few new public 
> methods should need to be added to existing interfaces, primarily to allow an 
> administrator to designate that a new column-family is to be alias-enabled by 
> setting its aliasSize attribute to 1, 2, or 4.
> To facilitate such functionality, new subclasses of HTable, 
> BufferedMutatorImpl, and HTableMultiplexer are to be provided. The overriding 
> methods of these new subclasses will invoke methods of the new AliasManager 
> class to facilitate qualifier-to-alias conversions (for user-submitted Gets, 
> Scans, and Mutations) and alias-to-qualifier conversions (for Results 
> returned from HBase) for any Table that has one or more alias-enabled column 
> families. All conversion logic will be encapsulated in the new AliasManager 
> class, and all qualifier-to-alias mappings will be persisted in a new 
> aliasMappingTable in a new, reserved namespace.
> An informal polling of HBase users at HBaseCon East and at the 
> Strata/Hadoop-World conference in Sept. 2016 showed that Column Aliasing 
> could be a popular enhancement to standard HBase functionality, due to the 
> fact that full column-qualifiers are stored in each cell, and reducing this 
> qualifier storage requirement down to 1, 2, or 4 bytes per cell could prove 
> beneficial in terms of reduced storage and bandwidth needs. Aliasing is 
> intended chiefly for column-families which are of the "narrow and tall" 
> variety (i.e., that are designed to use relatively few distinct 
> column-qualifiers throughout a large number of rows, throughout the lifespan 
> of the column-family). A column-family that is set up with an alias-size of 1 
> byte can contain up to 255 unique column-qualifiers; a 2 byte alias-size 
> allows for up to 65,535 unique column-qualifiers; and a 4 byte alias-size 
> allows for up to 4,294,967,295 unique column-qualifiers.
> Fuller specifications will be entered into the comments section below. Note 
> that it may well not be viable to add aliasing support in the new "async" 
> classes that appear to be currently under development.


