[
https://issues.apache.org/jira/browse/HBASE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15728144#comment-15728144
]
Daniel Vimont commented on HBASE-17257:
---------------------------------------
Here are some specifications of what I’ve currently designed and coded for
column-aliasing (and will soon be submitting as a patch)...
*COLUMN-ALIASING FOR THE END-USER*:
From the end-user perspective, column-aliasing entails the following two
things...
(1) _Environmental configuration to enable aliasing_:
Aliasing makes use of the already-existing {{hbase.client.connection.impl}}
configuration parameter. The following entry should be added to
{{hbase-site.xml}}:
{code}
<property>
<name>hbase.client.connection.impl</name>
<value>org.apache.hadoop.hbase.client.AliasEnabledConnection</value>
</property>
{code}
Setting this parameter to this value results in
{{ConnectionFactory#createConnection}} returning Connections of the new
{{AliasEnabledConnection}} class (subclass of the {{ConnectionImplementation}}
class).
(2) _Alias-enabling individual column families_:
Aliasing is enabled at the column-family level. When adding a column-descriptor
(i.e. family) to a Table, the new method {{HColumnDescriptor#setAliasSize}} may
be invoked to immutably set the fixed size (in bytes) of column-qualifier
aliases for the column family. The default value of 0 (aliasing disabled) may
be changed to either 1, 2, or 4.
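As a rough illustration of the alias-size rules, the following standalone sketch validates an alias size and computes how many distinct qualifiers it could cover. The class {{AliasSizeSketch}} and its methods are hypothetical and are not part of the patch. The capacity figures match those in the issue description (255, 65,535 and 4,294,967,295); the sketch assumes that one value per size is simply never handed out, which is a guess at the reason for those figures.

```java
// Hypothetical helper for illustration only; not code from the patch.
final class AliasSizeSketch {

    /** True for the sizes named above: 0 (aliasing disabled), 1, 2, or 4 bytes. */
    static boolean isValidAliasSize(int aliasSize) {
        return aliasSize == 0 || aliasSize == 1 || aliasSize == 2 || aliasSize == 4;
    }

    /**
     * Number of distinct aliases available for a given size.
     * Assumption: one value is left unused, giving 2^(8 * size) - 1.
     */
    static long maxAliases(int aliasSize) {
        if (!isValidAliasSize(aliasSize)) {
            throw new IllegalArgumentException("aliasSize must be 0, 1, 2, or 4: " + aliasSize);
        }
        if (aliasSize == 0) {
            return 0L; // aliasing disabled, so no aliases are generated
        }
        return (1L << (8 * aliasSize)) - 1;
    }

    public static void main(String[] args) {
        for (int size : new int[] {1, 2, 4}) {
            System.out.println(size + " byte(s): " + maxAliases(size) + " aliases");
        }
    }
}
```

A "narrow and tall" family with a few dozen distinct qualifiers would fit comfortably in a 1-byte alias under this arithmetic.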
Other than the above, the end-user-application code should neither require nor
contain any “awareness” whatsoever that column-aliasing is being utilized for a
column-family. An end-user-application continues to interact only with the
standard interfaces of the client API ({{Connection}}, {{Table}},
{{BufferedMutator}}, and {{HTableMultiplexer}}).
*COLUMN-ALIASING INTERNALS*:
One of the overriding goals in designing the column-aliasing infrastructure is
to minimize alterations and insertions into already-existing hbase-client code,
and to have very-close-to-zero impact on already-existing functionality,
particularly in those situations in which aliasing will NOT be used. The
following is a comprehensive list of all new and modified modules, along with
an explanation of the role that each one plays in aliasing.
_Modified classes_:
*HColumnDescriptor*: as described above, the new method {{#setAliasSize}} has
been added, along with the corresponding methods {{#getAliasSize}} and
{{#isAliasEnabled}} (returns “true” if aliasSize is not zero).
*HTableDescriptor*: new method {{#hasAliasEnabledFamily}} has been added
(returns “true” if one or more of the table’s families are aliasEnabled).
(Corresponding modifications were also made to {{TestHColumnDescriptor}} and
{{TestHTableDescriptor}}, to test the new methods appropriately.)
_New class_:
*AliasEnabledConnection* (subclass of {{ConnectionImplementation}}): provides
overrides of {{#getTable}} and {{#getBufferedMutator}} to return objects of the
{{AliasEnabledTable}} class and the {{AliasEnabledBufferedMutator}} class,
respectively.
_Modified class_:
*HTableMultiplexer*: new static method added --
{{#getAliasEnabledTableMultiplexer}}, returns an {{HTableMultiplexer}} object
that is actually an instance of the new subclass,
{{AliasEnabledTableMultiplexer}}.
_New classes_:
*AliasEnabledTable* (subclass of {{HTable}})
*AliasEnabledBufferedMutator* (subclass of {{BufferedMutatorImpl}})
*AliasEnabledTableMultiplexer* (subclass of {{HTableMultiplexer}})
-- all the above contain overrides which allow for {{AliasManager}} methods to
be invoked when needed -- to perform qualifier-to-alias conversions (for
{{Get}}, {{Scan}}, and {{Mutation}} objects), and alias-to-qualifier
conversions (for {{Result}} objects) -- for any Table for which
{{HTableDescriptor#hasAliasEnabledFamily}} is “true”.
_New class_:
*AliasManager*: performs all alias-oriented conversions -- qualifier-to-alias
conversions for queries and mutations, and alias-to-qualifier conversions for
results. It fully encapsulates all CRUD transactions against the
{{aliasMappingTable}} (the HBase table in which qualifier-to-alias mappings are
persisted for each alias-enabled column family). When a {{Mutation}} object
contains a column-qualifier for which an alias entry does not yet exist, a new
alias is generated and stored in a qualifier-to-alias mapping entry in the
{{aliasMappingTable}}. The first time an {{AliasManager}} is instantiated
against an HBase cluster, the {{aliasMappingTable}} will be created if it does
not already exist.
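The conversion behavior described for {{AliasManager}} can be approximated with an in-memory stand-in. This is only a sketch under stated assumptions: the class {{InMemoryAliasMapSketch}} is hypothetical, it keeps mappings in HashMaps rather than in the {{aliasMappingTable}}, it uses String qualifiers where HBase uses byte arrays, and it assumes aliases are a big-endian encoding of a counter that starts at 1.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for one alias-enabled family; not code from the patch.
final class InMemoryAliasMapSketch {
    private final int aliasSize;
    private final Map<String, byte[]> qualifierToAlias = new HashMap<>();
    private final Map<Long, String> aliasToQualifier = new HashMap<>();
    private long counter = 0; // plays the role of the per-family Increment value

    InMemoryAliasMapSketch(int aliasSize) {
        if (aliasSize != 1 && aliasSize != 2 && aliasSize != 4) {
            throw new IllegalArgumentException("aliasSize must be 1, 2, or 4");
        }
        this.aliasSize = aliasSize;
    }

    /** Mutation path: return the existing alias, or generate and record a new one. */
    byte[] aliasFor(String qualifier) {
        byte[] existing = qualifierToAlias.get(qualifier);
        if (existing != null) {
            return existing.clone();
        }
        long max = (1L << (8 * aliasSize)) - 1;
        if (counter >= max) {
            throw new IllegalStateException("alias space exhausted for aliasSize " + aliasSize);
        }
        long next = ++counter;
        byte[] alias = encode(next);
        qualifierToAlias.put(qualifier, alias);
        aliasToQualifier.put(next, qualifier);
        return alias.clone();
    }

    /** Result path: map a stored alias back to the user's qualifier (null if unknown). */
    String qualifierFor(byte[] alias) {
        return aliasToQualifier.get(decode(alias));
    }

    // Big-endian encoding of the counter value into exactly aliasSize bytes.
    private byte[] encode(long value) {
        byte[] out = new byte[aliasSize];
        for (int i = aliasSize - 1; i >= 0; i--) {
            out[i] = (byte) value;
            value >>>= 8;
        }
        return out;
    }

    private static long decode(byte[] alias) {
        long value = 0;
        for (byte b : alias) {
            value = (value << 8) | (b & 0xFF);
        }
        return value;
    }

    public static void main(String[] args) {
        InMemoryAliasMapSketch family = new InMemoryAliasMapSketch(1);
        byte[] alias = family.aliasFor("temperature");
        System.out.println("alias length: " + alias.length);
        System.out.println("round trip: " + family.qualifierFor(alias));
    }
}
```

In the design described above, a lookup made on behalf of a {{Get}} or {{Scan}} would not need to create a mapping; the sketch shows only the mutation-side behavior, where a missing mapping triggers alias generation.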
_New reserved HBase table_:
*aliasMappingTable*: Each row on the {{aliasMappingTable}} corresponds to an
aliasEnabled column-family on a user-table. The rowId of each
{{aliasMappingTable}} row is in the format: {{[fully-qualified-user-table-name
+ ":" + aliasEnabled-Family]}}. On each {{aliasMappingTable}} row, the column
with an EMPTY_BYTE_ARRAY column-qualifier is reserved for an Increment value
used to generate new unique alias values within the range stipulated by the
aliasSize (1, 2, or 4 bytes) of the column-family. All other columns on an
{{aliasMappingTable}} row are key:value pairings which map a
user-column-qualifier to its corresponding alias.
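The rowId format can be illustrated with a small sketch. The class and method names here are hypothetical, and UTF-8 encoding of the key is an assumption; only the concatenation of the fully-qualified table name, a colon, and the family name comes from the description above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not code from the patch.
final class AliasMappingRowKeySketch {

    /** Builds [fully-qualified-user-table-name + ":" + aliasEnabled-Family] as bytes. */
    static byte[] rowKey(String fullyQualifiedTableName, String family) {
        // Assumption: the key is the UTF-8 encoding of the concatenated string.
        return (fullyQualifiedTableName + ":" + family).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = rowKey("sales:orders", "metrics");
        System.out.println(new String(key, StandardCharsets.UTF_8));
    }
}
```

Because a fully-qualified table name already contains a colon between namespace and table, a key built this way has the shape namespace:table:family.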
_Modified interface and class_:
*Admin*
*HBaseAdmin*
-- new method added, {{#deleteColumnFamilyAliases}}. While usage is not
mandatory, this method may be invoked to remove from the {{aliasMappingTable}}
the row associated with a specific column-family. It may only be successfully
invoked after the column-family has been fully deleted from its table, or after
the table itself has been deleted.
_Modified class_:
*Get*: new package-protected method {{#setFamilyMap}} -- added to make {{Get}}
consistent with {{Scan}}, {{Mutation}}, etc. (which already have such a
method); this was required to allow {{AliasManager}} to cleanly produce
alias-converted {{Get}} objects.
_Modified class_:
*ConnectionImplementation*: Inadvertent usage of standard {{HTable}} and
{{BufferedMutatorImpl}} objects against a Table with alias-enabled
column-families would result in corruption of the Table’s data. To prevent
this, the methods {{#getTable}} and {{#getBufferedMutator}} were modified with
the addition of a call to the static method
{{AliasManager#verifyConnectionForAliasEnabledTable}}, which throws an
{{IllegalStateException}} if a Table is alias-enabled and the
{{AliasEnabledConnection.class}} is not assignable from the class of the
current {{Connection}}.
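The guard described above amounts to a class-assignability check. The following sketch uses empty stand-in classes rather than the real hbase-client types, and takes a boolean in place of a table descriptor lookup, so the signature shown is an assumption rather than the one in the patch.

```java
// Illustration only: stand-in classes, not the real hbase-client types.
final class ConnectionGuardSketch {
    static class ConnectionImplementation {}
    static class AliasEnabledConnection extends ConnectionImplementation {}

    /**
     * Throws if the table is alias-enabled but the connection is not an
     * AliasEnabledConnection (or a subclass of it).
     * Assumption: the real method inspects the table descriptor itself;
     * here the caller passes the answer in as a boolean.
     */
    static void verifyConnectionForAliasEnabledTable(boolean tableIsAliasEnabled,
                                                     ConnectionImplementation connection) {
        if (tableIsAliasEnabled
                && !AliasEnabledConnection.class.isAssignableFrom(connection.getClass())) {
            throw new IllegalStateException(
                "Alias-enabled table requires an AliasEnabledConnection, but got "
                    + connection.getClass().getSimpleName());
        }
    }

    public static void main(String[] args) {
        verifyConnectionForAliasEnabledTable(true, new AliasEnabledConnection());
        try {
            verifyConnectionForAliasEnabledTable(true, new ConnectionImplementation());
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Tables without alias-enabled families pass through the check untouched, which keeps the cost for non-aliasing users to a single condition.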
_Modified classes_:
*ConnectionImplementation*
*HTable*
-- the method {{ConnectionImplementation#getBufferedMutator}} was refactored
into two separate methods, with the original {{#getBufferedMutator}} method now
calling a new package-protected method called {{#getBufferedMutatorImpl}}. The
HTable class internally uses a {{BufferedMutatorImpl}} object to accomplish
some of its processing, and its invocation of
{{ConnectionImplementation#getBufferedMutator}} needed to be changed to the new
{{#getBufferedMutatorImpl}} method in order to ensure proper functioning of
both {{HTable}} and its new subclass, {{AliasEnabledTable}}. (Without this
change, {{AliasEnabledTable}} would incorrectly instantiate an internal
{{AliasEnabledBufferedMutator}} instead of the required standard
{{BufferedMutatorImpl}}.)
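The reason for splitting {{#getBufferedMutator}} can be seen in a stripped-down model. All classes below are empty stand-ins with simplified, parameterless signatures (an assumption made for brevity); only the class names and the override relationship follow the description above.

```java
// Illustration only: stand-in classes with simplified signatures.
final class MutatorRefactorSketch {
    static class BufferedMutatorImpl {}
    static class AliasEnabledBufferedMutator extends BufferedMutatorImpl {}

    static class ConnectionImplementation {
        // Overridable entry point used by application code.
        BufferedMutatorImpl getBufferedMutator() {
            return getBufferedMutatorImpl();
        }

        // New method: always yields the standard implementation.
        BufferedMutatorImpl getBufferedMutatorImpl() {
            return new BufferedMutatorImpl();
        }
    }

    static class AliasEnabledConnection extends ConnectionImplementation {
        @Override
        BufferedMutatorImpl getBufferedMutator() {
            return new AliasEnabledBufferedMutator();
        }
    }

    static class HTable {
        final BufferedMutatorImpl internalMutator;

        HTable(ConnectionImplementation connection) {
            // Calling getBufferedMutator() here would pick up the subclass override
            // and hand HTable an AliasEnabledBufferedMutator.
            this.internalMutator = connection.getBufferedMutatorImpl();
        }
    }

    public static void main(String[] args) {
        ConnectionImplementation connection = new AliasEnabledConnection();
        System.out.println(connection.getBufferedMutator().getClass().getSimpleName());
        System.out.println(new HTable(connection).internalMutator.getClass().getSimpleName());
    }
}
```

With the split in place, application code that asks an alias-enabled connection for a mutator gets the alias-aware one, while a table's internal mutator stays the standard implementation.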
_Modified classes_:
*NamespaceDescriptor*: constants added for {{ALIAS_NAMESPACE}}, and the new
namespace was added to the {{RESERVED_NAMESPACES}} Set.
*TableName*: constant added for {{ALIAS_TABLE_NAME}}.
_Added TEST classes_:
-- Test classes were added (in the hbase-server subproject, which allows access
to the {{HBaseTestingUtility}}) for all the new Alias* prefixed classes.
The most elaborate testing takes place in the {{TestAliasEnabledTable}} module,
in which identical sets of mutations and queries are submitted against both a
standard (non-alias-enabled, “baseline”) table and four other tables defined
with various combinations of alias-enabled column-families. Results from all
alias-enabled families are exhaustively compared with the “baseline” results to
ensure that they are identical.
> Add column-aliasing capability to hbase-client
> ----------------------------------------------
>
> Key: HBASE-17257
> URL: https://issues.apache.org/jira/browse/HBASE-17257
> Project: HBase
> Issue Type: New Feature
> Components: Client
> Affects Versions: 2.0.0
> Reporter: Daniel Vimont
> Assignee: Daniel Vimont
> Labels: features
>
> Column aliasing will provide the option for a 1, 2, or 4 byte alias value to
> be stored in each cell of an "alias enabled" column-family, in place of the
> full-length column-qualifier. Aliasing is intended to operate completely
> invisibly to the end-user developer, with absolutely no "awareness" of
> aliasing required to be coded into a front-end application. No new public
> hbase-client interfaces are to be introduced, and only a few new public
> methods should need to be added to existing interfaces, primarily to allow an
> administrator to designate that a new column-family is to be alias-enabled by
> setting its aliasSize attribute to 1, 2, or 4.
> To facilitate such functionality, new subclasses of HTable,
> BufferedMutatorImpl, and HTableMultiplexer are to be provided. The overriding
> methods of these new subclasses will invoke methods of the new AliasManager
> class to facilitate qualifier-to-alias conversions (for user-submitted Gets,
> Scans, and Mutations) and alias-to-qualifier conversions (for Results
> returned from HBase) for any Table that has one or more alias-enabled column
> families. All conversion logic will be encapsulated in the new AliasManager
> class, and all qualifier-to-alias mappings will be persisted in a new
> aliasMappingTable in a new, reserved namespace.
> An informal polling of HBase users at HBaseCon East and at the
> Strata/Hadoop-World conference in Sept. 2016 showed that Column Aliasing
> could be a popular enhancement to standard HBase functionality, due to the
> fact that full column-qualifiers are stored in each cell, and reducing this
> qualifier storage requirement down to 1, 2, or 4 bytes per cell could prove
> beneficial in terms of reduced storage and bandwidth needs. Aliasing is
> intended chiefly for column-families which are of the "narrow and tall"
> variety (i.e., that are designed to use relatively few distinct
> column-qualifiers throughout a large number of rows, throughout the lifespan
> of the column-family). A column-family that is set up with an alias-size of 1
> byte can contain up to 255 unique column-qualifiers; a 2 byte alias-size
> allows for up to 65,535 unique column-qualifiers; and a 4 byte alias-size
> allows for up to 4,294,967,295 unique column-qualifiers.
> Fuller specifications will be entered into the comments section below. Note
> that it may well not be viable to add aliasing support in the new "async"
> classes that appear to be currently under development.