Daniel Vimont created HBASE-17257:
-------------------------------------

             Summary: Add column-aliasing capability to hbase-client
                 Key: HBASE-17257
                 URL: https://issues.apache.org/jira/browse/HBASE-17257
             Project: HBase
          Issue Type: New Feature
          Components: Client
    Affects Versions: 2.0.0
            Reporter: Daniel Vimont
            Assignee: Daniel Vimont


Column aliasing will provide the option for a 1-, 2-, or 4-byte alias value to 
be stored in each cell of an "alias-enabled" column-family, in place of the 
full-length column-qualifier. Aliasing is intended to be invisible to the 
application developer: no "awareness" of aliasing should need to be coded into 
a front-end application. No new public hbase-client interfaces are to be 
introduced, and only a few new public methods should need to be added to 
existing interfaces, primarily to allow an administrator to designate a new 
column-family as alias-enabled by setting its aliasSize attribute to 1, 2, 
or 4.

To support this, new subclasses of HTable, BufferedMutatorImpl, and 
HTableMultiplexer are to be provided. The overriding methods of these new 
subclasses will invoke methods of the new AliasManager class to perform 
qualifier-to-alias conversions (for user-submitted Gets, Scans, and Mutations) 
and alias-to-qualifier conversions (for Results returned from HBase) for any 
Table that has one or more alias-enabled column-families. All conversion logic 
will be encapsulated in the new AliasManager class, and all qualifier-to-alias 
mappings will be persisted in a new aliasMappingTable in a new, reserved 
namespace.
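
The two conversion directions described above can be illustrated with a 
minimal, self-contained sketch. Everything in it is an assumption made for 
illustration: the class name AliasMapSketch, its method names, the sequential 
assignment of aliases starting at 1, and the big-endian encoding are all 
hypothetical, and the mappings are held in memory rather than persisted in an 
aliasMappingTable. It is not the proposed AliasManager.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, in-memory stand-in for the mappings of ONE alias-enabled
// column-family. Not part of hbase-client.
final class AliasMapSketch {
    private final int aliasSize;   // 1, 2, or 4 bytes
    private final long maxAliases; // 2^(8 * aliasSize) - 1
    private final Map<ByteBuffer, Long> qualifierToAlias = new HashMap<>();
    private final Map<Long, byte[]> aliasToQualifier = new HashMap<>();
    private long nextAlias = 1;    // assumption: aliases are handed out sequentially

    AliasMapSketch(int aliasSize) {
        if (aliasSize != 1 && aliasSize != 2 && aliasSize != 4) {
            throw new IllegalArgumentException("aliasSize must be 1, 2, or 4");
        }
        this.aliasSize = aliasSize;
        this.maxAliases = (1L << (8 * aliasSize)) - 1;
    }

    // Qualifier-to-alias: the direction needed for Gets, Scans, and Mutations.
    byte[] toAlias(byte[] qualifier) {
        ByteBuffer key = ByteBuffer.wrap(qualifier.clone());
        Long alias = qualifierToAlias.get(key);
        if (alias == null) {
            if (nextAlias > maxAliases) {
                throw new IllegalStateException("alias space exhausted");
            }
            alias = nextAlias++;
            qualifierToAlias.put(key, alias);
            aliasToQualifier.put(alias, qualifier.clone());
        }
        return encode(alias);
    }

    // Alias-to-qualifier: the direction needed for returned Results.
    byte[] toQualifier(byte[] alias) {
        byte[] qualifier = aliasToQualifier.get(decode(alias));
        if (qualifier == null) {
            throw new IllegalArgumentException("unknown alias");
        }
        return qualifier.clone();
    }

    // Writes the alias number as a fixed-width, big-endian byte array.
    private byte[] encode(long value) {
        byte[] out = new byte[aliasSize];
        for (int i = aliasSize - 1; i >= 0; i--) {
            out[i] = (byte) (value & 0xFF);
            value >>>= 8;
        }
        return out;
    }

    // Reads a big-endian byte array back into the alias number.
    private static long decode(byte[] alias) {
        long value = 0;
        for (byte b : alias) {
            value = (value << 8) | (b & 0xFF);
        }
        return value;
    }

    public static void main(String[] args) {
        AliasMapSketch map = new AliasMapSketch(2);
        byte[] qualifier = "temperature".getBytes(StandardCharsets.UTF_8);
        byte[] alias = map.toAlias(qualifier);
        System.out.println("alias length: " + alias.length);
        System.out.println("round trip ok: "
                + Arrays.equals(qualifier, map.toQualifier(alias)));
    }
}
```

Keeping both directions behind one object mirrors the stated intent that all 
conversion logic live in a single class, so the Table subclasses only need to 
call it before sending a request and after receiving a Result.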

An informal poll of HBase users at HBaseCon East and at the 
Strata/Hadoop-World conference in Sept. 2016 suggested that Column Aliasing 
could be a popular enhancement to standard HBase functionality. Because the 
full column-qualifier is stored in each cell, reducing this qualifier storage 
requirement to 1, 2, or 4 bytes per cell could prove beneficial in terms of 
reduced storage and bandwidth needs. Aliasing is intended chiefly for 
column-families which are of the "narrow and tall" variety (i.e., that are 
designed to use relatively few distinct column-qualifiers throughout a large 
number of rows, throughout the lifespan of the column-family). A column-family 
that is set up with an alias-size of 1 byte can contain up to 255 unique 
column-qualifiers; a 2-byte alias-size allows for up to 65,535 unique 
column-qualifiers; and a 4-byte alias-size allows for up to 4,294,967,295 
unique column-qualifiers.
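
The limits quoted above match 2^(8n) - 1 for an n-byte alias, and the 
potential saving can be roughed out per cell. The sketch below only restates 
that arithmetic. AliasSizing and its methods are hypothetical names, and the 
savings figure is a raw estimate that assumes a fixed qualifier length and 
ignores any block encoding or compression HBase may already apply.

```java
// Hypothetical helper for back-of-the-envelope sizing. Not part of hbase-client.
final class AliasSizing {
    // Returns 2^(8 * aliasSize) - 1, which reproduces the limits quoted above.
    static long maxQualifiers(int aliasSize) {
        if (aliasSize != 1 && aliasSize != 2 && aliasSize != 4) {
            throw new IllegalArgumentException("aliasSize must be 1, 2, or 4");
        }
        return (1L << (8 * aliasSize)) - 1;
    }

    // Raw qualifier bytes saved if every cell stores an alias instead of a
    // qualifier of the given length (assumption: all qualifiers are that long).
    static long rawBytesSaved(long cells, int qualifierLength, int aliasSize) {
        return cells * (long) (qualifierLength - aliasSize);
    }

    public static void main(String[] args) {
        for (int size : new int[] {1, 2, 4}) {
            System.out.println(size + "-byte alias: up to "
                    + maxQualifiers(size) + " qualifiers");
        }
        System.out.println("1e9 cells, 20-byte qualifier, 1-byte alias saves "
                + rawBytesSaved(1_000_000_000L, 20, 1) + " raw bytes");
    }
}
```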

Fuller specifications will be entered into the comments section below. Note 
that it may not be viable to add aliasing support to the new "async" classes 
that appear to be under development.



