[ https://issues.apache.org/jira/browse/HBASE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15728144#comment-15728144 ]
Daniel Vimont edited comment on HBASE-17257 at 12/15/16 5:24 AM:
-----------------------------------------------------------------

Here are some specifications of what I’ve currently designed and coded for column-aliasing (and will soon be submitting as a patch)...

*COLUMN-ALIASING FOR THE END-USER*:
From the end-user perspective, column-aliasing entails the following two things...

(1) _Environmental configuration to enable aliasing_: Aliasing makes use of the already-existing {{hbase.client.connection.impl}} configuration parameter. The following entry should be added to {{hbase-site.xml}}:
{code}
<property>
  <name>hbase.client.connection.impl</name>
  <value>org.apache.hadoop.hbase.client.AliasEnabledConnection</value>
</property>
{code}
Setting this parameter to this value results in {{ConnectionFactory#createConnection}} returning Connections of the new {{AliasEnabledConnection}} class (subclass of the {{ConnectionImplementation}} class).

(2) _Alias-enabling individual column families_: Aliasing is enabled at the column-family level. When adding a column-descriptor (i.e. family) to a Table, the new method {{HColumnDescriptor#setAliasSize}} may be invoked to immutably set the fixed size (in bytes) of column-qualifier aliases for the column family. The default value of 0 (aliasing disabled) may be changed to either 1, 2, or 4.

Other than the above, the end-user-application code should neither require nor contain any “awareness” whatsoever that column-aliasing is being utilized for a column-family. An end-user-application continues to interact only with the standard interfaces of the client API ({{Connection}}, {{Table}}, {{BufferedMutator}}, and {{HTableMultiplexer}}).
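As a rough illustration of the alias-size rules in (2) above, the following stand-alone Java sketch models them without any HBase dependency. It is not the patched {{HColumnDescriptor}}: the class {{AliasSizeSketch}} is a hypothetical stand-in, and reading "immutably set" as "may be set only once" is an assumption of this sketch rather than something the patch is known to enforce this way.

```java
// Hypothetical stand-in for the alias-size rules described above; this is
// NOT the patched HColumnDescriptor, only a model of its stated behaviour.
final class AliasSizeSketch {
    private int aliasSize = 0;            // 0 = aliasing disabled (the default)
    private boolean aliasSizeSet = false; // assumption: the size may be set once

    /** Accepts 1, 2, or 4; rejects any other value and any second call. */
    AliasSizeSketch setAliasSize(int size) {
        if (size != 1 && size != 2 && size != 4) {
            throw new IllegalArgumentException("aliasSize must be 1, 2, or 4: " + size);
        }
        if (aliasSizeSet) {
            throw new IllegalStateException("aliasSize is already set to " + aliasSize);
        }
        aliasSize = size;
        aliasSizeSet = true;
        return this;
    }

    /** Fixed alias width in bytes, or 0 when aliasing is disabled. */
    int getAliasSize() {
        return aliasSize;
    }

    /** True when the alias size is non-zero. */
    boolean isAliasEnabled() {
        return aliasSize != 0;
    }
}
```

A family modelled this way starts out disabled, becomes alias-enabled after one call such as {{setAliasSize(2)}}, and rejects both an unsupported size (e.g. 3) and a later attempt to change the size.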
*COLUMN-ALIASING INTERNALS*:
One of the overriding goals in designing the column-aliasing infrastructure is to minimize alterations and insertions into already-existing hbase-client code, and to have very-close-to-zero impact on already-existing functionality, particularly in those situations in which aliasing will NOT be used. The following is a comprehensive list of all new and modified modules, along with an explanation as to the role that the new or modified module plays in aliasing.

_Modified classes_:
*HColumnDescriptor*: as described above, the new method {{#setAliasSize}} has been added; also corresponding methods, {{#getAliasSize}} and {{#isAliasEnabled}} (returns “true” if aliasSize not zero).
*HTableDescriptor*: new method {{#hasAliasEnabledFamily}} has been added (returns “true” if one or more of the table’s families are aliasEnabled).
(Corresponding modifications were also made to {{TestHColumnDescriptor}} and {{TestHTableDescriptor}}, to test the new methods appropriately.)

_New class_:
*AliasEnabledConnection* (subclass of {{ConnectionImplementation}}): provides overrides of {{#getTable}} and {{#getBufferedMutator}} to return objects of the {{AliasEnabledTable}} class and the {{AliasEnabledBufferedMutator}} class, respectively.

_Modified class_:
*HTableMultiplexer*: new static method added -- {{#getAliasEnabledTableMultiplexer}}, returns an {{HTableMultiplexer}} object that is actually an instance of the new subclass, {{AliasEnabledTableMultiplexer}}.
_New classes_:
*AliasEnabledTable* (subclass of {{HTable}})
*AliasEnabledBufferedMutator* (subclass of {{BufferedMutatorImpl}})
*AliasEnabledTableMultiplexer* (subclass of {{HTableMultiplexer}})
-- all the above contain overrides which allow for {{AliasManager}} methods to be invoked when needed -- to perform qualifier-to-alias conversions (for {{Get}}, {{Scan}}, and {{Mutation}} objects), and alias-to-qualifier conversions (for {{Result}} objects) -- for any Table for which {{HTableDescriptor#hasAliasEnabledFamily}} is “true”.

_New class_:
*AliasManager*: performs all alias-oriented conversions -- qualifier-to-alias conversions for queries and mutations, and alias-to-qualifier conversions for results. It fully encapsulates all CRUD transactions against the {{aliasMappingTable}} (the HBase table in which qualifier-to-alias mappings are persisted for each alias-enabled column family). When a {{Mutation}} object contains a column-qualifier for which an alias entry does not yet exist, a new alias is generated and stored in a qualifier-to-alias mapping entry in the {{aliasMappingTable}}. The first time an {{AliasManager}} is instantiated against an HBase cluster, the {{aliasMappingTable}} will be created if it does not already exist.

_New reserved HBase table_:
*aliasMappingTable*: Each row on the {{aliasMappingTable}} corresponds to an aliasEnabled column-family on a user-table. The rowId of each {{aliasMappingTable}} row is in the format: {{[fully-qualified-user-table-name + ":" + aliasEnabled-Family]}}. On each {{aliasMappingTable}} row, the column with an EMPTY_BYTE_ARRAY column-qualifier is reserved for an Increment value used to generate new unique alias values within the range stipulated by the aliasSize (1, 2, or 4 bytes) of the column-family. All other columns on an {{aliasMappingTable}} row are key:value pairings which map a user-column-qualifier to its corresponding alias.
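To make the {{aliasMappingTable}} row layout more concrete, here is a small in-memory Java model of a single row. It is only a sketch under stated assumptions: a {{HashMap}} and a {{long}} counter stand in for the HBase row and its Increment column, the counter is assumed to start at 1, aliases are assumed to be encoded big-endian, and every name in it ({{AliasRowSketch}}, {{aliasFor}}, {{qualifierFor}}, {{rowId}}, {{maxAliases}}) is hypothetical rather than taken from the patch.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory model of ONE aliasMappingTable row. Plain Java collections stand
// in for the HBase row; all names here are hypothetical.
final class AliasRowSketch {
    private final int aliasSize;   // fixed alias width: 1, 2, or 4 bytes
    private final Map<String, byte[]> qualifierToAlias = new HashMap<>();
    private final Map<Long, String> aliasToQualifier = new HashMap<>();
    private long counter = 0;      // models the Increment column (assumed to start at 1)

    AliasRowSketch(int aliasSize) {
        if (aliasSize != 1 && aliasSize != 2 && aliasSize != 4) {
            throw new IllegalArgumentException("aliasSize must be 1, 2, or 4: " + aliasSize);
        }
        this.aliasSize = aliasSize;
    }

    /** Row id in the format given in the text: table name + ":" + family. */
    static String rowId(String fullyQualifiedTableName, String family) {
        return fullyQualifiedTableName + ":" + family;
    }

    /** Largest counter value that fits in aliasSize bytes: 2^(8 * aliasSize) - 1. */
    long maxAliases() {
        return (1L << (8 * aliasSize)) - 1;
    }

    /** Qualifier-to-alias: returns the stored alias, generating one if absent. */
    byte[] aliasFor(String qualifier) {
        byte[] existing = qualifierToAlias.get(qualifier);
        if (existing != null) {
            return existing.clone();
        }
        if (counter >= maxAliases()) {
            throw new IllegalStateException("alias range exhausted for aliasSize " + aliasSize);
        }
        long next = ++counter;
        byte[] alias = encode(next);
        qualifierToAlias.put(qualifier, alias);
        aliasToQualifier.put(next, qualifier);
        return alias.clone();
    }

    /** Alias-to-qualifier: reverse lookup; returns null for an unknown alias. */
    String qualifierFor(byte[] alias) {
        return aliasToQualifier.get(decode(alias));
    }

    /** Fixed-width big-endian encoding of the counter value (an assumption). */
    private byte[] encode(long value) {
        byte[] out = new byte[aliasSize];
        for (int i = aliasSize - 1; i >= 0; i--) {
            out[i] = (byte) (value & 0xFF);
            value >>>= 8;
        }
        return out;
    }

    private static long decode(byte[] alias) {
        long v = 0;
        for (byte b : alias) {
            v = (v << 8) | (b & 0xFF);
        }
        return v;
    }
}
```

In this model the first qualifier seen for a 1-byte family receives alias 0x01 and the next one 0x02, while repeated lookups of the same qualifier return the alias already stored. The real {{AliasManager}} persists these mappings in HBase and has to cope with concurrent clients, which this sketch does not attempt.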
_Modified interface and class_:
*Admin*
*HBaseAdmin*
-- new method added, {{#deleteColumnFamilyAliases}}. While usage is not mandatory, this method may be invoked to remove from the {{aliasMappingTable}} the row associated with a specific column-family. It may only be successfully invoked after the column-family has been fully deleted from its table, or after the table itself has been deleted.

_Modified class_:
*Get*: new package-protected method {{#setFamilyMap}} -- added to make {{Get}} consistent with {{Scan}}, {{Mutation}}, etc. (which already have such a method); this was required to allow {{AliasManager}} to cleanly produce alias-converted {{Get}} objects.

_Modified class_:
*ConnectionImplementation*: Inadvertent usage of standard {{HTable}} and {{BufferedMutatorImpl}} objects against a Table with alias-enabled column-families would result in corruption of the Table’s data. To prevent this, the methods {{#getTable}} and {{#getBufferedMutator}} were modified with the addition of a call to the static method {{AliasManager#verifyConnectionForAliasEnabledTable}}, which throws an {{IllegalStateException}} if a Table is alias-enabled and the {{AliasEnabledConnection.class}} is not assignable from the class of the current {{Connection}}.

_Modified classes_:
*ConnectionImplementation*
*HTable*
-- the method {{ConnectionImplementation#getBufferedMutator}} was refactored into two separate methods, with the original {{#getBufferedMutator}} method now calling a new package-protected method called {{#getBufferedMutatorImpl}}. The HTable class internally uses a {{BufferedMutatorImpl}} object to accomplish some of its processing, and its invocation of {{ConnectionImplementation#getBufferedMutator}} needed to be changed to the new {{#getBufferedMutatorImpl}} method in order to assure proper functioning of both {{HTable}} and its new subclass, {{AliasEnabledTable}}.
(Without this change, {{AliasEnabledTable}} would incorrectly instantiate an internal {{AliasEnabledBufferedMutator}} instead of the required standard {{BufferedMutatorImpl}}.)

_Modified classes_:
*TableName*: constant added for {{ALIAS_TABLE_NAME}}.

_Added TEST classes_:
-- Test classes were added (in the hbase-server subproject, which allows access to the {{HBaseTestingUtility}}) for all the new Alias* prefixed classes. The most elaborate testing takes place in the {{TestAliasEnabledTable}} module, in which identical sets of mutations and queries are submitted against both a standard (non-alias-enabled, “baseline”) table and four other tables defined with various combinations of alias-enabled column-families. Results from all alias-enabled families are exhaustively compared with the “baseline” results to assure that all are completely identical.
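The connection check described above under *ConnectionImplementation* ("{{AliasEnabledConnection.class}} is not assignable from the class of the current {{Connection}}") can be pictured with the stand-alone Java sketch below. The two empty classes only mimic the inheritance relationship named in the text; they are not the HBase classes, and {{AliasGuardSketch#verify}} with its boolean parameter is a hypothetical simplification of {{AliasManager#verifyConnectionForAliasEnabledTable}}, whose real signature is not given here.

```java
// Stand-in types: these empty classes only mimic the inheritance described
// above (AliasEnabledConnection extends ConnectionImplementation). They are
// not the HBase classes.
class ConnectionImplementationSketch { }

class AliasEnabledConnectionSketch extends ConnectionImplementationSketch { }

final class AliasGuardSketch {
    /**
     * Throws IllegalStateException when a table with alias-enabled families
     * is opened through a connection that is not alias-aware.
     */
    static void verify(ConnectionImplementationSketch connection,
                       boolean tableHasAliasEnabledFamily) {
        // isAssignableFrom is true for the alias-enabled class and its subclasses.
        boolean aliasAware =
            AliasEnabledConnectionSketch.class.isAssignableFrom(connection.getClass());
        if (tableHasAliasEnabledFamily && !aliasAware) {
            throw new IllegalStateException(
                "Table has alias-enabled families; an alias-enabled connection is required");
        }
    }
}
```

With this guard, a plain connection may still open tables that have no alias-enabled families, and an alias-enabled connection may open any table; only the combination of a plain connection and an alias-enabled table is rejected.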

> Add column-aliasing capability to hbase-client
> ----------------------------------------------
>
> Key: HBASE-17257
> URL: https://issues.apache.org/jira/browse/HBASE-17257
> Project: HBase
> Issue Type: New Feature
> Components: Client
> Affects Versions: 2.0.0
> Reporter: Daniel Vimont
> Assignee: Daniel Vimont
> Labels: features
> Attachments: HBASE-17257-v2.patch, HBASE-17257-v3.patch, HBASE-17257.patch
>
> Review Board link: https://reviews.apache.org/r/54635/
> Column aliasing will provide the option for a 1, 2, or 4 byte alias value to be stored in each cell of an "alias enabled" column-family, in place of the full-length column-qualifier. Aliasing is intended to operate completely invisibly to the end-user developer, with absolutely no "awareness" of aliasing required to be coded into a front-end application. No new public hbase-client interfaces are to be introduced, and only a few new public methods should need to be added to existing interfaces, primarily to allow an administrator to designate that a new column-family is to be alias-enabled by setting its aliasSize attribute to 1, 2, or 4.
> To facilitate such functionality, new subclasses of HTable, BufferedMutatorImpl, and HTableMultiplexer are to be provided. The overriding methods of these new subclasses will invoke methods of the new AliasManager class to facilitate qualifier-to-alias conversions (for user-submitted Gets, Scans, and Mutations) and alias-to-qualifier conversions (for Results returned from HBase) for any Table that has one or more alias-enabled column families. All conversion logic will be encapsulated in the new AliasManager class, and all qualifier-to-alias mappings will be persisted in a new aliasMappingTable in a new, reserved namespace.
> An informal polling of HBase users at HBaseCon East and at the Strata/Hadoop-World conference in Sept. 2016 showed that Column Aliasing could be a popular enhancement to standard HBase functionality, due to the fact that full column-qualifiers are stored in each cell, and reducing this qualifier storage requirement down to 1, 2, or 4 bytes per cell could prove beneficial in terms of reduced storage and bandwidth needs. Aliasing is intended chiefly for column-families which are of the "narrow and tall" variety (i.e., that are designed to use relatively few distinct column-qualifiers throughout a large number of rows, throughout the lifespan of the column-family). A column-family that is set up with an alias-size of 1 byte can contain up to 255 unique column-qualifiers; a 2 byte alias-size allows for up to 65,535 unique column-qualifiers; and a 4 byte alias-size allows for up to 4,294,967,295 unique column-qualifiers.
> Fuller specifications will be entered into the comments section below. Note that it may well not be viable to add aliasing support in the new "async" classes that appear to be currently under development.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)