[
https://issues.apache.org/jira/browse/TRAFODION-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642742#comment-14642742
]
Rohit Jain commented on TRAFODION-1419:
---------------------------------------
Anoop,
Just to clarify a few things:
- The same set of HBase options will apply to both column families. That is,
you cannot vary the HBase options between column families. By extension, this
applies to the aligned format syntax as well, although we may remove that
limitation in the future.
- Salting and divisioning will apply to all column families.
- An index on the table can reference only columns in the default column
family, not columns in any of the other column families you might create.
- DML does not reference a column family, so views and other objects on the
table will be unaware of which CF a column resides in. That implies implicit
joins between CFs to retrieve results (handled at the HBase level?).
- Predicate pushdown against multiple column families will be treated like a
join among tables: the CF yielding the lowest cardinality is accessed first and
then joined with the remaining CFs in cardinality-based join order. Or is this
all handled at the HBase level? If so, I wonder how that works, especially with
no cardinality information (I am assuming that CFs are scanned only if and when
they are needed).
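
A minimal sketch of the cardinality-based ordering described in the last bullet,
assuming per-CF row estimates are available (which is exactly what is in
question). The function names and the sample data are hypothetical; this is not
how Trafodion or HBase is known to behave.

```python
def order_by_cardinality(estimated_rows):
    """Order column families from fewest to most estimated qualifying rows."""
    # Ties are broken by CF name so the order is deterministic.
    return sorted(estimated_rows, key=lambda cf: (estimated_rows[cf], cf))


def join_on_row_key(rows_by_cf, order):
    """Join per-CF results on row key, visiting CFs in the given order."""
    first, *rest = order
    joined = {key: dict(cols) for key, cols in rows_by_cf[first].items()}
    for cf in rest:
        matches = rows_by_cf[cf]
        # Keep only row keys that also qualify in this CF, merging the columns.
        joined = {
            key: {**cols, **matches[key]}
            for key, cols in joined.items()
            if key in matches
        }
    return joined


# Hypothetical estimates and qualifying rows per column family.
estimated_rows = {"CF1": 3, "CF2": 1}
rows_by_cf = {
    "CF1": {"r1": {"A": 10}, "r2": {"A": 20}, "r3": {"A": 30}},
    "CF2": {"r2": {"B": "x"}},
}
order = order_by_cardinality(estimated_rows)
result = join_on_row_key(rows_by_cf, order)
print(order)
print(result)
```

Starting from the most selective CF keeps the intermediate result small; without
cardinality estimates there is no basis for choosing that order.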
Rohit
> Add support for multiple column families in a trafodion table
> -------------------------------------------------------------
>
> Key: TRAFODION-1419
> URL: https://issues.apache.org/jira/browse/TRAFODION-1419
> Project: Apache Trafodion
> Issue Type: New Feature
> Reporter: Anoop Sharma
> Assignee: Anoop Sharma
>
> This proposal is to add support for multiple column families in Trafodion
> tables. With this feature, one can store columns in multiple column
> families. One use for this would be to store frequently used columns in one
> column family and infrequently used columns in a different column family.
> That will improve performance when those columns are retrieved from HBase.
> There could be other uses as well.
> Syntax:
> create table <tablename> ( <colFam1>.<colName1> <datatype>,
> <colFam2>.<colName2> <datatype> ….)
> attributes default column family <colFam>;
> alter table <tablename> add column <colFam>.<colName> <datatype>;
> <colFam> : name of column family for that column
> Semantics:
> <colFam> name follows identifier rules. If not double quoted, then it
> will be upper cased. If double quoted, then case will be maintained.
> A user-specified column family name can be of arbitrary length. To optimize
> the space used by the column family stored in each cell, a 2-byte encoding is
> generated. The mapping from user-specified column family to encoded column
> family is stored in metadata.
> If no column family is specified for a column during create table, then
> the family specified in the ‘attributes default column family’ clause is used.
> If no ‘attributes default column family’ clause is specified, then the system
> default column family is used.
> Column family specification is supported for regular and volatile
> tables.
> All unique column families specified during create or alter are added
> to the table.
> The maximum number of column families supported in one table is 32, but
> HBase recommends against creating too many column families.
> The alter statement can be used to assign specific HBase options to
> specific column families using the NAME clause. If no NAME clause is
> specified, then the HBase options in the alter statement are applied to all
> column families.
> The invoke and showddl statements will show the original user-specified
> column families and not the encoded column families.
> Currently, multiple column families are not supported for columns of a
> user-created or an implicitly created index.
> The default column family of the corresponding base table is used for all
> index columns.
> A column family cannot be specified in a DML query.
> A column family cannot be specified for columns of an aligned row format
> table, since all columns are stored as one cell.
> Column names must be unique for each table. The same column name cannot
> be used as part of multiple column families.
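
The naming and defaulting rules in the description above can be modelled with a
short sketch. This is an illustrative model only, not Trafodion code: the
function names, the `SYSTEM_DEFAULT_CF` placeholder, and the choice to count
only families actually assigned to a column toward the limit of 32 are all
assumptions.

```python
MAX_COLUMN_FAMILIES = 32  # per-table limit stated in the description


def normalize_identifier(name):
    """Upper-case unquoted identifiers; keep the case of double-quoted ones."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name[1:-1]
    return name.upper()


def resolve_column_families(columns, table_default=None,
                            system_default="SYSTEM_DEFAULT_CF"):
    """Map each column name to its column family.

    columns: list of (family_or_None, column_name) pairs.
    table_default: family from 'attributes default column family', if given.
    system_default: placeholder for the system default family (assumed name).
    """
    if table_default is not None:
        fallback = normalize_identifier(table_default)
    else:
        fallback = system_default

    resolved = {}
    for family, column in columns:
        col = normalize_identifier(column)
        # Column names must be unique per table, across all families.
        if col in resolved:
            raise ValueError(f"duplicate column name: {col}")
        resolved[col] = (normalize_identifier(family)
                         if family is not None else fallback)

    if len(set(resolved.values())) > MAX_COLUMN_FAMILIES:
        raise ValueError("more than 32 column families in one table")
    return resolved


layout = resolve_column_families(
    [("cf1", "a"), ('"cf2"', "b"), (None, "c")],
    table_default="cf3",
)
print(layout)
```

Here `a` lands in `CF1` (unquoted, so upper-cased), `b` in `cf2` (quoted, so
case is kept), and `c` in the table default `CF3`.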