[ 
https://issues.apache.org/jira/browse/TRAFODION-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642739#comment-14642739
 ] 

Qifan Chen commented on TRAFODION-1419:
---------------------------------------

May need to cover these statements as well. 

1. CREATE TABLE LIKE: columns are grouped into families the same way as in the 
SOURCE table
2. UPDATE STATS, like DML statements, does not need to reference column families. 
3. ALTER TABLE: do we allow a column to move from one family to another?
4. KEY COLUMNS cannot be split among different column families

On the syntax for specifying a column family, I wonder if we can use hints to 
assign a group of columns to a single family without repeating the family name 
per column, something like the following, where C1 and C2 are in family A and 
C3 and C4 are in family B. 

create table T (

<<+ family A >>
C1 INT,
C2 INT,

<<+ family B >>
C3 char(10),
C4 varchar(20)
)
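
To make the intended grouping rule concrete, here is a rough Python sketch of 
how such hints could be interpreted: each column belongs to the family named 
by the most recent hint above it. This only models the proposal. The function 
name, the regular expression, and the DEFAULT_FAMILY placeholder are 
hypothetical and are not Trafodion code.

```python
import re

# Hypothetical hint marker of the form "<<+ family NAME >>". This syntax is
# only proposed in this comment; it is not an implemented Trafodion feature.
HINT = re.compile(r"^<<\+\s*family\s+(\w+)\s*>>$", re.IGNORECASE)


def group_columns_by_hint(body_lines, default_family="DEFAULT_FAMILY"):
    """Map each column name to the family named by the most recent hint."""
    current = default_family  # placeholder used before any hint is seen
    families = {}
    for raw in body_lines:
        line = raw.strip().rstrip(",")
        if not line:
            continue  # skip blank lines between groups
        match = HINT.match(line)
        if match:
            current = match.group(1).upper()
            continue
        # The first token of a column definition is taken as the column name.
        column_name = line.split()[0].upper()
        families[column_name] = current
    return families


columns = group_columns_by_hint([
    "<<+ family A >>",
    "C1 INT,",
    "C2 INT,",
    "",
    "<<+ family B >>",
    "C3 char(10),",
    "C4 varchar(20)",
])
print(columns)  # {'C1': 'A', 'C2': 'A', 'C3': 'B', 'C4': 'B'}
```

Columns that appear before any hint fall back to the placeholder default in 
this sketch; whether that should be the table default or an error is an open 
question.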

In addition, prefixing the column name with a family name makes it look quite 
non-ANSI. Should we instead allow the family to be specified in the column 
attribute section, if the user prefers that, such as

create table T
(
 C1 int default null family abc,
 C2 int default null family xyz
)

Lastly, we should probably state that columns in a family will be stored 
together, possibly in one set of HFiles. Columns from two different families 
will be stored separately. 


> Add support for multiple column families in a trafodion table
> -------------------------------------------------------------
>
>                 Key: TRAFODION-1419
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-1419
>             Project: Apache Trafodion
>          Issue Type: New Feature
>            Reporter: Anoop Sharma
>            Assignee: Anoop Sharma
>
> This proposal is to add support for multiple column families in Trafodion 
> tables. With this feature, one can store columns in multiple column 
> families. One use for this would be to store frequently used columns in one 
> column family and infrequently used columns in a different column family. 
> That should improve performance when those columns are retrieved from 
> HBase. There could be other uses as well.
> Syntax:
> create table <tablename> ( <colFam1>.<colName1>  <datatype>, 
> <colFam2>.<colName2> <datatype> ….)
>   attributes default column family <colFam>;
> alter table <tablename> add column <colFam>.<colName> datatype;
> <colFam>  :  name of column family for that column
> Semantics:
>      <colFam> name follows identifier rules. If  not double quoted, then it 
> will be upper cased. If double quoted, then case will be maintained.
>      User specified column family can be of arbitrary length. To optimize 
> space for column family stored in a cell, a 2 byte encoding is generated. 
> Mapping of user specified column family to encoded column family is stored in 
> metadata.
>      If no column family is specified for a column during create table, then 
> the family specified in the ‘attributes default column family’ clause is 
> used. If no ‘attributes default column family’ clause is specified, then the 
> system default column family is used.
>      column family specification is supported for regular and volatile 
> tables. 
>      all unique column families specified during create or alter are added 
> to the table 
>      The maximum number of column families supported in one table is 32. 
> However, HBase recommends against creating too many column families. 
>      alter statement can be used to assign specific hbase options to 
> specific column families
> using the NAME clause. If no name clause is specified, then alter hbase  
> options are applied
> to all col families.
>      invoke and showddl statements will show the original user specified 
> column families and not the encoded column families
>      Currently, multiple column families are not supported for columns of a 
> user created or an implicitly created index. 
> The default column family of the corresponding base table is used for all 
> index columns.
>      column family cannot be specified in a DML query
>      column family cannot be specified for columns of an aligned row format 
> table since all columns are stored as one cell
>      Column names must be unique for each table. The same column name cannot 
> be used as part of multiple column families.
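
The naming and defaulting rules in the description above can be sketched in a 
few lines of Python. This is only a model of the stated semantics, not 
Trafodion code: the function names, the SYSTEM_DEFAULT placeholder, and the 
sequential 2-byte codes are assumptions, since the description does not say 
how the encoding is generated.

```python
MAX_FAMILIES = 32  # limit stated in the description


def normalize_family(name):
    """Identifier rule: keep case if double quoted, otherwise upper-case."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name[1:-1]
    return name.upper()


def resolve_families(columns, table_default=None,
                     system_default="SYSTEM_DEFAULT"):
    """Resolve the family of each column.

    columns: list of (family_or_None, column_name) pairs.
    Returns (column -> family, family -> 2-byte code).
    """
    fallback = normalize_family(table_default) if table_default else system_default
    column_family = {}
    encoding = {}
    for family, column in columns:
        # Simplification: column names are treated as unquoted identifiers.
        column = column.upper()
        if column in column_family:
            # Column names must be unique per table, across all families.
            raise ValueError(f"duplicate column name: {column}")
        resolved = normalize_family(family) if family else fallback
        if resolved not in encoding:
            if len(encoding) == MAX_FAMILIES:
                raise ValueError("more than 32 column families")
            # Hypothetical code assignment; the real scheme is not specified.
            encoding[resolved] = len(encoding).to_bytes(2, "big")
        column_family[column] = resolved
    return column_family, encoding


families, codes = resolve_families(
    [("cf1", "a"), ('"Cf2"', "b"), (None, "c")],
    table_default="cfd",
)
print(families)  # {'A': 'CF1', 'B': 'Cf2', 'C': 'CFD'}
print(codes)     # {'CF1': b'\x00\x00', 'Cf2': b'\x00\x01', 'CFD': b'\x00\x02'}
```

In this sketch the user-visible family name and its short code are kept in 
separate maps, mirroring the statement that the mapping lives in metadata 
while invoke and showddl show the original names.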



