[ 
https://issues.apache.org/jira/browse/TRAFODION-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647043#comment-14647043
 ] 

Hans Zeller commented on TRAFODION-50:
--------------------------------------

Here is a proposal for a new syntax: SPLIT BY. I want to avoid PARTITION BY, 
because I hope we can eventually use that for a much more tightly managed model 
where we take control over region placement, allow co-location of tables in 
region servers, and push things like groupby and join down to the region server 
level. SPLIT BY will be the light version of that, an initial split, but then 
HBase can split further it wants to.

create table lineitem(
l_orderkey int not null,
l_linenum int not null,
...
primary key (l_orderkey, l_linenum))
SPLIT BY (l_orderkey)
(add first key (10000),
add first key (20000)
);

> LP Blueprint: cmp-presplit-unsalted-tables - Add SQL syntax to pre-split 
> unsalted tables into regions
> -----------------------------------------------------------------------------------------------------
>
>                 Key: TRAFODION-50
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-50
>             Project: Apache Trafodion
>          Issue Type: New Feature
>          Components: sql-cmp
>            Reporter: Hans Zeller
>            Priority: Critical
>             Fix For: 2.0-incubating
>
>
> Currently, Trafodion creates tables as a single-region HBase table, unless 
> the table is salted. Salted tables have one region per salt bucket. We would 
> like to add SQL syntax to allow pre-splitting of unsalted tables as well. We 
> would like to offer two new table options, SPLIT BY and PARTITION BY. Both 
> allow specification of split keys. SPLIT BY will simply pre-split the table, 
> with no special split policy. PARTITION BY will add a prefix split policy 
> that ensures that all rows with common PARTITION BY column values will remain 
> in the same table.
> Example:
> create table lineitem(
>    l_orderkey int not null,
>    l_linenum int not null,
>    ...
>    primary key (l_orderkey, l_linenum))
> PARTITION BY (l_orderkey)
> (add first key (10000),
>  add first key (20000)
> );
> This will create three regions containing for values [<min>, 10000), [10000, 
> 20000) and [20000, <max>) it will add a prefix split policy for a 4 byte 
> prefix, the length of a non-nullable INT column. HBase will still be able to 
> split the regions further, but we will have a guarantee that all rows for a 
> given l_orderkey are located in the same region.
> If SPLIT BY is used instead of PARTITION BY, the same regions will be 
> created, but no custom split policy will be added. Note that the PARTITION BY 
> / SPLIT BY column(s) have to be a prefix of the clustering key of the table. 
> PARTITION BY and SPLIT BY are not allowed for salted tables. We may or may 
> not support pre-splitting of divisioned tables in the first release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to