Jacky Li resolved CARBONDATA-1438.
----------------------------------
Resolution: Fixed
Fix Version/s: 1.2.0
> Unify the sort column and sort scope in create table command
> ------------------------------------------------------------
>
> Key: CARBONDATA-1438
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1438
> Project: CarbonData
> Issue Type: Improvement
> Reporter: chenerlu
> Fix For: 1.2.0
>
> Time Spent: 14h 40m
> Remaining Estimate: 0h
>
> 1 Requirement
> Currently, users can specify sort columns in the table properties when creating
> a table, and can specify the sort scope in the load options when loading data.
> To make this easier to use, all sort-related parameters should be specified in
> the CREATE TABLE command.
> Once the sort scope is specified in the CREATE TABLE command, it will be used
> when loading data, even if users have also specified a sort scope in the load
> options.
> 2 Detailed design
> 2.1 Task-01
> Requirement: CREATE TABLE should support specifying the sort scope.
> Implementation: Use the table properties (Map<String, String>). The sort scope
> is specified in the table properties as a key/value pair. The existing
> interface is then called to write this key/value pair into the metastore.
> Global Sort, Local Sort and No Sort will be supported. The sort scope can be
> specified in the SQL command:
> CREATE TABLE tableWithGlobalSort (
> shortField SHORT,
> intField INT,
> bigintField LONG,
> doubleField DOUBLE,
> stringField STRING,
> timestampField TIMESTAMP,
> decimalField DECIMAL(18,2),
> dateField DATE,
> charField CHAR(5)
> )
> STORED BY 'carbondata'
> TBLPROPERTIES('SORT_COLUMNS'='stringField', 'SORT_SCOPE'='GLOBAL_SORT')
>
> Tip: If the sort scope is global sort, users should specify
> GLOBAL_SORT_PARTITIONS. If it is not specified, the number of map tasks is
> used. GLOBAL_SORT_PARTITIONS must be an integer in the range
> [1, Integer.MAX_VALUE]. It is only used when the sort scope is global sort.
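The partition rule in the tip above can be sketched as a small resolution helper. This is a minimal illustration, not CarbonData code. The function name `resolve_global_sort_partitions` and its arguments are hypothetical. It assumes the property arrives as a string (or is absent) and that the map task count is known to the caller.

```python
# Hypothetical helper for illustration; not part of CarbonData's API.
JAVA_INT_MAX = 2**31 - 1  # Java's Integer.MAX_VALUE


def resolve_global_sort_partitions(value, num_map_tasks):
    """Return the partition count to use when the sort scope is GLOBAL_SORT.

    value:         the GLOBAL_SORT_PARTITIONS property string, or None if unset.
    num_map_tasks: fallback used when the property is not specified.
    """
    if value is None or not value.strip():
        # Not specified: fall back to the number of map tasks.
        return num_map_tasks
    try:
        partitions = int(value.strip())
    except ValueError:
        raise ValueError(
            f"GLOBAL_SORT_PARTITIONS must be an integer, got {value!r}"
        ) from None
    # Enforce the documented range [1, Integer.MAX_VALUE].
    if not 1 <= partitions <= JAVA_INT_MAX:
        raise ValueError(
            f"GLOBAL_SORT_PARTITIONS must be in [1, {JAVA_INT_MAX}], got {partitions}"
        )
    return partitions


print(resolve_global_sort_partitions(None, 12))  # unset -> 12 (map tasks)
print(resolve_global_sort_partitions("8", 12))   # explicit -> 8
```

Values such as "0" or "1.5" raise an error in this sketch instead of being silently replaced. Whether the real implementation rejects or corrects them is not stated in the issue.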
> Sort scope   Description
> Global Sort  Uses Spark's orderBy operator; data is ordered at segment level.
> Local Sort   Data is ordered per node; a CarbonData file is ordered if it is
>              written by one task.
> No Sort      Data is not sorted.
> Tip: Keys and values are case-insensitive.
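Case-insensitive handling of the sort-related table properties could look like the sketch below. The function name `normalize_sort_properties` is hypothetical. The accepted set contains only the three scopes this issue lists. Only the SORT_SCOPE value is upper-cased; other values, such as column names in SORT_COLUMNS, are left untouched by assumption.

```python
# Hypothetical sketch; the set below covers only the three scopes this issue lists.
VALID_SORT_SCOPES = {"GLOBAL_SORT", "LOCAL_SORT", "NO_SORT"}


def normalize_sort_properties(tblproperties):
    """Upper-case property keys and the SORT_SCOPE value, then validate the scope."""
    props = {key.strip().upper(): value.strip() for key, value in tblproperties.items()}
    if "SORT_SCOPE" in props:
        scope = props["SORT_SCOPE"].upper()
        if scope not in VALID_SORT_SCOPES:
            raise ValueError(f"Unsupported SORT_SCOPE: {props['SORT_SCOPE']!r}")
        props["SORT_SCOPE"] = scope
    return props


print(normalize_sort_properties({"sort_columns": "stringField", "Sort_Scope": "global_sort"}))
# {'SORT_COLUMNS': 'stringField', 'SORT_SCOPE': 'GLOBAL_SORT'}
```

Normalizing once, before the properties are written to the metastore, means later readers can look up `SORT_SCOPE` with a single exact key.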
> 2.2 Task-02
> Requirement:
> Data loading will support local sort, no sort and global sort.
> The sort scope specified in LOAD DATA is ignored, and the parameter specified
> in CREATE TABLE is used instead.
> Currently, users can specify the sort scope and global sort partitions in the
> load options. After this change, the sort scope specified in the load options
> is ignored and the sort scope is read from the table properties.
> Current logic: the sort scope comes from the load options
> No.  Prerequisite                                      Sort scope
> 1    isSortTable is true && sort scope is Global Sort  Global Sort (checked first)
> 2    isSortTable is false                              No Sort
> 3    isSortTable is true                               Local Sort
> Tip: isSortTable is true if the table contains sort columns or contains
> dimensions (excluding complex types), such as string columns.
> For example:
> Create table xxx1 (col1 string, col2 int) stored by 'carbondata'  --- sort table
> Create table xx1 (col1 int, col2 int) stored by 'carbondata'  --- not a sort table
> Create table xx (col1 int, col2 string) stored by 'carbondata' tblproperties
> ('sort_columns'='col1')  --- sort table
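The isSortTable rule can be sketched as a simple predicate over the column list. The function name `is_sort_table` and the (name, type) input shape are hypothetical. The sketch also assumes that only plain STRING columns count as dimensions by default. CarbonData may treat other types as dimensions too, which this sketch ignores. Complex types such as `array<string>` do not match the check, so they are excluded as the tip requires.

```python
# Hypothetical sketch of the isSortTable rule. Assumption: only plain STRING
# columns are treated as dimensions; complex types such as "array<string>"
# do not match and are therefore excluded.
def is_sort_table(columns, sort_columns=None):
    """columns: list of (name, type) pairs; sort_columns: names from SORT_COLUMNS."""
    if sort_columns:
        # An explicit, non-empty SORT_COLUMNS makes the table a sort table.
        return True
    return any(col_type.strip().lower() == "string" for _, col_type in columns)


# The three CREATE TABLE examples above:
print(is_sort_table([("col1", "string"), ("col2", "int")]))            # True
print(is_sort_table([("col1", "int"), ("col2", "int")]))               # False
print(is_sort_table([("col1", "int"), ("col2", "string")], ["col1"]))  # True
```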
> New logic: the sort scope comes from CREATE TABLE
> No.  Prerequisite                                      Code branch
> 1    isSortTable is true && sort scope is Global Sort  Global Sort (checked first)
> 2    isSortTable is false || sort scope is No Sort     No Sort
> 3    isSortTable is true && sort scope is Local Sort   Local Sort
> 4    isSortTable is true, no sort scope specified      Local Sort (keeps current logic)
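The new-logic table can be read as an ordered chain of checks. The sketch below follows the four rows in order. The function name `resolve_sort_scope` is hypothetical, and scope strings are assumed to use the GLOBAL_SORT / LOCAL_SORT / NO_SORT spelling from the SQL example. The table does not cover unrecognized values; this sketch lets them fall through to Local Sort, which is an assumption.

```python
from typing import Optional


def resolve_sort_scope(is_sort_table: bool, table_sort_scope: Optional[str]) -> str:
    """Pick the load-time sort scope from the table-level SORT_SCOPE (new logic)."""
    scope = table_sort_scope.strip().upper() if table_sort_scope else None
    # Row 1, checked first: sort table with GLOBAL_SORT.
    if is_sort_table and scope == "GLOBAL_SORT":
        return "GLOBAL_SORT"
    # Row 2: not a sort table, or NO_SORT requested.
    if not is_sort_table or scope == "NO_SORT":
        return "NO_SORT"
    # Rows 3 and 4: sort table with LOCAL_SORT or no scope specified.
    # Any other value also lands here in this sketch.
    return "LOCAL_SORT"


for args in [(True, "global_sort"), (False, "GLOBAL_SORT"), (True, "NO_SORT"), (True, None)]:
    print(args, "->", resolve_sort_scope(*args))
```

Row 1 comes first, so Global Sort wins for a sort table. Row 2 then ensures that a non-sort table is never sorted, whatever scope was requested.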
> 3 Acceptance standard
> No.  Acceptance standard
> 1    Users can specify the sort scope (global sort, local sort or no sort) when
>      creating a carbon table through SQL.
> 2    Data loading ignores the sort scope specified in the load options and uses
>      the one specified in the CREATE TABLE command. If the user still specifies
>      a sort scope in the load options, a warning is given informing the user
>      that the sort scope specified in CREATE TABLE will be used.
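Acceptance standard 2 can be sketched with Python's standard `logging` module. A sort scope given in the load options is logged as a warning and then discarded. The function and logger names are hypothetical. The sketch assumes table property keys are already upper-cased, while load option keys are matched case-insensitively. The returned value, possibly None, would then be resolved using the New logic table.

```python
import logging

# Hypothetical logger name; CarbonData's real logging setup is not shown here.
LOGGER = logging.getLogger("sort_scope_sketch")


def effective_load_sort_scope(table_properties, load_options):
    """Return SORT_SCOPE from the table properties, warning if a load option sets it.

    Assumes table_properties keys are already upper-cased; load option keys are
    matched case-insensitively.
    """
    table_scope = table_properties.get("SORT_SCOPE")
    load_scope = next(
        (value for key, value in load_options.items() if key.upper() == "SORT_SCOPE"),
        None,
    )
    if load_scope is not None:
        # The load option is never used; the user is told which scope applies.
        LOGGER.warning(
            "SORT_SCOPE '%s' in load options is ignored; the sort scope from "
            "CREATE TABLE (%s) is used instead",
            load_scope,
            table_scope if table_scope is not None else "not specified",
        )
    return table_scope


logging.basicConfig(level=logging.WARNING)
print(effective_load_sort_scope({"SORT_SCOPE": "LOCAL_SORT"}, {"sort_scope": "GLOBAL_SORT"}))
```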
> 4 Feature restrictions
> NA
> 5 Dependencies
> NA
> 6 Technical risk
> NA
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)