carbondata git commit: [CARBONDATA-1325] Add partition guidance doc

chenliang613 Sat, 26 Aug 2017 08:35:32 -0700

Repository: carbondata
Updated Branches:
  refs/heads/master 4bd11072b -> 2c88bffdc



[CARBONDATA-1325] Add partition guidance doc

This closes #1258


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/2c88bffd
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/2c88bffd
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/2c88bffd

Branch: refs/heads/master
Commit: 2c88bffdc816e3e45e02dd1c655348ab96dfe1ac
Parents: 4bd1107
Author: lionelcao <[email protected]>
Authored: Tue Aug 15 18:26:42 2017 +0800
Committer: chenliang613 <[email protected]>
Committed: Sat Aug 26 23:34:48 2017 +0800

----------------------------------------------------------------------
 README.md               |   1 +
 docs/partition-guide.md | 161 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 162 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata/blob/2c88bffd/README.md
----------------------------------------------------------------------
diff --git a/README.md b/README.md
index 3226853..7033238 100644
--- a/README.md
+++ b/README.md
@@ -48,6 +48,7 @@ CarbonData is built using Apache Maven, to [build 
CarbonData](https://github.com
 * [Data 
Management](https://github.com/apache/carbondata/blob/master/docs/data-management.md)
 * [DDL Operations on 
CarbonData](https://github.com/apache/carbondata/blob/master/docs/ddl-operation-on-carbondata.md)
 
 * [DML Operations on 
CarbonData](https://github.com/apache/carbondata/blob/master/docs/dml-operation-on-carbondata.md)
  
+* [Partition 
Table](https://https://github.com/apache/carbondata/blob/master/docs/partition-guide.md)
 * [Cluster Installation and 
Deployment](https://github.com/apache/carbondata/blob/master/docs/installation-guide.md)
 * [Configuring 
Carbondata](https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md)
 * [FAQ](https://github.com/apache/carbondata/blob/master/docs/faq.md)

http://git-wip-us.apache.org/repos/asf/carbondata/blob/2c88bffd/docs/partition-guide.md
----------------------------------------------------------------------
diff --git a/docs/partition-guide.md b/docs/partition-guide.md
new file mode 100644
index 0000000..2a0df76
--- /dev/null
+++ b/docs/partition-guide.md
@@ -0,0 +1,161 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+-->
+
+### CarbonData Partition Table Guidance
+This guidance illustrates how to create & use partition table in CarbonData.
+
+* [Create Partition Table](#create-partition-table)
+  - [Create Hash Partition Table](#create-hash-partition-table)
+  - [Create Range Partition Table](#create-range-partition-table)
+  - [Create List Partition Table](#create-list-partition-table)
+* [Show Partitions](#show-partitions)
+* [Maintain the Partitions](#maintain-the-partitions)
+* [Partition Id](#partition-id)
+* [Tips](#tips)
+
+### Create Partition Table
+
+##### Create Hash Partition Table
+```
+   CREATE TABLE [IF NOT EXISTS] [db_name.]table_name
+                    [(col_name data_type , ...)]
+   PARTITIONED BY (partition_col_name data_type)
+   STORED BY 'carbondata'
+   [TBLPROPERTIES ('PARTITION_TYPE'='HASH', 
+                   'PARTITION_NUM'='N' ...)]  
+   //N is the number of hash partitions
+```
+
+Example:
+```
+   create table if not exists hash_partition_table(
+      col_A String,
+      col_B Int,
+      col_C Long,
+      col_D Decimal(10,2),
+      col_F Timestamp
+   ) partitioned by (col_E Long)
+   stored by 'carbondata'
+   tblproperties('partition_type'='Hash','partition_num'='9')
+```
+
+##### Create Range Partition Table
+```
+   CREATE TABLE [IF NOT EXISTS] [db_name.]table_name
+                    [(col_name data_type , ...)]
+   PARTITIONED BY (partition_col_name data_type)
+   STORED BY 'carbondata'
+   [TBLPROPERTIES ('PARTITION_TYPE'='RANGE', 
+                   'RANGE_INFO'='2014-01-01, 2015-01-01, 2016-01-01' ...)]
+```
+Notes: 
+1. The 'RANGE_INFO' defined in table properties must be in ascending order.
+2. If the partition column is Date/Timestamp type, the format could be defined 
in CarbonProperties. By default it's yyyy-MM-dd.
+
+Example:
+```
+   create table if not exists hash_partition_table(
+      col_A String,
+      col_B Int,
+      col_C Long,
+      col_D Decimal(10,2),
+      col_E Long
+   ) partitioned by (col_F Timestamp)
+   stored by 'carbondata'
+   tblproperties('partition_type'='Range',
+   'range_info'='2015-01-01, 2016-01-01, 2017-01-01, 2017-02-01')
+```
+
+##### Create List Partition Table
+```
+   CREATE TABLE [IF NOT EXISTS] [db_name.]table_name
+                    [(col_name data_type , ...)]
+   PARTITIONED BY (partition_col_name data_type)
+   STORED BY 'carbondata'
+   [TBLPROPERTIES ('PARTITION_TYPE'='LIST', 
+                   'LIST_INFO'='A, B, C' ...)]
+```
+Notes:
+1. List partition support list info in one level group. 
+
+Exampleï¼
+```
+   create table if not exists hash_partition_table(
+      col_B Int,
+      col_C Long,
+      col_D Decimal(10,2),
+      col_E Long,
+      col_F Timestamp
+   ) partitioned by (col_A String)
+   stored by 'carbondata'
+   tblproperties('partition_type'='List',
+   'list_info'='aaaa, bbbb, (cccc, dddd), eeee')
+```
+
+
+### Show Partitions
+Execute following command to get the partition information
+```
+   SHOW PARTITIONS [db_name.]table_name
+
+```
+
+### Maintain the Partitions
+##### Add a new partition
+```
+   ALTER TABLE [db_name].table_name ADD PARTITION('new_partition')
+```
+##### Split a partition
+```
+   ALTER TABLE [db_name].table_name SPLIT PARTITION(partition_id) 
INTO('new_partition1', 'new_partition2'...)
+```
+##### Drop a partition
+```
+   //Drop partition definition only and keep data
+   ALTER TABLE [db_name].table_name DROP PARTITION(partition_id)
+   
+   //Drop both partition definition and data
+   ALTER TABLE [db_name].table_name DROP PARTITION(partition_id) WITH DATA
+```
+Notes:
+1. For the 1st case(keep data), 
+   * if the table is a range partition table, data will be merged into the 
next partition, and if the dropped partition is the last one, then data will be 
merged into default partition.
+   * if the table is a list partition table, data will be merged into default 
partition.
+2. Drop default partition is not allowed, but you can use DELETE statement to 
delete data in default partition.
+3. partition_id could be got from SHOW PARTITIONS command.
+4. Hash partition table is not supported for the ADD, SPLIT, DROP command.
+
+### Partition Id
+In Carbondata, we don't use folders to divide partitions(just like hive did), 
instead we use partition id to replace the task id. 
+It could make use of the characteristic and meanwhile reduce some metadata. 
+```
+SegmentDir/0_batchno0-0-1502703086921.carbonindex
+           ^
+SegmentDir/part-0-0_batchno0-0-1502703086921.carbondata
+                  ^
+```
+
+### Tips
+Here are some tips to improve query performance of carbon partition table:
+##### 1. Do some analysis before choose the proper partition column
+The distribution of data on some column could be very skew, building a skewed 
partition table is meaningless, so do some basic statistic analysis to avoid 
creating partition table on an extremely skewed column.
+##### 2. Exclude partition column from sort columns
+If you have many dimensions need to be sorted, then exclude partition column 
from sort columns, that will put other dimensions in a better position of 
sorting.
+##### 3. Remember to add filter on partition column when writing SQLs
+When writing SQLs on partition table, try to use filters on partition column.

carbondata git commit: [CARBONDATA-1325] Add partition guidance doc

Reply via email to