carbondata git commit: [CARBONDATA-2414][Doc] Optimize documents for sort column bounds

chenliang613 Sat, 28 Apr 2018 23:53:09 -0700

Repository: carbondata
Updated Branches:
  refs/heads/master cf8fa9540 -> 3ec7b3ffa



[CARBONDATA-2414][Doc] Optimize documents for sort column bounds

Optimize documents for sort column bounds

This closes #2247


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/3ec7b3ff
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/3ec7b3ff
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/3ec7b3ff

Branch: refs/heads/master
Commit: 3ec7b3ffa01016feee04d5b63a97b4f86ebbb85c
Parents: cf8fa95
Author: xuchuanyin <[email protected]>
Authored: Sat Apr 28 14:03:22 2018 +0800
Committer: chenliang613 <[email protected]>
Committed: Sun Apr 29 14:52:40 2018 +0800

----------------------------------------------------------------------
 docs/data-management-on-carbondata.md | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata/blob/3ec7b3ff/docs/data-management-on-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/data-management-on-carbondata.md b/docs/data-management-on-carbondata.md
index 8999f32..a92ec4f 100644
--- a/docs/data-management-on-carbondata.md
+++ b/docs/data-management-on-carbondata.md
@@ -488,14 +488,18 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
 
   - **SORT COLUMN BOUNDS:** Range bounds for sort columns.
 
+    Suppose the table is created with 'SORT_COLUMNS'='name,id' and the range for name is aaa~zzz, the value range for id is 0~1000. Then during data loading, we can specify the following option to enhance data loading performance.
     ```
-    OPTIONS('SORT_COLUMN_BOUNDS'='v11,v21,v31;v12,v22,v32;v13,v23,v33')
+    OPTIONS('SORT_COLUMN_BOUNDS'='f,250;l,500;r,750')
     ```
+    Each bound is separated by ';' and each field value in bound is separated by ','. In the example above, we provide 3 bounds to distribute records to 4 partitions. The values 'f','l','r' can evenly distribute the records. Inside carbondata, for a record we compare the value of sort columns with that of the bounds and decide which partition the record will be forwarded to.
+
     **NOTE:**
     * SORT_COLUMN_BOUNDS will be used only when the SORT_SCOPE is 'local_sort'.
-    * Each bound is separated by ';' and each field value in bound is separated by ','.
-    * Carbondata will use these bounds as ranges to process data concurrently.
+    * Carbondata will use these bounds as ranges to process data concurrently during the final sort procedure. The records will be sorted and written out inside each partition. Since the partitions are themselves ordered by the bounds, all records end up sorted.
     * Since the actual order and literal order of the dictionary column are not necessarily the same, we do not recommend you to use this feature if the first sort column is 'dictionary_include'.
+    * The option works better if your CPU usage during loading is low. If your system is already under heavy CPU load, it is better not to use this option. Besides, it is up to the user to specify the bounds. Even if the user does not know the exact bounds that distribute the data evenly, loading performance will still be better than before or at least the same as before.
+    * Users can find more information about this option in the description of PR1953.
 
   - **SINGLE_PASS:** Single Pass Loading enables single job to finish data loading with dictionary generation on the fly. It enhances performance in the scenarios where the subsequent data loading after initial load involves fewer incremental updates on the dictionary.
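
The bound comparison described in the documentation change above can be sketched in a few lines of Python. This is only an illustration of the idea, not CarbonData's implementation: the function names `parse_bounds` and `partition_of` are hypothetical, the per-column converters (`str` for name, `int` for id) are assumed from the example, and sending a record that equals a bound to the higher partition is an assumption, since the text does not say which side is inclusive.

```python
import bisect

def parse_bounds(spec, converters):
    """Split a SORT_COLUMN_BOUNDS-style string into a sorted list of tuples.

    Bounds are separated by ';' and field values inside a bound by ','.
    `converters` holds one callable per sort column (assumed: str, int).
    """
    bounds = []
    for bound in spec.split(";"):
        fields = bound.split(",")
        bounds.append(tuple(conv(value) for conv, value in zip(converters, fields)))
    return bounds

def partition_of(sort_key, bounds):
    """Return the partition index (0..len(bounds)) for a record's sort key.

    The key is compared column by column against each bound, which is what
    Python's tuple ordering does. Assumption: a key equal to a bound falls
    into the partition above that bound.
    """
    return bisect.bisect_right(bounds, sort_key)

# Example from the documentation: 3 bounds give 4 partitions.
bounds = parse_bounds("f,250;l,500;r,750", (str, int))
records = [("aaa", 10), ("g", 0), ("m", 999), ("zzz", 5)]
partitions = [partition_of(key, bounds) for key in records]
```

Because the partitions are ordered by the bounds, sorting each one independently and writing them out in partition order yields a fully sorted result, which is the property the NOTE section relies on.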
