carbondata git commit: [CARBONDATA-3007][Doc] Fix error in documents

jackylk Thu, 18 Oct 2018 02:07:15 -0700

Repository: carbondata
Updated Branches:
  refs/heads/master 7dea46168 -> 4a090ce27



[CARBONDATA-3007][Doc] Fix error in documents

This closes #2813


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/4a090ce2
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/4a090ce2
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/4a090ce2

Branch: refs/heads/master
Commit: 4a090ce27ee7d42bc76de812f300d3d72976eb18
Parents: 7dea461
Author: xuchuanyin <[email protected]>
Authored: Mon Oct 15 20:23:22 2018 +0800
Committer: Jacky Li <[email protected]>
Committed: Thu Oct 18 17:05:56 2018 +0800

----------------------------------------------------------------------
 docs/performance-tuning.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata/blob/4a090ce2/docs/performance-tuning.md
----------------------------------------------------------------------
diff --git a/docs/performance-tuning.md b/docs/performance-tuning.md
index 6c87ce9..f43385a 100644
--- a/docs/performance-tuning.md
+++ b/docs/performance-tuning.md
@@ -168,7 +168,7 @@
 | carbon.compaction.level.threshold | spark/carbonlib/carbon.properties | Data 
loading and Querying | For minor compaction, specifies the number of segments 
to be merged in stage 1 and number of compacted segments to be merged in stage 
2. | Each CarbonData load will create one segment, if every load is small in 
size it will generate many small files over a period of time impacting the 
query performance. Configuring this parameter will merge the small segment to 
one big segment which will sort the data and improve the performance. For 
Example in one telecommunication scenario, the performance improves about 2 
times after minor compaction. |
 | spark.sql.shuffle.partitions | spark/conf/spark-defaults.conf | Querying | 
The number of task started when spark shuffle. | The value can be 1 to 2 times 
as much as the executor cores. In an aggregation scenario, reducing the number 
from 200 to 32 reduced the query time from 17 to 9 seconds. |
 | spark.executor.instances/spark.executor.cores/spark.executor.memory | 
spark/conf/spark-defaults.conf | Querying | The number of executors, CPU cores, 
and memory used for CarbonData query. | In the bank scenario, we provide the 4 
CPUs cores and 15 GB for each executor which can get good performance. This 2 
value does not mean more the better. It needs to be configured properly in case 
of limited resources. For example, In the bank scenario, it has enough CPU 32 
cores each node but less memory 64 GB each node. So we cannot give more CPU but 
less memory. For example, when 4 cores and 12GB for each executor. It sometimes 
happens GC during the query which impact the query performance very much from 
the 3 second to more than 15 seconds. In this scenario need to increase the 
memory or decrease the CPU cores. |
-| carbon.detail.batch.size | spark/carbonlib/carbon.properties | Data loading 
| The buffer size to store records, returned from the block scan. | In limit 
scenario this parameter is very important. For example your query limit is 
1000. But if we set this value to 3000 that means we get 3000 records from scan 
but spark will only take 1000 rows. So the 2000 remaining are useless. In one 
Finance test case after we set it to 100, in the limit 1000 scenario the 
performance increase about 2 times in comparison to if we set this value to 
12000. |
+| carbon.detail.batch.size | spark/carbonlib/carbon.properties | Querying | 
The buffer size to store records, returned from the block scan. | In limit 
scenario this parameter is very important. For example your query limit is 
1000. But if we set this value to 3000 that means we get 3000 records from scan 
but spark will only take 1000 rows. So the 2000 remaining are useless. In one 
Finance test case after we set it to 100, in the limit 1000 scenario the 
performance increase about 2 times in comparison to if we set this value to 
12000. |
 | carbon.use.local.dir | spark/carbonlib/carbon.properties | Data loading | 
Whether use YARN local directories for multi-table load disk load balance | If 
this is set it to true CarbonData will use YARN local directories for 
multi-table load disk load balance, that will improve the data load 
performance. |
 | carbon.use.multiple.temp.dir | spark/carbonlib/carbon.properties | Data 
loading | Whether to use multiple YARN local directories during table data 
loading for disk load balance | After enabling 'carbon.use.local.dir', if this 
is set to true, CarbonData will use all YARN local directories during data load 
for disk load balance, that will improve the data load performance. Please 
enable this property when you encounter disk hotspot problem during data 
loading. |
 | carbon.sort.temp.compressor | spark/carbonlib/carbon.properties | Data 
loading | Specify the name of compressor to compress the intermediate sort 
temporary files during sort procedure in data loading. | The optional values 
are 'SNAPPY','GZIP','BZIP2','LZ4','ZSTD', and empty. By default, empty means 
that Carbondata will not compress the sort temp files. This parameter will be 
useful if you encounter disk bottleneck. |
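
The carbon.detail.batch.size row touched by this commit reasons about rows that are scanned but then thrown away by a LIMIT query. The arithmetic can be sketched as below. This is a simplified model, not CarbonData's actual scan logic: it assumes the scan always returns whole batches and that the consumer stops as soon as `limit` rows are taken. The function name `scan_overhead` is hypothetical.

```python
import math
from typing import Tuple


def scan_overhead(limit: int, batch_size: int) -> Tuple[int, int]:
    """Return (rows_fetched, rows_discarded) for a LIMIT query.

    Assumption: the scan hands back full batches of `batch_size` rows
    and stops after the first batch that satisfies `limit`.
    """
    batches = math.ceil(limit / batch_size)  # whole batches needed to reach the limit
    fetched = batches * batch_size           # rows actually read from the scan
    return fetched, fetched - limit          # surplus rows are never used


if __name__ == "__main__":
    # Batch sizes mentioned in the table row: 100, 3000, 12000, with LIMIT 1000.
    for size in (100, 3000, 12000):
        fetched, discarded = scan_overhead(1000, size)
        print(f"batch_size={size}: fetched={fetched}, discarded={discarded}")
```

Under this model a batch size of 3000 with a limit of 1000 discards 2000 rows, matching the example in the row, while a batch size of 100 discards none. The roughly 2x speedup reported in the row is a measured result from the original test case and is not derived from this sketch.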
