[GitHub] carbondata pull request #1177: [CARBONDATA-1281] Support multiple temp dirs ...

sraghunandan Mon, 24 Jul 2017 23:07:39 -0700

Github user sraghunandan commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1177#discussion_r129197372
  
    --- Diff: docs/useful-tips-on-carbondata.md ---
    @@ -231,5 +231,6 @@ scenarios. After the completion of POC, some of the configurations impacting the
     | spark.executor.instances/spark.executor.cores/spark.executor.memory | spark/conf/spark-defaults.conf | Querying | The number of executors, CPU cores, and memory used for CarbonData query. | In the bank scenario, we provide the 4 CPUs cores and 15 GB for each executor which can get good performance. This 2 value does not mean more the better. It needs to be configured properly in case of limited resources. For example, In the bank scenario, it has enough CPU 32 cores each node but less memory 64 GB each node. So we cannot give more CPU but less memory. For example, when 4 cores and 12GB for each executor. It sometimes happens GC during the query which impact the query performance very much from the 3 second to more than 15 seconds. In this scenario need to increase the memory or decrease the CPU cores. |
     | carbon.detail.batch.size | spark/carbonlib/carbon.properties | Data loading | The buffer size to store records, returned from the block scan. | In limit scenario this parameter is very important. For example your query limit is 1000. But if we set this value to 3000 that means we get 3000 records from scan but spark will only take 1000 rows. So the 2000 remaining are useless. In one Finance test case after we set it to 100, in the limit 1000 scenario the performance increase about 2 times in comparison to if we set this value to 12000. |
     | carbon.use.local.dir | spark/carbonlib/carbon.properties | Data loading | Whether use YARN local directories for multi-table load disk load balance | If this is set it to true CarbonData will use YARN local directories for multi-table load disk load balance, that will improve the data load performance. |
    +| carbon.use.multiple.temp.dir | spark/carbonlib/carbon.properties | Data loading | Whether to use multiple YARN local directories during table data loading for disk load balance | After enabling 'carbon.use.local.dir', if this is set to true, CarbonData will use YARN local directories during data load for disk load balance, that will improve the data load performance. Please enable this property especially when you encounter disk hotspot problem during data loading. |
    --- End diff --
    
    Suggest wording this as "will use all YARN local directories".
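
    For reference, a minimal sketch of how the two settings from the table above might look in spark/carbonlib/carbon.properties. The property names and file path come from the diff; the plain `true` values and the comments are assumptions for illustration, not taken from the PR.

    ```properties
    # spark/carbonlib/carbon.properties (illustrative fragment)

    # Use YARN local directories for temporary files during data loading.
    carbon.use.local.dir=true

    # Spread temporary files across multiple YARN local directories to
    # balance disk load; per the doc text, this applies only after
    # carbon.use.local.dir has been enabled.
    carbon.use.multiple.temp.dir=true
    ```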



