[
https://issues.apache.org/jira/browse/CARBONDATA-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089614#comment-16089614
]
xuchuanyin commented on CARBONDATA-1281:
----------------------------------------
[~Bjangir] I've checked the code, and cannot find the property
`carbon.tempstore.locations`.
Do you mean the property `carbon.tempstore.location` in the CarbonData source code?
That property does not resolve the hotspot problem during data loading.
> Disk hotspot found during data loading
> --------------------------------------
>
> Key: CARBONDATA-1281
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1281
> Project: CarbonData
> Issue Type: Improvement
> Components: core, data-load
> Affects Versions: 1.1.0
> Reporter: xuchuanyin
> Time Spent: 1h
> Remaining Estimate: 0h
>
> # Scenario
> We recently ran a large data load. The input data is about 71 GB
> in CSV format and contains about 88 million records. We do not use any
> dictionary encoding in CarbonData. Our test environment has three nodes,
> each with 11 disks configured as YARN executor directories. We submit the
> load command through JDBCServer. The JDBCServer instance has three executors
> in total, one on each node. The load takes about 10 minutes
> (+/- 3 minutes from run to run).
> We monitored the nodes with nmon during the load and found:
> 1. a lot of CPU wait in the first half of the load;
> 2. a single disk receives most of the writes and nearly reaches its limit
> (Avg. 80 MB/s, Max. 150 MB/s on a SAS disk);
> 3. the other disks are mostly idle.
> # Analysis
> During data loading, CarbonData reads and sorts data locally (default scope)
> and writes the temp files to local disk. In my case there is only one
> executor per node, so CarbonData writes all the temp files to one
> disk (the container directory or YARN local directory), which results in a
> single-disk hotspot.
> # Modification
> We should support multiple directories for writing temp files to avoid the
> disk hotspot.
> PS: I have implemented this in my environment and the result is
> promising: the load takes about 6 minutes (10 minutes before the change).
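
The idea of spreading temp files over several directories could be sketched roughly as below. This is only an illustration in Python, not CarbonData's actual implementation: the class name `TempDirPicker`, the `sorttemp_` prefix, and the example directory names are all hypothetical, and round-robin is just one possible selection policy (random choice would also work).

```python
import itertools
import os
import tempfile
import threading


class TempDirPicker:
    """Hypothetical helper: hands out temp-file locations in round-robin
    order across several directories (ideally one per physical disk)."""

    def __init__(self, directories):
        if not directories:
            raise ValueError("at least one temp directory is required")
        self._directories = list(directories)
        self._cycle = itertools.cycle(self._directories)
        # Several loading threads may ask for a directory concurrently.
        self._lock = threading.Lock()

    def next_dir(self):
        """Return the next directory in round-robin order."""
        with self._lock:
            return next(self._cycle)

    def new_temp_file(self, prefix="sorttemp_"):
        """Create an empty temp file in the next directory and return its path."""
        directory = self.next_dir()
        os.makedirs(directory, exist_ok=True)
        fd, path = tempfile.mkstemp(prefix=prefix, dir=directory)
        os.close(fd)
        return path


if __name__ == "__main__":
    # Stand-in for per-disk local directories; real paths depend on the cluster.
    root = tempfile.mkdtemp()
    disks = [os.path.join(root, "disk%d" % i) for i in range(3)]
    picker = TempDirPicker(disks)
    for _ in range(6):
        print(picker.new_temp_file())
```

With one directory per disk, consecutive sort temp files land on different disks, so the write load is no longer concentrated on a single device.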