[
https://issues.apache.org/jira/browse/CARBONDATA-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jacky Li resolved CARBONDATA-742.
---------------------------------
Resolution: Fixed
Fix Version/s: 1.1.0-incubating
> Add batch sort to improve the loading performance
> -------------------------------------------------
>
> Key: CARBONDATA-742
> URL: https://issues.apache.org/jira/browse/CARBONDATA-742
> Project: CarbonData
> Issue Type: Improvement
> Reporter: Ravindra Pesala
> Assignee: Ravindra Pesala
> Fix For: 1.1.0-incubating
>
> Time Spent: 8h 20m
> Remaining Estimate: 0h
>
> Current Problem:
> The sort step is a major bottleneck because it is a blocking step. It must
> receive all the data and write the sort temp files to disk; only after that
> can the data writer step start.
> Solution:
> Make the sort step non-blocking so that the data writer step does not have
> to wait for it. Process the data in the sort step in batches sized to the
> in-memory capacity of the machine. For example, if the machine can allocate
> 4 GB to process data in memory, the sort step can sort the data in batches
> of 2 GB and hand each batch to the data writer step. While the data writer
> step consumes one batch, the sort step receives and sorts the next. All
> steps therefore work continuously, and the sort step performs no disk IO.
> The data writer step never waits for the whole sort to finish: as soon as
> the sort step has sorted a batch in memory, the data writer can start
> writing it.
> This can significantly improve the loading performance.
> Advantages:
> Loading performance increases because there is no intermediate IO and the
> sort step no longer blocks.
> Compaction needs no extra effort; the current flow can handle it.
> Disadvantages:
> The number of driver-side B-trees will increase, so memory usage might
> increase, but it can be controlled by the current LRU cache implementation.
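
The batch flow described above can be illustrated with a minimal sketch. This is not CarbonData code: Python is used only for brevity, and the names `batch_sort_load`, `sort_step`, `writer_step`, and `max_pending` are hypothetical. It assumes rows that are directly comparable and uses an in-memory list as a stand-in for written data files. A bounded queue lets the sort step work on the next batch while the writer consumes the previous one.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the input


def batch_sort_load(rows, batch_size, max_pending=1):
    """Sort rows in fixed-size in-memory batches while a writer consumes them.

    Hypothetical helper for illustration only. Returns the list of sorted
    batches in the order they were "written".
    """
    # The bounded queue caps how many sorted batches wait in memory at once.
    pending = queue.Queue(maxsize=max_pending)
    written = []

    def sort_step():
        batch = []
        try:
            for row in rows:
                batch.append(row)
                if len(batch) == batch_size:
                    # Blocks only when the writer has fallen behind.
                    pending.put(sorted(batch))
                    batch = []
            if batch:
                pending.put(sorted(batch))  # final, possibly smaller batch
        finally:
            pending.put(_DONE)  # always release the writer

    def writer_step():
        while True:
            batch = pending.get()
            if batch is _DONE:
                break
            written.append(batch)  # stand-in for writing one data file

    sorter = threading.Thread(target=sort_step)
    writer = threading.Thread(target=writer_step)
    sorter.start()
    writer.start()
    sorter.join()
    writer.join()
    return written
```

Each returned batch is sorted only within itself, so the output consists of several independently sorted units rather than one globally sorted one. In this sketch that is the counterpart of the disadvantage noted above: more sorted units on the load side mean more index structures to hold on the driver side.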