[ https://issues.apache.org/jira/browse/CARBONDATA-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
xuchuanyin resolved CARBONDATA-2238.
------------------------------------
Resolution: Fixed
> Optimization in unsafe sort during data loading
> -----------------------------------------------
>
> Key: CARBONDATA-2238
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2238
> Project: CarbonData
> Issue Type: Improvement
> Components: data-load
> Reporter: xuchuanyin
> Assignee: xuchuanyin
> Priority: Major
> Time Spent: 6h 10m
> Remaining Estimate: 0h
>
> Inspired by batch_sort: in local_sort with the unsafe property enabled, if
> enough memory is available we can keep all the row pages in memory and spill
> pages to disk as sort temp files only when memory runs out.
> Before spilling the pages, we can merge-sort them in memory.
> Each time we request an unsafe row page and memory is unavailable, we
> trigger a merge sort of the in-memory pages and spill the merged result to
> disk as a sort temp file. Incoming pages are then held in memory instead of
> being spilled to disk directly.
> With this change, each spill writes more data than before, which benefits
> disk IO.
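
The spill policy described above can be sketched roughly as follows. This is a minimal illustration, not the CarbonData implementation: the class `SpillOnPressureSorter`, its methods, the page-count budget standing in for the unsafe memory limit, and the in-memory list standing in for sort temp files are all hypothetical. Each page is assumed to be sorted before it is added.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of "merge in memory, then spill"; not CarbonData code. */
public class SpillOnPressureSorter {
    // Assumption: a simple page count stands in for the unsafe memory budget.
    private final int maxPagesInMemory;
    private final List<int[]> inMemoryPages = new ArrayList<>();
    // Assumption: each int[] here stands in for one sort temp file on disk.
    private final List<int[]> spilledRuns = new ArrayList<>();

    public SpillOnPressureSorter(int maxPagesInMemory) {
        this.maxPagesInMemory = maxPagesInMemory;
    }

    /** Adds an already-sorted page; spills only when no memory is left. */
    public void addPage(int[] sortedPage) {
        if (inMemoryPages.size() >= maxPagesInMemory) {
            // Memory unavailable: merge all held pages into one larger run
            // and spill that run, instead of spilling page by page.
            spilledRuns.add(mergePages(inMemoryPages));
            inMemoryPages.clear();
        }
        // The incoming page is kept in memory rather than written to disk.
        inMemoryPages.add(sortedPage);
    }

    /** K-way merge of sorted pages using a heap of {pageIndex, offset} cursors. */
    static int[] mergePages(List<int[]> pages) {
        int total = 0;
        for (int[] page : pages) {
            total += page.length;
        }
        int[] merged = new int[total];
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            Comparator.comparingInt((int[] c) -> pages.get(c[0])[c[1]]));
        for (int i = 0; i < pages.size(); i++) {
            if (pages.get(i).length > 0) {
                heap.add(new int[] {i, 0});
            }
        }
        int out = 0;
        while (!heap.isEmpty()) {
            int[] cursor = heap.poll();
            int[] page = pages.get(cursor[0]);
            merged[out++] = page[cursor[1]];
            if (cursor[1] + 1 < page.length) {
                heap.add(new int[] {cursor[0], cursor[1] + 1});
            }
        }
        return merged;
    }

    public List<int[]> spilledRuns() {
        return spilledRuns;
    }

    public int pagesInMemory() {
        return inMemoryPages.size();
    }
}
```

With a budget of two pages, adding a third page makes the first two be merged and spilled as one run, so each spill carries several pages' worth of rows rather than a single page.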
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)