[
https://issues.apache.org/jira/browse/CARBONDATA-470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jacky Li resolved CARBONDATA-470.
---------------------------------
Resolution: Fixed
Assignee: Ravindra Pesala
Fix Version/s: 1.0.0-incubating
> Add unsafe off-heap and on-heap sort in carbondata loading
> ----------------------------------------------------------
>
> Key: CARBONDATA-470
> URL: https://issues.apache.org/jira/browse/CARBONDATA-470
> Project: CarbonData
> Issue Type: Improvement
> Reporter: Ravindra Pesala
> Assignee: Ravindra Pesala
> Fix For: 1.0.0-incubating
>
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> In the current CarbonData system, data loading performance is not
> satisfactory because the data must be sorted at the executor level.
> CarbonData collects a batch of data, sorts it in memory, and dumps it to a
> temporary file; it then finishes with a merge sort over those temporary
> files. This causes two major issues: disk I/O and GC pressure. Even though
> the data is dumped to files, CarbonData still faces heavy GC pressure because
> each batch is sorted in memory before it is written to a temporary file.
> To solve the above problems we can introduce Unsafe storage and Unsafe sort.
> Unsafe storage: The user can configure a memory limit on the amount of data
> kept in memory. The data is kept in a contiguous memory region, either
> off-heap or on-heap, using Unsafe. Once the configured limit is exceeded,
> the remaining data is spilled to disk.
> Unsafe sort: The data that is stored in memory using Unsafe can be sorted
> in place using Unsafe sort.
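
The proposal above can be sketched roughly as follows. This is a minimal
illustration, not the CarbonData implementation: the class name
OffHeapSortSketch and all of its methods are hypothetical, rows are reduced to
fixed-width long keys, a direct ByteBuffer stands in for sun.misc.Unsafe as the
contiguous off-heap region, and spilled runs are kept in a list where a real
loader would write temporary files.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: long keys held in one contiguous off-heap block. */
final class OffHeapSortSketch {
    private final ByteBuffer block;                         // off-heap storage, capped by the limit
    private final List<long[]> spilled = new ArrayList<>(); // assumption: stands in for temp files
    private int count = 0;                                  // keys currently in the block

    OffHeapSortSketch(int memoryLimitBytes) {
        if (memoryLimitBytes < Long.BYTES) {
            throw new IllegalArgumentException("limit must hold at least one key");
        }
        this.block = ByteBuffer.allocateDirect(memoryLimitBytes);
    }

    /** Appends a key; when the configured limit is reached, sorts and spills the block. */
    void add(long key) {
        if (block.remaining() < Long.BYTES) {
            spilled.add(sortCurrentBlock());
            block.clear();
            count = 0;
        }
        block.putLong(key);
        count++;
    }

    int spillCount() {
        return spilled.size();
    }

    /** Sorts the remaining block and k-way merges it with every spilled run. */
    long[] finish() {
        List<long[]> runs = new ArrayList<>(spilled);
        runs.add(sortCurrentBlock());
        // Heap entries are {runIndex, positionInRun}, ordered by the key they point at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> Long.compare(runs.get(a[0])[a[1]], runs.get(b[0])[b[1]]));
        int total = 0;
        for (int r = 0; r < runs.size(); r++) {
            total += runs.get(r).length;
            if (runs.get(r).length > 0) {
                heap.add(new int[] {r, 0});
            }
        }
        long[] out = new long[total];
        int n = 0;
        while (!heap.isEmpty()) {
            int[] head = heap.poll();
            long[] run = runs.get(head[0]);
            out[n++] = run[head[1]];
            if (++head[1] < run.length) {
                heap.add(head);
            }
        }
        return out;
    }

    /** Sorts the block in place, then copies it out as one sorted run. */
    private long[] sortCurrentBlock() {
        quickSort(0, count - 1);
        long[] run = new long[count];
        for (int i = 0; i < count; i++) {
            run[i] = get(i);
        }
        return run;
    }

    private long get(int i) {
        return block.getLong(i * Long.BYTES);
    }

    private void put(int i, long v) {
        block.putLong(i * Long.BYTES, v);
    }

    /** In-place quicksort over the off-heap block; no per-key heap objects are created. */
    private void quickSort(int lo, int hi) {
        while (lo < hi) {
            long pivot = get(lo + (hi - lo) / 2);
            int i = lo;
            int j = hi;
            while (i <= j) {
                while (get(i) < pivot) i++;
                while (get(j) > pivot) j--;
                if (i <= j) {
                    long t = get(i);
                    put(i, get(j));
                    put(j, t);
                    i++;
                    j--;
                }
            }
            // Recurse into the smaller partition, loop on the larger one.
            if (j - lo < hi - i) {
                quickSort(lo, j);
                lo = i;
            } else {
                quickSort(i, hi);
                hi = j;
            }
        }
    }
}
```

In this sketch the keys being sorted never become Java objects, so the sort
itself adds nothing for the garbage collector to trace, and the memory limit
determines how many runs have to be merged at the end. A real loader would
also need variable-width rows and actual file I/O for the spilled runs.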
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)