Ravindra Pesala created CARBONDATA-470:
------------------------------------------

             Summary: Add unsafe off-heap and on-heap sort in CarbonData loading
                 Key: CARBONDATA-470
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-470
             Project: CarbonData
          Issue Type: Improvement
            Reporter: Ravindra Pesala


In the current CarbonData system, loading performance is poor because the data
must be sorted at the executor level during loading. CarbonData collects a
batch of data and sorts it before dumping it to temporary files, and finally
performs a merge sort over those temporary files to complete the sort. This
raises two major issues: disk IO and GC overhead. Even though the data is
dumped to files, CarbonData still suffers heavy GC because each batch is
sorted in memory before it is written to the temporary files.
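
For illustration, the batch-sort-then-merge flow described above can be
sketched in Java. This is a minimal, hypothetical sketch, not CarbonData's
actual loader: rows are reduced to `long` sort keys, the names
`BatchSortMerge`, `sortRows` and `batchSize` are invented, and each sorted
batch is written to a temporary text file before a k-way merge with a
`PriorityQueue`. The boxed `List<Long>` batches sorted on the Java heap are
the kind of short-lived objects that drive the GC pressure mentioned above,
and every row is written to and read back from disk.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Hypothetical sketch of the batch sort + temp file + merge sort flow.
// Not CarbonData's code; rows are simplified to long sort keys.
class BatchSortMerge {

    // Sort one in-memory batch (boxed Longs on the heap) and dump it to a temp file.
    static Path sortAndDump(List<Long> batch) throws IOException {
        Collections.sort(batch);
        Path tmp = Files.createTempFile("sort-run-", ".tmp");
        try (BufferedWriter w = Files.newBufferedWriter(tmp)) {
            for (long v : batch) {
                w.write(Long.toString(v));
                w.newLine();
            }
        }
        batch.clear(); // reuse the list for the next batch
        return tmp;
    }

    // k-way merge of the sorted temp files using a min-heap of {value, runIndex}.
    static List<Long> mergeRuns(List<Path> runs) throws IOException {
        PriorityQueue<long[]> heap = new PriorityQueue<>(Comparator.comparingLong(e -> e[0]));
        List<BufferedReader> readers = new ArrayList<>();
        try {
            for (Path p : runs) {
                BufferedReader r = Files.newBufferedReader(p);
                readers.add(r);
                String line = r.readLine();
                if (line != null) heap.add(new long[]{Long.parseLong(line), readers.size() - 1});
            }
            List<Long> out = new ArrayList<>();
            while (!heap.isEmpty()) {
                long[] top = heap.poll();
                out.add(top[0]);
                // Refill the heap from the run that produced the smallest value.
                String line = readers.get((int) top[1]).readLine();
                if (line != null) heap.add(new long[]{Long.parseLong(line), top[1]});
            }
            return out;
        } finally {
            for (BufferedReader r : readers) r.close();
            for (Path p : runs) Files.deleteIfExists(p);
        }
    }

    // Collect rows into batches of batchSize, sort and dump each, then merge all runs.
    static List<Long> sortRows(Iterator<Long> rows, int batchSize) {
        try {
            List<Path> runs = new ArrayList<>();
            List<Long> batch = new ArrayList<>();
            while (rows.hasNext()) {
                batch.add(rows.next());
                if (batch.size() == batchSize) runs.add(sortAndDump(batch));
            }
            if (!batch.isEmpty()) runs.add(sortAndDump(batch));
            return mergeRuns(runs);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `BatchSortMerge.sortRows(Arrays.asList(5L, 3L, 9L, 1L).iterator(), 2)`
produces two sorted runs on disk and merges them into `[1, 3, 5, 9]`.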

To solve the above problems, we can introduce Unsafe Storage and Unsafe Sort.
Unsafe Storage: The user can configure a memory limit for the amount of data
kept in memory. The data is kept in a contiguous memory region, either
off-heap or on-heap, using Unsafe. Once the configured limit is exceeded, the
remaining data is spilled to disk.
Unsafe Sort: The data stored in memory using Unsafe can be sorted using
Unsafe sort.
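
The Unsafe Storage and Unsafe Sort idea can be sketched as follows, with
caveats: this is a hypothetical illustration, not CarbonData's
implementation, and it substitutes `java.nio.ByteBuffer` (`allocateDirect`
for off-heap, `allocate` for on-heap) for `sun.misc.Unsafe` so that it runs
without reflective access to JDK internals. Rows are assumed to be
fixed-width (an 8-byte sort key plus an 8-byte payload), the memory limit is
a hypothetical `limitBytes` constructor parameter, and `RowPage`, `add`,
`sort` and `spill` are invented names. Sorting rearranges only an `int[]` of
row offsets and reads keys directly from the buffer, so no per-row Java
objects are created; when `add` reports that the limit is reached, the caller
spills the sorted page to a temporary file and reuses the page.

```java
import java.io.*;
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch: fixed-width rows kept in one contiguous buffer under a byte limit.
// Uses java.nio.ByteBuffer in place of sun.misc.Unsafe.
class RowPage {
    static final int ROW_BYTES = 16;  // 8-byte sort key + 8-byte payload
    private final ByteBuffer buf;     // contiguous row storage (off-heap or on-heap)
    private final int[] offsets;      // row "pointers"; only these move during sort
    private int rows = 0;

    RowPage(int limitBytes, boolean offHeap) {
        int capacity = limitBytes / ROW_BYTES;
        buf = offHeap ? ByteBuffer.allocateDirect(capacity * ROW_BYTES)
                      : ByteBuffer.allocate(capacity * ROW_BYTES);
        offsets = new int[capacity];
    }

    // Returns false when the configured limit is reached; the caller should spill.
    boolean add(long key, long value) {
        if (rows == offsets.length) return false;
        int off = rows * ROW_BYTES;
        buf.putLong(off, key);
        buf.putLong(off + 8, value);
        offsets[rows++] = off;
        return true;
    }

    private long keyAt(int i) { return buf.getLong(offsets[i]); }

    // In-place quicksort of the offset array, comparing keys read straight from the buffer.
    void sort() { quickSort(0, rows - 1); }

    private void quickSort(int lo, int hi) {
        while (lo < hi) {
            long pivot = keyAt((lo + hi) >>> 1);
            int i = lo, j = hi;
            while (i <= j) {
                while (keyAt(i) < pivot) i++;
                while (keyAt(j) > pivot) j--;
                if (i <= j) {
                    int t = offsets[i]; offsets[i] = offsets[j]; offsets[j] = t;
                    i++; j--;
                }
            }
            // Recurse into the smaller side and loop on the larger to bound stack depth.
            if (j - lo < hi - i) { quickSort(lo, j); lo = i; }
            else { quickSort(i, hi); hi = j; }
        }
    }

    // Sort, write rows in key order to a temp file, and reset the page for reuse.
    File spill() throws IOException {
        sort();
        File f = File.createTempFile("unsafe-sort-", ".run");
        f.deleteOnExit();
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(f)))) {
            for (int i = 0; i < rows; i++) {
                out.writeLong(buf.getLong(offsets[i]));     // key
                out.writeLong(buf.getLong(offsets[i] + 8)); // payload
            }
        }
        rows = 0;
        return f;
    }

    // Sorted keys currently held in memory (for inspection).
    long[] sortedKeys() {
        sort();
        long[] ks = new long[rows];
        for (int i = 0; i < rows; i++) ks[i] = keyAt(i);
        return ks;
    }
}
```

A caller would typically loop with
`if (!page.add(k, v)) { runs.add(page.spill()); page.add(k, v); }` and then
merge-sort the spilled runs, as the existing flow does with its temporary
files.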




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
