xuchuanyin created CARBONDATA-2238:
--------------------------------------

             Summary: Optimization in unsafe sort during data loading
                 Key: CARBONDATA-2238
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2238
             Project: CarbonData
          Issue Type: Improvement
          Components: data-load
            Reporter: xuchuanyin
            Assignee: xuchuanyin


Inspired by batch_sort: in local_sort with the unsafe property enabled, if 
enough memory is available, we can hold all the row pages in memory and spill 
pages to disk as sort temp files only when memory is unavailable.

Before spilling the pages, we can do an in-memory merge sort of them.
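
The in-memory merge step could look roughly like the sketch below. It is only 
an illustration, not CarbonData code: InMemoryPageMerger and Cursor are 
hypothetical names, rows are plain integers, and each page is assumed to be 
already sorted. A heap-based k-way merge is one common way to do this.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not a CarbonData class.
// Assumption: every input page is already sorted in ascending order.
final class InMemoryPageMerger {

  // Cursor over one sorted page: the page plus the index of its next unread row.
  private static final class Cursor {
    final List<Integer> page;
    int next = 0;

    Cursor(List<Integer> page) {
      this.page = page;
    }

    int head() {
      return page.get(next);
    }
  }

  // K-way merge: repeatedly take the smallest head row among all pages.
  static List<Integer> merge(List<List<Integer>> sortedPages) {
    Comparator<Cursor> byHead = Comparator.comparingInt(Cursor::head);
    PriorityQueue<Cursor> heap = new PriorityQueue<>(byHead);
    int totalRows = 0;
    for (List<Integer> page : sortedPages) {
      totalRows += page.size();
      if (!page.isEmpty()) {
        heap.add(new Cursor(page));
      }
    }

    List<Integer> merged = new ArrayList<>(totalRows);
    while (!heap.isEmpty()) {
      Cursor smallest = heap.poll();
      merged.add(smallest.head());
      smallest.next++;
      // Re-insert the cursor only if its page still has unread rows.
      if (smallest.next < smallest.page.size()) {
        heap.add(smallest);
      }
    }
    return merged;
  }
}
```

The merged result is what would be written out as one sort temp file.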

Each time we request an unsafe row page and memory is unavailable, we can 
trigger a merge sort of the in-memory pages and spill the result to disk as a 
sort temp file. The incoming pages are then held in memory instead of being 
spilled to disk directly.
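
A minimal sketch of this control flow, under stated assumptions: the class 
SpillOnPressureSorter and all of its members are hypothetical, memory is 
counted in rows rather than bytes, and a list in spilledFiles stands in for 
a sort temp file on disk. It is not the actual CarbonData implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not CarbonData's actual classes or API.
final class SpillOnPressureSorter {
  // Stand-in for the unsafe memory limit, counted in rows for simplicity.
  private final int memoryBudgetRows;
  private int rowsInMemory = 0;
  private final List<List<Integer>> inMemoryPages = new ArrayList<>();
  // Each entry stands in for one sort temp file written to disk.
  final List<List<Integer>> spilledFiles = new ArrayList<>();

  SpillOnPressureSorter(int memoryBudgetRows) {
    this.memoryBudgetRows = memoryBudgetRows;
  }

  // Accept an incoming page; spill only if it does not fit in memory.
  void addPage(List<Integer> rows) {
    if (rows.size() > memoryBudgetRows) {
      throw new IllegalArgumentException("page larger than memory budget");
    }
    if (rowsInMemory + rows.size() > memoryBudgetRows) {
      // Memory unavailable: merge what is held and spill it as one file.
      spillMergedPages();
    }
    List<Integer> page = new ArrayList<>(rows);
    Collections.sort(page); // each page is sorted on its own
    inMemoryPages.add(page);
    rowsInMemory += page.size();
  }

  // Merge all in-memory pages into one sorted run and "write" it out.
  private void spillMergedPages() {
    if (inMemoryPages.isEmpty()) {
      return;
    }
    List<Integer> merged = new ArrayList<>(rowsInMemory);
    for (List<Integer> page : inMemoryPages) {
      merged.addAll(page);
    }
    // Stand-in for a k-way merge of the already-sorted pages.
    Collections.sort(merged);
    spilledFiles.add(merged);
    inMemoryPages.clear();
    rowsInMemory = 0;
  }

  int inMemoryPageCount() {
    return inMemoryPages.size();
  }
}
```

With a budget of six rows and three-row pages, for example, the third page 
triggers one spill of six merged rows, and the third page itself stays in 
memory; spilling each page directly would have written three rows at a time.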

With this change, each spill writes more data than before, which benefits 
disk IO.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
