[ 
https://issues.apache.org/jira/browse/CARBONDATA-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li updated CARBONDATA-318:
--------------------------------
    Description: 
External Sorter should sort in memory until it reaches the configured size, 
then spill to disk. It should provide the following interface:
1. insertRow/insertRowBatch: takes an Iterator as input and inserts rows 
from the iterator into the sorter. The sorter decides when to spill to disk 
based on the total inserted size. (The JDK does not provide an API for object 
size; another JIRA issue is needed to improve on this.)
2. getIterator: returns an iterator over the sorted rows; the sorted rows 
may come from memory or from files.

External Sorter depends on FileWriterFactory to get a FileWriter to spill data 
into files. FileWriterFactory should be provided by configuration. Multiple 
implementations are possible, such as writing into one folder or into multiple 
folders.

  was:
External Sorter should sort in memory until it reaches the configured size, 
then spill to disk. It should provide the following interface:
1. insertRow/insertRowBatch: takes an Iterator as input and inserts rows 
from the iterator into the sorter.
2. getIterator: returns an iterator over the sorted rows; the sorted rows 
may come from memory or from files.

External Sorter depends on FileWriterFactory to get a FileWriter to spill data 
into files. FileWriterFactory should be provided by configuration. Multiple 
implementations are possible, such as writing into one folder or into multiple 
folders.


> Implement an ExternalSorter that makes maximum usage of memory while sorting
> ----------------------------------------------------------------------------
>
>                 Key: CARBONDATA-318
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-318
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: Jacky Li
>             Fix For: 0.2.0-incubating
>
>
> External Sorter should sort in memory until it reaches the configured size, 
> then spill to disk. It should provide the following interface:
> 1. insertRow/insertRowBatch: takes an Iterator as input and inserts rows 
> from the iterator into the sorter. The sorter decides when to spill to disk 
> based on the total inserted size. (The JDK does not provide an API for object 
> size; another JIRA issue is needed to improve on this.)
> 2. getIterator: returns an iterator over the sorted rows; the sorted rows 
> may come from memory or from files.
> External Sorter depends on FileWriterFactory to get a FileWriter to spill 
> data into files. FileWriterFactory should be provided by configuration. 
> Multiple implementations are possible, such as writing into one folder or 
> into multiple folders.
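
For illustration only, the interface described above might be sketched roughly 
as follows. This is not the CarbonData implementation. It assumes rows are 
single-line Strings, estimates row size as 2 bytes per char (since the JDK has 
no object-size API), and uses a hypothetical SpillFileFactory that returns a 
Path in place of the FileWriterFactory/FileWriter pair named in the issue. 
Spilled runs are merged with the in-memory rows through a priority queue.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/** Hypothetical stand-in for FileWriterFactory: decides where each spill file goes. */
interface SpillFileFactory {
    Path newSpillFile() throws IOException;
}

/** Sketch only: String rows without line breaks, size estimated as 2 bytes per char. */
class ExternalSorter {
    private final long maxBytesInMemory;
    private final SpillFileFactory factory;
    private final List<String> buffer = new ArrayList<>();
    private final List<Path> spills = new ArrayList<>();
    private long bufferedBytes = 0;

    ExternalSorter(long maxBytesInMemory, SpillFileFactory factory) {
        this.maxBytesInMemory = maxBytesInMemory;
        this.factory = factory;
    }

    /** Inserts rows; spills a sorted run once the estimated size reaches the limit. */
    void insertRowBatch(Iterator<String> rows) {
        while (rows.hasNext()) {
            String row = rows.next();
            buffer.add(row);
            bufferedBytes += 2L * row.length(); // rough estimate, an assumption
            if (bufferedBytes >= maxBytesInMemory) {
                spill();
            }
        }
    }

    int spillCount() {
        return spills.size();
    }

    /** Sorts the in-memory rows and writes them as one run, one row per line. */
    private void spill() {
        Collections.sort(buffer);
        try {
            Path file = factory.newSpillFile();
            Files.write(file, buffer);
            spills.add(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        buffer.clear();
        bufferedBytes = 0;
    }

    /** One sorted source in the merge: its current head row plus the remaining rows. */
    private static final class Run implements Comparable<Run> {
        String head;
        final Iterator<String> rest;
        Run(Iterator<String> rows) { this.rest = rows; this.head = rows.next(); }
        @Override public int compareTo(Run other) { return head.compareTo(other.head); }
    }

    /** Returns sorted rows by merging the in-memory rows with all spilled runs. */
    Iterator<String> getIterator() {
        Collections.sort(buffer);
        List<Iterator<String>> sources = new ArrayList<>();
        sources.add(buffer.iterator());
        for (Path p : spills) {
            try {
                // Lazily streams the run; a real implementation would also close the reader.
                sources.add(Files.newBufferedReader(p).lines().iterator());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        PriorityQueue<Run> heap = new PriorityQueue<>();
        for (Iterator<String> s : sources) {
            if (s.hasNext()) heap.add(new Run(s));
        }
        return new Iterator<String>() {
            @Override public boolean hasNext() { return !heap.isEmpty(); }
            @Override public String next() {
                Run r = heap.poll();
                if (r == null) throw new NoSuchElementException();
                String row = r.head;
                if (r.rest.hasNext()) { r.head = r.rest.next(); heap.add(r); }
                return row;
            }
        };
    }
}
```

A caller could pass a factory such as () -> Files.createTempFile("spill", ".txt") 
to write every run into one folder; an implementation that rotates across 
several folders would plug in the same way.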



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
