Ravindra Pesala created CARBONDATA-1224:
-------------------------------------------
Summary: Going out of memory if more segments are compacted at
once in V3 format
Key: CARBONDATA-1224
URL: https://issues.apache.org/jira/browse/CARBONDATA-1224
Project: CarbonData
Issue Type: Bug
Reporter: Ravindra Pesala
In the V3 format we read the whole blocklet into memory at once in order to
save IO time. But this turns out to be costly when many carbondata files are
read in parallel.
For example, if we need to compact 50 segments, the compactor needs to open
readers on all 50 segments to do the merge sort. Memory consumption is too high
if each reader holds a whole blocklet in memory, so there is a high chance of
going out of memory.
Solution:
In this type of scenario we can introduce new readers for V3 that read the data
page by page instead of reading the whole blocklet at once, to reduce the memory
footprint.
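
A minimal sketch of the idea, not the actual CarbonData reader: each reader
keeps only one page in memory and loads the next one on demand while the
compactor does a k-way merge. The names PageWiseMergeSketch, PageReader,
advance() and merge() are hypothetical, and pages are modelled as in-memory
int arrays; a real reader would fetch each page from the carbondata file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

class PageWiseMergeSketch {

    /** Hypothetical reader over one sorted segment; holds one page at a time. */
    static final class PageReader {
        private final Iterator<int[]> pages; // stands in for pages stored on disk
        private int[] current = new int[0];  // the only page kept in memory
        private int pos = 0;

        PageReader(List<int[]> sortedPages) {
            this.pages = sortedPages.iterator();
        }

        /** Ensures a value is available, loading the next page if needed. */
        boolean advance() {
            while (pos >= current.length) {
                if (!pages.hasNext()) {
                    return false;            // segment exhausted
                }
                current = pages.next();      // previous page can now be collected
                pos = 0;
            }
            return true;
        }

        int peek() { return current[pos]; }

        int next() { return current[pos++]; }
    }

    /** K-way merge over all readers; memory is bounded by one page per reader. */
    static List<Integer> merge(List<PageReader> readers) {
        PriorityQueue<PageReader> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a.peek(), b.peek()));
        for (PageReader r : readers) {
            if (r.advance()) {
                heap.add(r);
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            PageReader r = heap.poll();      // reader with the smallest head value
            out.add(r.next());
            if (r.advance()) {
                heap.add(r);                 // re-insert with its new head value
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<PageReader> readers = Arrays.asList(
            new PageReader(Arrays.asList(new int[]{1, 4}, new int[]{7})),
            new PageReader(Arrays.asList(new int[]{2, 3}, new int[]{9})));
        System.out.println(merge(readers)); // prints [1, 2, 3, 4, 7, 9]
    }
}
```

With this shape, peak memory grows with (number of readers x page size)
rather than (number of readers x blocklet size), at the cost of more, smaller
reads per reader.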