Manish Gupta created CARBONDATA-2381:
----------------------------------------
Summary: Improve compaction performance by filling batch result in
columnar format and performing IO at blocklet level
Key: CARBONDATA-2381
URL: https://issues.apache.org/jira/browse/CARBONDATA-2381
Project: CarbonData
Issue Type: Improvement
Affects Versions: 1.3.1
Reporter: Manish Gupta
Assignee: Manish Gupta
Problem: Compaction is slow compared to data load. If the compaction
threshold is set to 6,6, then the minor compaction triggered after 6 loads
takes almost 6-7 times as long as the total load time for those 6 loads.
Analysis:
# During compaction, result filling is done in row format. As a result, the
dimension and measure data filling time grows as the number of columns
increases. This happens because row-wise filling cannot take advantage of OS
cacheable buffers: for every row we keep switching to read data for the next
column.
# Compaction uses a page-level reader flow in which both IO and decompression
are done at page level, so the IO and decompression time increases in this
model.
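
The two points above can be illustrated with a minimal sketch. It is not
CarbonData code: the class, the method names, and the in-memory "file" are all
hypothetical and exist only to show the access patterns. The first pair of
methods contrasts row-wise filling (every row touches every column) with
columnar filling (each column is copied in one contiguous pass). The second
pair assumes that blocklet-level IO means one read request covering all pages
of a column chunk, and compares that with one read request per page.

```java
import java.util.Arrays;

// Illustrative sketch only: every name here is hypothetical, not a CarbonData class.
class CompactionFillSketch {

    // Row-format fill: each output row pulls one value from every column,
    // so consecutive reads hop from one column array to the next.
    static long[][] fillRowWise(long[][] columns, int rowCount) {
        long[][] rows = new long[rowCount][columns.length];
        for (int r = 0; r < rowCount; r++) {
            for (int c = 0; c < columns.length; c++) {
                rows[r][c] = columns[c][r];
            }
        }
        return rows;
    }

    // Columnar fill: each column is copied in a single contiguous pass.
    static long[][] fillColumnWise(long[][] columns, int rowCount) {
        long[][] batch = new long[columns.length][];
        for (int c = 0; c < columns.length; c++) {
            batch[c] = Arrays.copyOf(columns[c], rowCount);
        }
        return batch;
    }

    // In-memory stand-in for a data file that counts read requests.
    static final class CountingSource {
        private final byte[] data;
        int readCalls = 0;

        CountingSource(byte[] data) {
            this.data = data;
        }

        byte[] read(int offset, int length) {
            readCalls++;
            return Arrays.copyOfRange(data, offset, offset + length);
        }
    }

    // Page-level flow: one read request per page.
    static byte[][] readPageByPage(CountingSource src, int pageSize, int pageCount) {
        byte[][] pages = new byte[pageCount][];
        for (int p = 0; p < pageCount; p++) {
            pages[p] = src.read(p * pageSize, pageSize);
        }
        return pages;
    }

    // Blocklet-level flow (assumed): one read request for all pages,
    // which are then sliced apart in memory.
    static byte[][] readWholeBlocklet(CountingSource src, int pageSize, int pageCount) {
        byte[] all = src.read(0, pageSize * pageCount);
        byte[][] pages = new byte[pageCount][];
        for (int p = 0; p < pageCount; p++) {
            pages[p] = Arrays.copyOfRange(all, p * pageSize, (p + 1) * pageSize);
        }
        return pages;
    }

    public static void main(String[] args) {
        long[][] columns = { {1, 2, 3}, {10, 20, 30} };
        System.out.println(Arrays.toString(fillRowWise(columns, 3)[1]));
        System.out.println(Arrays.toString(fillColumnWise(columns, 3)[1]));

        byte[] file = new byte[4 * 8]; // 4 pages of 8 bytes each
        CountingSource pageLevel = new CountingSource(file);
        CountingSource blockletLevel = new CountingSource(file);
        readPageByPage(pageLevel, 8, 4);
        readWholeBlocklet(blockletLevel, 8, 4);
        System.out.println(pageLevel.readCalls + " reads vs " + blockletLevel.readCalls);
    }
}
```

Both fill methods return the same values in different layouts, and both read
methods return the same pages. Only the number of column switches and read
requests differs, which is the cost the analysis attributes to the current
compaction flow.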
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)