Manish Gupta created CARBONDATA-2381:
----------------------------------------

             Summary: Improve compaction performance by filling batch result in 
columnar format and performing IO at blocklet level
                 Key: CARBONDATA-2381
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2381
             Project: CarbonData
          Issue Type: Improvement
    Affects Versions: 1.3.1
            Reporter: Manish Gupta
            Assignee: Manish Gupta


Problem: Compaction is slow compared to data load. If the compaction 
threshold is set to 6,6, then the minor compaction triggered after 6 loads 
takes almost 6-7 times as long as the total load time of those 6 loads.

Analysis:
 # During compaction, result filling is done in row format. Because of this, 
the dimension and measure data filling time increases as the number of columns 
increases. This happens because row-wise filling cannot take advantage of OS 
cacheable buffers: for every row we continuously move on to read data for the 
next column.
 # Compaction uses a page-level reader flow in which both IO and decompression 
are done at page level, so the IO and decompression time increases in this 
model.
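
The two points above can be illustrated with a minimal Java sketch. It is not CarbonData code: the class BatchFillSketch, the CountingSource helper and all method names are hypothetical, and decoded pages are assumed to be plain int arrays. It contrasts row-wise filling with column-wise filling, and counts read calls when pages are fetched one by one versus once per blocklet.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in CarbonData.
public class BatchFillSketch {

    // Row-format fill: for every row, touch each column in turn,
    // so consecutive accesses jump between column arrays.
    static int[][] fillRowWise(int[][] columnPages, int rowCount) {
        int[][] rows = new int[rowCount][columnPages.length];
        for (int r = 0; r < rowCount; r++) {
            for (int c = 0; c < columnPages.length; c++) {
                rows[r][c] = columnPages[c][r];
            }
        }
        return rows;
    }

    // Columnar fill: copy one whole column at a time, so each column
    // is read sequentially before moving on to the next one.
    static int[][] fillColumnWise(int[][] columnPages, int rowCount) {
        int[][] batch = new int[columnPages.length][];
        for (int c = 0; c < columnPages.length; c++) {
            batch[c] = Arrays.copyOf(columnPages[c], rowCount);
        }
        return batch;
    }

    // Stand-in for a file: counts how many read calls are issued.
    static class CountingSource {
        final byte[] data;
        int reads = 0;

        CountingSource(byte[] data) {
            this.data = data;
        }

        byte[] read(int offset, int length) {
            reads++;
            return Arrays.copyOfRange(data, offset, offset + length);
        }
    }

    // Page-level IO: one read call per page.
    static byte[][] readPageByPage(CountingSource src, int pageSize, int pageCount) {
        byte[][] pages = new byte[pageCount][];
        for (int p = 0; p < pageCount; p++) {
            pages[p] = src.read(p * pageSize, pageSize);
        }
        return pages;
    }

    // Blocklet-level IO: one read call for the whole blocklet,
    // then the pages are sliced out of the in-memory buffer.
    static byte[][] readBlockletOnce(CountingSource src, int pageSize, int pageCount) {
        byte[] blocklet = src.read(0, pageSize * pageCount);
        byte[][] pages = new byte[pageCount][];
        for (int p = 0; p < pageCount; p++) {
            pages[p] = Arrays.copyOfRange(blocklet, p * pageSize, (p + 1) * pageSize);
        }
        return pages;
    }

    public static void main(String[] args) {
        int[][] columns = { {1, 2, 3}, {10, 20, 30} };
        System.out.println(Arrays.toString(fillRowWise(columns, 3)[1]));    // [2, 20]
        System.out.println(Arrays.toString(fillColumnWise(columns, 3)[1])); // [10, 20, 30]

        CountingSource perPage = new CountingSource(new byte[12]);
        readPageByPage(perPage, 4, 3);
        CountingSource perBlocklet = new CountingSource(new byte[12]);
        readBlockletOnce(perBlocklet, 4, 3);
        System.out.println(perPage.reads + " vs " + perBlocklet.reads);     // 3 vs 1
    }
}
```

Both fill methods produce the same values; only the access order differs. Likewise both read methods return the same pages, but the blocklet-level variant issues a single read call.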



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)