Manish Gupta created CARBONDATA-2092:
----------------------------------------

             Summary: Fix compaction bug to prevent the compaction flow from 
going through the restructure compaction flow
                 Key: CARBONDATA-2092
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2092
             Project: CarbonData
          Issue Type: Bug
            Reporter: Manish Gupta
            Assignee: Manish Gupta


Problem and analysis:

----------------------------------------

During data load, the current schema timestamp is written to the carbondata 
file header. During compaction, this timestamp is used to decide whether a 
block is a restructured block or conforms to the latest schema.

Since the blocklet information is now stored in the index file, the carbondata 
file header is no longer read while loading it into memory, so the schema 
timestamp is never set on the blocklet information. As a result, during 
compaction the current schema timestamp does not match the timestamp stored in 
the block, and the flow goes through the restructure compaction flow instead 
of the normal compaction flow.
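
The decision described above can be illustrated with a minimal sketch. This is 
not CarbonData's actual code: the class, enum, and method names (BlockInfo, 
Flow, chooseFlow) are hypothetical, and using 0 to model a timestamp that was 
never set is an assumption made only for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class CompactionFlowSketch {

    // Hypothetical stand-in for the per-block metadata held in memory.
    static final class BlockInfo {
        final String name;
        // Assumption: 0 models "schema timestamp never set", which is what
        // happens when the metadata comes from the index file and the
        // carbondata file header is not read.
        final long schemaTimestamp;

        BlockInfo(String name, long schemaTimestamp) {
            this.name = name;
            this.schemaTimestamp = schemaTimestamp;
        }
    }

    enum Flow { NORMAL, RESTRUCTURE }

    // A block whose timestamp differs from the current schema timestamp is
    // treated as restructured, which forces the costlier flow.
    static Flow chooseFlow(long currentSchemaTimestamp, List<BlockInfo> blocks) {
        for (BlockInfo block : blocks) {
            if (block.schemaTimestamp != currentSchemaTimestamp) {
                return Flow.RESTRUCTURE;
            }
        }
        return Flow.NORMAL;
    }

    public static void main(String[] args) {
        long currentSchemaTimestamp = 1516000000000L;

        // Timestamp missing on one block: mismatch, restructure flow chosen.
        List<BlockInfo> unsetTimestamp = Arrays.asList(
                new BlockInfo("part-0", 0L),
                new BlockInfo("part-1", currentSchemaTimestamp));
        System.out.println(chooseFlow(currentSchemaTimestamp, unsetTimestamp));

        // Timestamp populated on every block: normal flow chosen.
        List<BlockInfo> populated = Arrays.asList(
                new BlockInfo("part-0", currentSchemaTimestamp),
                new BlockInfo("part-1", currentSchemaTimestamp));
        System.out.println(chooseFlow(currentSchemaTimestamp, populated));
    }
}
```

In this sketch, a single block with an unset timestamp is enough to push the 
whole compaction into the restructure flow, even though the data itself 
matches the latest schema.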

Impact:

-------------

Compaction performance degrades, because the restructure compaction flow 
sorts the data again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
