GitHub user manishgupta88 opened a pull request:

    https://github.com/apache/carbondata/pull/1875

    [CARBONDATA-2092] Fix compaction bug to prevent the compaction flow from 
going through the restructure compaction flow

    **Problem and analysis:**
    During data load, the current schema timestamp is written to the carbondata 
fileHeader. Compaction uses this timestamp to decide whether a block is a 
restructured block or conforms to the latest schema.
    Because the blocklet information is now stored in the index file, the 
carbondata file header is not read while loading it into memory, so the schema 
timestamp is never set in the blocklet information. As a result, during 
compaction the current schema timestamp does not match the timestamp stored in 
the block, and the flow goes through the restructure compaction flow instead 
of the normal compaction flow.
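
    The decision described above can be sketched roughly as follows. This is an 
illustration only, not the CarbonData code: the class `CompactionFlowSketch`, the 
method `chooseFlow`, and the `Flow` enum are hypothetical names, and the sketch 
assumes the check is a plain equality comparison and that an unset block 
timestamp defaults to 0.

```java
// Hypothetical sketch; none of these names are CarbonData APIs.
public class CompactionFlowSketch {

    enum Flow { NORMAL, RESTRUCTURE }

    // Chooses the compaction flow by comparing the schema timestamp recorded
    // for a block with the table's current schema timestamp.
    // Assumption: any mismatch is treated as a restructured block.
    static Flow chooseFlow(long blockSchemaTimestamp, long currentSchemaTimestamp) {
        return blockSchemaTimestamp == currentSchemaTimestamp
            ? Flow.NORMAL
            : Flow.RESTRUCTURE;
    }

    public static void main(String[] args) {
        long currentSchemaTimestamp = 1516722159000L; // arbitrary example value
        long unsetTimestamp = 0L; // assumed default when the file header is not read

        // Block timestamp populated correctly: normal compaction.
        System.out.println(chooseFlow(currentSchemaTimestamp, currentSchemaTimestamp));
        // Block timestamp never set: wrongly routed to restructure compaction.
        System.out.println(chooseFlow(unsetTimestamp, currentSchemaTimestamp));
    }
}
```

    Under these assumptions, a block whose timestamp was never populated is 
always treated as restructured, even though no schema change took place.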
    
    **Impact:**
    Compaction performance degrades because the restructure compaction flow 
sorts the data again.
    
    **Solution:**
    Modified the code so that compaction goes through the restructure 
compaction flow only when a restructure operation (add or drop column) has 
been performed.
    
     - [ ] Any interfaces changed?
     No
     - [ ] Any backward compatibility impacted?
     No
     - [ ] Document update required?
     No
     - [ ] Testing done
     Manual testing
     - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
     NA


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishgupta88/carbondata compaction_bug_fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1875.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1875
    
----
commit 18064e5b17649376169211b151375801c4dfca34
Author: manishgupta88 <tomanishgupta18@...>
Date:   2018-01-23T15:42:39Z

    Modified code to fix compaction bug to prevent the compaction flow from 
going through the restructure compaction flow until and unless and
    restructure add or drop column operation has not been performed

----

