GitHub user xuchuanyin reopened a pull request:

    https://github.com/apache/carbondata/pull/2628

    [CARBONDATA-2851][CARBONDATA-2852] Support zstd as column compressor in final store

    1. add zstd compressor for compressing column data
    2. add zstd support in thrift
    3. since zstd does not support zero-copy while compressing, offheap will
    not take effect for zstd
    4. The column compressor is configured through a system property and can
    be changed for each load. During querying, carbondata gets the compressor
    information from the metadata in the data file.
    5. This PR has also been considered and verified on the legacy store and
    compaction
    
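    The behaviour in item 4 can be illustrated with a minimal Python sketch.
    Everything in it is hypothetical and is not the carbondata implementation:
    the `COLUMN_COMPRESSOR` variable stands in for the system property, the
    one-line JSON header stands in for the thrift file metadata, and `zlib` and
    `bz2` stand in for snappy and zstd so that the example runs on the standard
    library alone.

```python
import bz2
import json
import os
import zlib

# Hypothetical stand-ins: zlib plays the role of snappy, bz2 the role of zstd.
COMPRESSORS = {
    "snappy": (zlib.compress, zlib.decompress),
    "zstd": (bz2.compress, bz2.decompress),
}


def write_column_page(raw: bytes) -> bytes:
    """Compress a page with the compressor configured for the current load."""
    # Assumed setting name; it is read at load time, so each load may differ.
    name = os.environ.get("COLUMN_COMPRESSOR", "snappy")
    compress, _ = COMPRESSORS[name]
    # Record the compressor name next to the data (stand-in for file metadata).
    header = json.dumps({"compressor": name}).encode() + b"\n"
    return header + compress(raw)


def read_column_page(stored: bytes) -> bytes:
    """Decompress with the name recorded in the file, not the current setting."""
    header, payload = stored.split(b"\n", 1)
    name = json.loads(header)["compressor"]
    _, decompress = COMPRESSORS[name]
    return decompress(payload)


os.environ["COLUMN_COMPRESSOR"] = "zstd"
page = write_column_page(b"carbondata" * 100)
os.environ["COLUMN_COMPRESSOR"] = "snappy"  # the setting changes for a later load
assert read_column_page(page) == b"carbondata" * 100
```

    The point of the sketch is that the reader never consults the current
    setting, so data written by earlier loads stays readable after the
    compressor is changed.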
    A simple test with 1.2GB of raw CSV data shows the size (in MB) of the
    final store with different compressors:
    
    | local dictionary | snappy | zstd | Size Reduced |
    | --- | --- | --- | -- |
    | local dict enabled | 335 | 207 | 38.2% |
    | local dict disabled | 375 | 225 | 40% |
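    
    The "Size Reduced" column can be reproduced from the two size columns. The
    short Python check below assumes the column is the relative reduction of
    the zstd store compared with the snappy store; the function name is
    illustrative only.

```python
# Store sizes in MB, copied from the table above.
sizes_mb = {
    "local dict enabled": {"snappy": 335, "zstd": 207},
    "local dict disabled": {"snappy": 375, "zstd": 225},
}


def size_reduced_percent(snappy_mb: float, zstd_mb: float) -> float:
    """Relative reduction of the zstd store compared with the snappy store."""
    return (snappy_mb - zstd_mb) / snappy_mb * 100


for name, sizes in sizes_mb.items():
    reduced = size_reduced_percent(sizes["snappy"], sizes["zstd"])
    print(f"{name}: {reduced:.1f}%")
# prints:
# local dict enabled: 38.2%
# local dict disabled: 40.0%
```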
    
    Be sure to complete the following checklist to help us incorporate
    your contribution quickly and easily:
    
     - [x] Any interfaces changed?
     `Yes, only internally used interfaces are changed`
     - [x] Any backward compatibility impacted?
     `Yes, backward compatibility is handled`
     - [x] Document update required?
     `Yes`
     - [x] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            `Added tests`
            - How it is tested? Please attach test report.
            `Tested on a local machine`
            - Is it a performance related change? Please attach the performance test report.
            `The size of the final store decreased by about 40% compared with the default snappy`
            - Any additional information to help reviewers in testing this change.
            `NA`
    
     - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
     `NA`


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata 0810_support_zstd_compressor_final_store

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2628.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2628
    
----
commit e840c5bcfe7c27a5d2eb6459d7e391dac0a2091f
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-08-10T14:02:57Z

    Support zstd as column compressor in final store
    
    1. add zstd compressor for compressing column data
    2. add zstd support in thrift
    3. legacy store is not considered in this commit
    4. since zstd does not support zero-copy while compressing, offheap will
    not take effect for zstd
    5. support lazy load for compressor

commit 926d64a245e76448d71d0557941d7e29559571c7
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-08-13T13:45:42Z

    Support new compressor on legacy store
    
    In the query procedure, we need to decompress the column page. Previously we
    got the compressor from a system property. Now that we support new
    compressors, we should read the compressor information from the metadata
    in the data files.
    This PR also solves the compatibility-related problems on the V1/V2 store,
    where we only support snappy.
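    
    The legacy rule described in this commit message could look roughly like
    the Python sketch below. It is an illustration under stated assumptions,
    not the code in this PR: `resolve_compressor_name`, the integer store
    version and the optional metadata field are all hypothetical, and falling
    back to snappy when a newer file carries no compressor name is an assumption.

```python
from typing import Optional

# V1/V2 stores only support snappy, according to the commit message above.
LEGACY_COMPRESSOR = "snappy"
LEGACY_STORE_VERSIONS = {1, 2}


def resolve_compressor_name(store_version: int,
                            metadata_name: Optional[str]) -> str:
    """Pick the compressor used to decompress a column page during a query."""
    if store_version in LEGACY_STORE_VERSIONS:
        # Legacy files carry no usable compressor information.
        return LEGACY_COMPRESSOR
    if not metadata_name:
        # Assumption: a newer file without a recorded name was written with snappy.
        return LEGACY_COMPRESSOR
    # Newer files: trust the name recorded in the data file metadata.
    return metadata_name


print(resolve_compressor_name(2, None))    # snappy
print(resolve_compressor_name(3, "zstd"))  # zstd
```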

commit df0ca034b74cde02ec3568b8b5b94930a44c5763
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-08-14T08:38:00Z

    fix comments

----

