GitHub user shardul-cr7 opened a pull request:
https://github.com/apache/carbondata/pull/2847
[WIP]Support Gzip as column compressor
Gzip-compressed files are smaller than Snappy-compressed files, but loading takes longer.
Test data was generated by tpch-dbgen (lineitem table).
**Load Performance Comparisons (Compression)**
*Test Case 1*
*File Size 3.9G*
*Records ~30M*
| Codec Used | Load Time | File Size After Load |
| ------ | ------ | ------ |
| Snappy | 156s | 101M |
| Zstd | 153s | 2.2M |
| Gzip | 163s | 12.1M |
*Test Case 2*
*File Size 7.8G*
*Records ~60M*
| Codec Used | Load Time | File Size After Load |
| ------ | ------ | ------ |
| Snappy | 336s | 203.6M |
| Zstd | 352s | 4.3M |
| Gzip | 354s | 12.1M |
**Query Performance (Decompression)**
*Test Case 1*
| Codec Used | Full Scan Time |
| ------ | ------ |
| Snappy | 16.108s |
| Zstd | 14.595s |
| Gzip | 14.313s |
*Test Case 2*
| Codec Used | Full Scan Time |
| ------ | ------ |
| Snappy | 23.559s |
| Zstd | 23.913s |
| Gzip | 26.741s |
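To make the trade-off in the tables above easier to read, the figures can be normalised against Snappy. The snippet below is only a sketch for reading the numbers; it is not part of the patch. The names `RESULTS`, `INPUT_MB`, `relative_to_snappy` and `compression_ratio` are made up for illustration, and it assumes that "G" and "M" are binary units (1G = 1024M).

```python
# Figures copied from the tables above; sizes in MB, times in seconds.
# Assumption: "G" and "M" are binary units, so 1G = 1024M.
INPUT_MB = {1: 3.9 * 1024, 2: 7.8 * 1024}

RESULTS = {
    1: {
        "Snappy": {"load_s": 156, "size_mb": 101.0, "scan_s": 16.108},
        "Zstd": {"load_s": 153, "size_mb": 2.2, "scan_s": 14.595},
        "Gzip": {"load_s": 163, "size_mb": 12.1, "scan_s": 14.313},
    },
    2: {
        "Snappy": {"load_s": 336, "size_mb": 203.6, "scan_s": 23.559},
        "Zstd": {"load_s": 352, "size_mb": 4.3, "scan_s": 23.913},
        "Gzip": {"load_s": 354, "size_mb": 12.1, "scan_s": 26.741},
    },
}


def relative_to_snappy(test_case, codec):
    """Return (size ratio, load-time change %, scan-time change %) versus Snappy."""
    base = RESULTS[test_case]["Snappy"]
    row = RESULTS[test_case][codec]
    size_ratio = row["size_mb"] / base["size_mb"]
    load_pct = 100.0 * (row["load_s"] - base["load_s"]) / base["load_s"]
    scan_pct = 100.0 * (row["scan_s"] - base["scan_s"]) / base["scan_s"]
    return size_ratio, load_pct, scan_pct


def compression_ratio(test_case, codec):
    """Return input size divided by size after load (higher means smaller output)."""
    return INPUT_MB[test_case] / RESULTS[test_case][codec]["size_mb"]


if __name__ == "__main__":
    for case in sorted(RESULTS):
        for codec in RESULTS[case]:
            size_ratio, load_pct, scan_pct = relative_to_snappy(case, codec)
            print(
                f"case {case} {codec:<6} "
                f"ratio {compression_ratio(case, codec):8.1f}x  "
                f"size vs Snappy {size_ratio:5.2f}  "
                f"load {load_pct:+6.1f}%  scan {scan_pct:+6.1f}%"
            )
```

Read this way, Gzip output is a small fraction of the Snappy output in both test cases, while its load time is a few percent higher; the full-scan result goes in opposite directions in the two test cases.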
Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [x] Testing done
Added some test cases.
- [ ] For large changes, please consider breaking it into sub-tasks under
an umbrella JIRA.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/shardul-cr7/carbondata b010
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2847.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2847
----
commit 6ad88ccc5663353d16372d91878d7efb223b16d6
Author: shardul-cr7 <shardulsingh22@...>
Date: 2018-10-23T11:57:47Z
[WIP]Support Gzip
----