GitHub user jackylk opened a pull request:
https://github.com/apache/incubator-carbondata/pull/104
[CARBONDATA-188] Compress CSV file before loading
Currently, when loading data into a CarbonData file through the Spark DataFrame
API, the data is first saved as a CSV file and then loaded into CarbonData.
The CSV file can require a lot of disk space. With this PR, the data is saved
as a GZIP-compressed CSV file instead of a plain CSV text file, and then loaded
into CarbonData.
On my laptop, when loading 1 million records, the disk space required for the
CSV file is reduced by a factor of 4 to 5.
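
As a rough illustration of the idea only, and not the code in this PR (which works through the Spark DataFrame API), the sketch below writes the same rows once as plain CSV and once as GZIP-compressed CSV using Python's standard library, then compares the sizes on disk. The helper names `write_csv`, `read_csv` and `compare_sizes` and the sample rows are hypothetical; the actual saving depends heavily on how repetitive the data is.

```python
import csv
import gzip
import os
import tempfile


def write_csv(path, rows, compress=False):
    """Write rows as CSV (GZIP-compressed if requested); return the file size in bytes."""
    opener = gzip.open if compress else open
    # Text mode with newline="" is what the csv module expects for both openers.
    with opener(path, "wt", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return os.path.getsize(path)


def read_csv(path, compress=False):
    """Read a plain or GZIP-compressed CSV file back into a list of string rows."""
    opener = gzip.open if compress else open
    with opener(path, "rt", newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


def compare_sizes(rows):
    """Return (plain_size, compressed_size) in bytes for the given rows."""
    with tempfile.TemporaryDirectory() as tmp:
        plain = write_csv(os.path.join(tmp, "data.csv"), rows)
        packed = write_csv(os.path.join(tmp, "data.csv.gz"), rows, compress=True)
    return plain, packed


if __name__ == "__main__":
    # Hypothetical sample data; real tables will compress differently.
    sample = [[i, f"name_{i % 100}", i % 7] for i in range(100_000)]
    plain, packed = compare_sizes(sample)
    print(f"plain: {plain} bytes, gzip: {packed} bytes")
```

The loader then only needs to read the compressed file back as CSV text, as `read_csv(..., compress=True)` does here, so the temporary file on disk stays small for the whole load.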
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jackylk/incubator-carbondata compress
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-carbondata/pull/104.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #104
----
commit ddeaecb9dad1b51be85302d0ff7ee9c31c1b13d7
Author: jackylk <[email protected]>
Date: 2016-08-29T08:41:38Z
compress CSV file using GZIP while loading
commit 1bfc8c3bcb9a3809580386c16b5fe94b2c6b6943
Author: jackylk <[email protected]>
Date: 2016-08-29T09:05:17Z
fix checkstyle
----