GitHub user jackylk opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/104

    [CARBONDATA-188] Compress CSV file before loading

    Currently, when loading data into CarbonData through the Spark DataFrame 
API, the data is first saved as a CSV file and then loaded into a CarbonData file. 
    
    The intermediate CSV file can require a lot of disk space. With this PR, 
instead of saving a plain-text CSV file, the data is saved as a GZIP-compressed 
CSV file and then loaded into CarbonData. 
    
    On my laptop, when loading 1 million records, the disk space required for 
the CSV file is reduced by a factor of 4 to 5.
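
    As a rough illustration of the idea (not the code in this PR, which works 
through Spark), the sketch below writes the same rows once as plain CSV and once 
as GZIP-compressed CSV using only the Python standard library, then compares the 
file sizes. The helper names and the sample rows are made up for the example, and 
the ratio you see depends entirely on how repetitive the data is.

```python
import csv
import gzip
import os
import tempfile


def write_csv(path, rows, compress=False):
    """Write rows to path as CSV, optionally GZIP-compressed."""
    opener = gzip.open if compress else open
    with opener(path, "wt", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


def read_csv(path, compress=False):
    """Read all rows back from a plain or GZIP-compressed CSV file."""
    opener = gzip.open if compress else open
    with opener(path, "rt", newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


def compare_sizes(rows):
    """Return (plain_bytes, gzip_bytes) for the same rows written both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "data.csv")
        packed = os.path.join(tmp, "data.csv.gz")
        write_csv(plain, rows)
        write_csv(packed, rows, compress=True)
        # The compressed file must decode to exactly the same records.
        assert read_csv(packed, compress=True) == read_csv(plain)
        return os.path.getsize(plain), os.path.getsize(packed)


# Hypothetical sample data: an id column plus two low-cardinality columns.
rows = [[str(i), "name_%d" % (i % 100), "city_%d" % (i % 10)]
        for i in range(10000)]
plain_size, gz_size = compare_sizes(rows)
print("plain: %d bytes, gzip: %d bytes" % (plain_size, gz_size))
```

    The trade-off is extra CPU time for compressing and decompressing in 
exchange for less temporary disk usage during the load.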

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jackylk/incubator-carbondata compress

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/104.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #104
    
----
commit ddeaecb9dad1b51be85302d0ff7ee9c31c1b13d7
Author: jackylk <[email protected]>
Date:   2016-08-29T08:41:38Z

    compress CSV file using GZIP while loading

commit 1bfc8c3bcb9a3809580386c16b5fe94b2c6b6943
Author: jackylk <[email protected]>
Date:   2016-08-29T09:05:17Z

    fix checkstyle

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
