GitHub user xuchuanyin opened a pull request:

    https://github.com/apache/carbondata/pull/2382

    [CARBONDATA-2513][32K] Support write long string from dataframe

    Support writing long strings from a DataFrame.
    
    Sample usage:
    ```
    longStringDF.write
      .format("carbondata")
      .option("tableName", longStringTable)
      .option("single_pass", "false")
      .option("sort_columns", "name")
      .option("long_string_columns", "description, note")
      .mode(SaveMode.Overwrite)
      .save()
    ```
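
The `long_string_columns` value in the sample is a comma-separated list with a space after the comma. As a minimal sketch of how such a value could be split and checked against a schema, the helper below trims each name and reports any listed column that the DataFrame does not contain. `LongStringOption`, `parseColumns`, and `missingColumns` are hypothetical names for illustration, not CarbonData APIs, and the case-insensitive comparison is an assumption.

```scala
// Hypothetical helper, not CarbonData's actual code.
object LongStringOption {
  // Split a comma-separated option value such as "description, note"
  // into trimmed, non-empty column names.
  def parseColumns(optionValue: String): Seq[String] =
    optionValue.split(",").map(_.trim).filter(_.nonEmpty).toList

  // Return the listed columns that are absent from the schema.
  // Names are compared case-insensitively (an assumption of this sketch).
  def missingColumns(optionValue: String, schemaColumns: Seq[String]): Seq[String] = {
    val known = schemaColumns.map(_.toLowerCase).toSet
    parseColumns(optionValue).filterNot(c => known.contains(c.toLowerCase))
  }
}
```

A caller might run `missingColumns` on `longStringDF.columns` before writing, so that a typo in the option is caught early.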
    
    Be sure to complete the following checklist to help us incorporate
    your contribution quickly and easily:
    
     - [x] Any interfaces changed?
     `NO`
     - [ ] Any backward compatibility impacted?
      `NO`
     - [ ] Document update required?
     `NO`
     - [ ] Testing done
            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            `Tests added`
            - How it is tested? Please attach test report.
            `Tested on a local machine`
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.

     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata 0620_long_string_dataframe

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2382.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2382
    
----
commit b689d66493521452ff9938415e0d0aa66b56c2c5
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-06-02T07:17:04Z

    Support string longer than 32000 characters
    
    Add a table property 'long_string_columns' to the create table DDL that
    indicates which columns may contain more than 32000 characters.
    
    Internally, CarbonData will:
    1. add a new datatype called `text` to represent the long string column
    2. add a new encoding called `DIRECT_COMPRESS_TEXT` to the text column
    page meta
    3. use an integer (previously a short) to store the byte length of the
    content.
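
To illustrate point 3, here is a minimal sketch of a length-prefixed encoding. A signed 2-byte length can describe at most 32767 bytes, which is in line with the roughly 32000-character limit mentioned above; a 4-byte length removes that cap. The `LengthPrefix` object and its layout are assumptions for illustration only and do not reflect CarbonData's actual on-disk format.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// Illustrative only: not CarbonData's real page encoding.
object LengthPrefix {
  // Largest payload size a signed 2-byte length prefix can describe.
  val MaxShortLength: Int = Short.MaxValue.toInt

  // True when the UTF-8 form of the value fits a signed 2-byte length.
  def fitsShortLength(value: String): Boolean =
    value.getBytes(StandardCharsets.UTF_8).length <= MaxShortLength

  // Encode a string as [4-byte length][UTF-8 bytes].
  def encodeWithIntLength(value: String): Array[Byte] = {
    val bytes = value.getBytes(StandardCharsets.UTF_8)
    ByteBuffer.allocate(4 + bytes.length).putInt(bytes.length).put(bytes).array()
  }

  // Read the 4-byte length back, then exactly that many payload bytes.
  def decodeWithIntLength(encoded: Array[Byte]): String = {
    val buffer = ByteBuffer.wrap(encoded)
    val length = buffer.getInt()
    val bytes = new Array[Byte](length)
    buffer.get(bytes)
    new String(bytes, StandardCharsets.UTF_8)
  }
}
```

A 40000-character ASCII string fails `fitsShortLength` but round-trips through `encodeWithIntLength` and `decodeWithIntLength` unchanged.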

commit f145c6c60238c400b5db6a6bf2696246b698154a
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-06-05T12:46:26Z

    rename datatype name from text to varchar

commit 4180f8118d1ff90205b0f1567bef2cdfee3a1b62
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-06-12T12:35:58Z

    Add 2GB constraint for one column page
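
The commit message does not say how the constraint is enforced. As a rough sketch, assuming the limit corresponds to the maximum size of a single JVM byte array, a check could sum the value sizes in a `Long` and compare the total against `Int.MaxValue`. `ColumnPageBudget` and `fitsInOnePage` are hypothetical names, not part of CarbonData.

```scala
// Hypothetical check, not CarbonData's actual implementation.
object ColumnPageBudget {
  // A JVM array is indexed by Int, so a single byte array holds
  // at most Int.MaxValue bytes, just under 2 GB.
  val MaxPageBytes: Long = Int.MaxValue.toLong

  // Accumulate in a Long so the running total cannot overflow an Int,
  // then compare the total against the assumed page limit.
  def fitsInOnePage(valueSizesInBytes: Seq[Int]): Boolean =
    valueSizesInBytes.foldLeft(0L)(_ + _) <= MaxPageBytes
}
```

Summing in an `Int` instead would wrap around silently for large pages, which is why the sketch widens to `Long` before comparing.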

commit 710845b155ed5b7a611a900c70b0d766d80ae48d
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-06-14T12:11:40Z

    update tests

commit 74106d2793ed97615a439576b1c16d34bfaa3ab7
Author: xuchuanyin <xuchuanyin@...>
Date:   2018-06-19T07:49:57Z

    support write long string from dataframe

----

