GitHub user xuchuanyin opened a pull request:
https://github.com/apache/carbondata/pull/2382
[CARBONDATA-2513][32K] Support write long string from dataframe
Support writing long string columns from a DataFrame.
Sample usage:
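```
import org.apache.spark.sql.SaveMode

// longStringDF is an existing DataFrame; longStringTable holds the target table name
longStringDF.write
  .format("carbondata")
  .option("tableName", longStringTable)
  .option("single_pass", "false")
  .option("sort_columns", "name")
  // comma-separated columns that may hold strings longer than 32000 characters
  .option("long_string_columns", "description, note")
  .mode(SaveMode.Overwrite)
  .save()
```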
Be sure to complete the following checklist to help us incorporate your
contribution quickly and easily:
- [x] Any interfaces changed?
`NO`
- [ ] Any backward compatibility impacted?
`NO`
- [ ] Document update required?
`NO`
- [ ] Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests are required?
`Tests added`
- How it is tested? Please attach test report.
`Tested in local machine`
- Is it a performance related change? Please attach the performance test report.
- Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/xuchuanyin/carbondata 0620_long_string_dataframe
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2382.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2382
----
commit b689d66493521452ff9938415e0d0aa66b56c2c5
Author: xuchuanyin <xuchuanyin@...>
Date: 2018-06-02T07:17:04Z
Support strings longer than 32000 characters
Add a table property 'long_string_columns' to the CREATE TABLE DDL to
indicate which columns will contain more than 32000 characters.
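The property value is a comma-separated list of column names; the usage sample passes `"description, note"`, with a space after the comma. The PR does not show how CarbonData parses or validates this value, so the following is only a hypothetical sketch, assuming names are trimmed, matched case-insensitively, must exist in the schema, and must be string columns. `LongStringColumnsProperty`, `parse`, and the `Map` schema model are made-up names for illustration, not CarbonData APIs.

```scala
// Hypothetical helper, not a CarbonData API: normalizes a 'long_string_columns'
// value and validates it against a simple column-name -> type-name schema.
object LongStringColumnsProperty {
  def parse(value: String, schema: Map[String, String]): Seq[String] = {
    // Column names are matched case-insensitively.
    val types = schema.map { case (name, tpe) => name.toLowerCase -> tpe }
    // "description, note" -> Seq("description", "note")
    val columns = value.split(",").map(_.trim.toLowerCase).filter(_.nonEmpty).toSeq
    // Reject repeated names such as "note, NOTE".
    val duplicates = columns.diff(columns.distinct).distinct
    require(duplicates.isEmpty, s"duplicate long string columns: ${duplicates.mkString(", ")}")
    columns.foreach { c =>
      types.get(c) match {
        case Some(t) if t.equalsIgnoreCase("string") => ()
        case Some(t) =>
          throw new IllegalArgumentException(s"column $c has type $t, expected string")
        case None =>
          throw new IllegalArgumentException(s"column $c does not exist")
      }
    }
    columns
  }
}
```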
Internally in CarbonData:
1. Add a new datatype called `text` to represent the long string column.
2. Add a new encoding called `DIRECT_COMPRESS_TEXT` to the text column page meta.
3. Use an integer (previously a short) to store the length of the content in bytes.
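Item 3 is what lifts the length limit: a signed 16-bit short tops out at 32767, which is consistent with the previous limit of roughly 32000 characters, while a 32-bit int allows lengths up to about 2GB. As a hedged illustration only, the sketch below contrasts a short and an int length prefix using `java.nio.ByteBuffer`. `LengthPrefixed` and its methods are hypothetical and do not describe CarbonData's actual page layout. Note that the limit applies to UTF-8 bytes here, so multi-byte characters reach it sooner.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// Hypothetical sketch of length-prefixed string encoding, not CarbonData code.
object LengthPrefixed {
  // A 2-byte Short prefix can only describe up to Short.MaxValue (32767) bytes.
  def encodeWithShort(s: String): Array[Byte] = {
    val bytes = s.getBytes(StandardCharsets.UTF_8)
    require(bytes.length <= Short.MaxValue, s"${bytes.length} bytes do not fit a short length")
    ByteBuffer.allocate(2 + bytes.length).putShort(bytes.length.toShort).put(bytes).array()
  }

  // A 4-byte Int prefix raises the limit to Int.MaxValue bytes.
  def encodeWithInt(s: String): Array[Byte] = {
    val bytes = s.getBytes(StandardCharsets.UTF_8)
    ByteBuffer.allocate(4 + bytes.length).putInt(bytes.length).put(bytes).array()
  }

  // Reads the Int prefix, then exactly that many bytes of UTF-8 content.
  def decodeWithInt(buf: Array[Byte]): String = {
    val bb = ByteBuffer.wrap(buf)
    val len = bb.getInt()
    val bytes = new Array[Byte](len)
    bb.get(bytes)
    new String(bytes, StandardCharsets.UTF_8)
  }
}
```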
commit f145c6c60238c400b5db6a6bf2696246b698154a
Author: xuchuanyin <xuchuanyin@...>
Date: 2018-06-05T12:46:26Z
Rename the datatype from text to varchar
commit 4180f8118d1ff90205b0f1567bef2cdfee3a1b62
Author: xuchuanyin <xuchuanyin@...>
Date: 2018-06-12T12:35:58Z
Add a 2GB size constraint for a single column page
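The commit message does not say how this constraint is enforced. One plausible reason for the 2GB figure is that JVM arrays and `ByteBuffer`s are indexed by `Int`, so a single page buffer cannot exceed `Int.MaxValue` bytes (about 2GB). A minimal sketch, assuming each value is stored with a 4-byte length prefix and that the running size is kept as a `Long` so the check itself cannot overflow; `ColumnPageSizeGuard` is a hypothetical name, not CarbonData code.

```scala
// Hypothetical guard, not CarbonData code: accumulates the byte size of one
// column page as a Long and rejects values that would push it past `limit`.
final class ColumnPageSizeGuard(limit: Long = Int.MaxValue.toLong) {
  // Assumed per-value overhead: a 4-byte Int length prefix.
  private val LengthPrefixBytes = 4L
  private var total: Long = 0L

  // Summing in Long avoids the Int overflow that the limit is meant to prevent.
  def canAdd(valueBytes: Int): Boolean = total + valueBytes + LengthPrefixBytes <= limit

  def add(valueBytes: Int): Unit = {
    if (!canAdd(valueBytes))
      throw new IllegalStateException(s"column page would exceed $limit bytes")
    total += valueBytes + LengthPrefixBytes
  }

  def size: Long = total
}
```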
commit 710845b155ed5b7a611a900c70b0d766d80ae48d
Author: xuchuanyin <xuchuanyin@...>
Date: 2018-06-14T12:11:40Z
update tests
commit 74106d2793ed97615a439576b1c16d34bfaa3ab7
Author: xuchuanyin <xuchuanyin@...>
Date: 2018-06-19T07:49:57Z
Support writing long strings from a DataFrame
----