Github user kumarvishal09 commented on the issue:
https://github.com/apache/carbondata/pull/2252
@xuchuanyin To support string columns with more than 32K characters, we need the changes below.
**Create**
1. Support a new `varchar` data type, as already mentioned by @ravipesala.
**Loading:**
1. Add a new encoder, and set it for all varchar columns in `DataChunk2` while
writing the data to the CarbonData file. Please check `DataChunk2` in
`carbondata.thrift`: we add an encoder for each column.
2. Use `DirectCompressCodec` for compressing the data; code is already present
in `ColumnPage.getLVFlattenedBytePage()`.
3. Add a stats collector to compute min/max for varchar columns; implement a
new class to handle this.
4. No need to add startkey and endkey for varchar columns.
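As a rough sketch of the LV (length-value) flattening in step 2, here is a hypothetical example (the class and method names are invented for illustration and are not actual CarbonData APIs). The assumption is that regular string columns use a 2-byte (short) length prefix, which caps a value at 32K, while varchar columns use a 4-byte (int) prefix:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: flatten varchar values into an int-LV byte page.
// A 4-byte length prefix per value removes the 32K limit imposed by the
// 2-byte prefix used for regular string columns.
public class IntLVFlattener {
  public static byte[] flatten(String[] values) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DataOutputStream dos = new DataOutputStream(out);
    for (String value : values) {
      byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
      dos.writeInt(bytes.length); // int (4-byte) length prefix
      dos.write(bytes);           // value bytes
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] page = flatten(new String[] {"ab", "cde"});
    System.out.println(page.length); // 4 + 2 + 4 + 3 = 13 bytes
  }
}
```

The resulting page can then be handed to the compression codec as a single byte array.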
**Reading**
1. Add a new `DimensionDataChunkStore` implementation to store int-LV
format data (already handled).
2. Based on the encoder present in `DataChunk2`, choose the
`DimensionDataChunkStore` implementation, e.g. for the dictionary encoder we
create a fixed-length chunk store object.
3. For varchar columns, just uncompress the data and keep it in LV format in
the store (no need to convert LV-formatted data to a 2D byte array).
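The read path in step 3 could look roughly like the following hypothetical sketch (names are invented for illustration, not the actual `DimensionDataChunkStore` API). The uncompressed LV page is kept as-is, and an offsets array is built once so each row can be sliced on demand:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch of a chunk store for int-LV data: keep the raw page
// and index into it via per-row offsets instead of splitting it into a
// 2D byte array.
public class IntLVChunkStore {
  private final byte[] data;   // uncompressed LV page, stored as-is
  private final int[] offsets; // start offset of each row's length prefix

  public IntLVChunkStore(byte[] lvData, int rowCount) {
    this.data = lvData;
    this.offsets = new int[rowCount];
    ByteBuffer buffer = ByteBuffer.wrap(lvData);
    int pos = 0;
    for (int i = 0; i < rowCount; i++) {
      offsets[i] = pos;
      int length = buffer.getInt(pos); // read 4-byte length prefix
      pos += 4 + length;               // skip prefix + value bytes
    }
  }

  public byte[] getRow(int rowId) {
    int start = offsets[rowId];
    int length = ByteBuffer.wrap(data).getInt(start);
    return Arrays.copyOfRange(data, start + 4, start + 4 + length);
  }
}
```

This keeps per-row access O(1) after the one-time offset scan, without copying the page into row-wise arrays up front.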
**Note:** The same changes are needed for complex data types.
Please take care of backward compatibility :-)
Let me know if anything needs clarification.
@ravipesala @jackylk please check if I missed anything.
---