Github user kumarvishal09 commented on the issue:
https://github.com/apache/carbondata/pull/2252
@xuchuanyin To support string columns with more than 32K characters, we need the changes below.
**Create**
1. Support a new `varchar` data type, as already mentioned by @ravipesala.
**Loading:**
1. Add a new encoder, and set it for all varchar columns in `DataChunk2` while
writing the data to the CarbonData file. Please check `DataChunk2` in
`carbondata.thrift`: we add an encoder for each column.
2. Use `DirectCompressCodec` for compressing the data; code is already present
in `ColumnPage.getLVFlattenedBytePage()`.
3. Add a stats collector to compute min/max for varchar columns; implement a
new class to handle this.
4. No need to add startkey and endkey for varchar columns.
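As a rough sketch of the LV (length-value) flattening in step 2, here is a hypothetical example (the class and method names are invented for illustration and are not actual CarbonData APIs). The assumption is that regular string columns use a 2-byte (short) length prefix, which caps a value at 32K, while varchar columns use a 4-byte (int) prefix:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: flatten varchar values into an int-LV byte page.
// A 4-byte length prefix per value removes the 32K limit imposed by the
// 2-byte prefix used for regular string columns.
public class IntLVFlattener {
  public static byte[] flatten(String[] values) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DataOutputStream dos = new DataOutputStream(out);
    for (String value : values) {
      byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
      dos.writeInt(bytes.length); // int (4-byte) length prefix
      dos.write(bytes);           // value bytes
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] page = flatten(new String[] {"ab", "cde"});
    System.out.println(page.length); // 4 + 2 + 4 + 3 = 13 bytes
  }
}
```

The resulting page can then be handed to the compression codec as a single byte array.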
**Reading**
1. Add a new `DimensionDataChunkStore` implementation to store int-LV
format data (already handled).
2. Based on the encoder present in `DataChunk2`, choose the
`DimensionDataChunkStore` implementation, e.g. for the dictionary encoder we
create a fixed-length chunk store object.
3. For varchar columns, just uncompress the data and keep it in LV format in
the store (no need to convert LV-formatted data to a 2D byte array).
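The read path in step 3 could look roughly like the following hypothetical sketch (names are invented for illustration, not the actual `DimensionDataChunkStore` API). The uncompressed LV page is kept as-is, and an offsets array is built once so each row can be sliced on demand:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch of a chunk store for int-LV data: keep the raw page
// and index into it via per-row offsets instead of splitting it into a
// 2D byte array.
public class IntLVChunkStore {
  private final byte[] data;   // uncompressed LV page, stored as-is
  private final int[] offsets; // start offset of each row's length prefix

  public IntLVChunkStore(byte[] lvData, int rowCount) {
    this.data = lvData;
    this.offsets = new int[rowCount];
    ByteBuffer buffer = ByteBuffer.wrap(lvData);
    int pos = 0;
    for (int i = 0; i < rowCount; i++) {
      offsets[i] = pos;
      int length = buffer.getInt(pos); // read 4-byte length prefix
      pos += 4 + length;               // skip prefix + value bytes
    }
  }

  public byte[] getRow(int rowId) {
    int start = offsets[rowId];
    int length = ByteBuffer.wrap(data).getInt(start);
    return Arrays.copyOfRange(data, start + 4, start + 4 + length);
  }
}
```

This keeps per-row access O(1) after the one-time offset scan, without copying the page into row-wise arrays up front.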
**Note:** The same changes are needed for complex data types.
Please take care of backward compatibility :-)
Let me know if anything needs clarification.
@ravipesala @jackylk please check if I missed anything.
---