+1 
Good feature to add to CarbonData.

Regards,
Jacky


> On June 4, 2018, at 11:10 PM, Kumar Vishal <kumarvishal1...@gmail.com> wrote:
> 
> Hi Community,
> 
> Currently CarbonData supports either a global dictionary or No-Dictionary
> (plain text stored in LV format) for storing dimension column data.
> 
> *Bottlenecks with Global Dictionary*
> 
>   1. As the dictionary file is mutable, a global dictionary cannot be
>      supported on storage environments that do not support append.
>   2. It is difficult for the user to decide whether a column should be a
>      dictionary column when the table has a large number of columns.
>   3. Global dictionary generation generally slows down the load process.
> 
> *Bottlenecks with No-Dictionary*
> 
>   1. Storage size is high.
>   2. Queries on No-Dictionary columns are slower, as more data is read and
>      processed.
>   3. Filtering is slower on No-Dictionary columns, as the number of
>      comparisons is high.
>   4. Memory footprint is high.
> 
> The above bottlenecks can be solved by *generating a local dictionary for
> low-cardinality columns at the blocklet level*, which brings the following
> benefits:
> 
>   1. Dictionary generation works on any storage environment, regardless of
>      which file operations (such as append) it supports.
>   2. It removes the extra read/write IO on the dictionary files that are
>      generated in the case of a global dictionary.
>   3. Users no longer have to identify the dictionary columns when a table
>      has a large number of columns.
>   4. It gives better compression on low-cardinality dimension columns.
>   5. Filter queries on No-Dictionary columns with a local dictionary will
>      be faster, as the filter is applied to encoded data.
>   6. It reduces store size and memory footprint, as only unique values are
>      stored in the local dictionary and the corresponding data is stored
>      in encoded form.
> 
> Please share your comments. Any suggestions from the community are most
> welcome. Please let me know if anything needs clarification.
> 
> -Regards
> Kumar Vishal
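
For readers new to the idea, the blocklet-level local dictionary described in the quoted proposal can be illustrated roughly as below. This is only a minimal sketch and not CarbonData's implementation: the function names `encode_blocklet` and `filter_equals`, the `threshold` parameter, and the fall-back-to-plain-values behaviour are all assumptions made for illustration. It shows that each blocklet keeps its own list of unique values and that an equality filter compares small integer codes instead of the original strings.

```python
from typing import List, Optional, Tuple


def encode_blocklet(values: List[str],
                    threshold: int) -> Optional[Tuple[List[str], List[int]]]:
    """Dictionary-encode one blocklet's column values.

    Returns (dictionary, codes), where dictionary[code] is the original value.
    Returns None when the blocklet has more than `threshold` distinct values;
    a caller would then store the values unencoded (hypothetical fallback).
    """
    code_of = {}             # value -> code, assigned in first-seen order
    codes: List[int] = []
    for value in values:
        code = code_of.get(value)
        if code is None:
            if len(code_of) >= threshold:
                return None  # cardinality too high for a local dictionary
            code = len(code_of)
            code_of[value] = code
        codes.append(code)
    # dicts preserve insertion order, so position in the list equals the code
    return list(code_of), codes


def filter_equals(dictionary: List[str], codes: List[int],
                  literal: str) -> List[int]:
    """Return row positions equal to `literal`, comparing codes only."""
    if literal not in dictionary:
        return []            # value absent from this blocklet: skip the scan
    target = dictionary.index(literal)
    return [row for row, code in enumerate(codes) if code == target]


dictionary, codes = encode_blocklet(["cn", "us", "cn", "in", "us"], threshold=10)
matching_rows = filter_equals(dictionary, codes, "us")
```

Because the dictionary is written together with the blocklet it describes, nothing has to be appended to a shared file afterwards, which is the property the proposal relies on for append-less storage.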


