Hi Dev,

Currently we support the LOCAL DICTIONARY feature during the data load operation. The feature is very helpful because it reduces the store size, which reduces IO and thereby improves query performance.

*This proposal is to extend the LOCAL DICTIONARY feature and provide separate DDL and offline support for it. This will make the feature more flexible to use. The reasons for proposing this are*:
1. DDL support can enable stores without local dictionary to add this feature for already loaded data. This can help customers leverage the LOCAL DICTIONARY feature for data that was written in carbondata format without local dictionary.
2. We know that when local dictionary is enabled, there is a small degradation in data load performance. So there can be applications/customers who want to fine-tune the loaded data during off-peak time. This feature can be helpful for those kinds of scenarios.
3. Offline support is proposed for SDK-like features, where we do not have the Spark driver-executor model and there may be only a single thread used for loading data. For this scenario we can provide offline support, thereby not impacting the existing data load performance.

Please let me know your suggestions for this proposal. If most of the community members feel the idea is good and that it will make the usage of this feature more flexible, I can come up with a design and discuss it further on this platform.

Regards,
Manish Gupta
