Hi xiaoqiao

This improvement looks great!
Can you please explain the below data, what does it mean?
----------
ConcurrentHashMap
~68MB 14543
Double Array Trie
~104MB 12825

Regards
Liang

2016-11-24 2:04 GMT+08:00 Xiaoqiao He <xq.he2...@gmail.com>:

>  Hi All,
>
> I would like to propose Dictionary improvement which using Trie in place of
> HashMap.
>
> In order to speedup aggregation, reduce run-time memory footprint, enable
> fast
> distinct count etc, CarbonData encodes data using dictionary at file level
> or table level based on cardinality. It is a general and efficient way in
> many big data systems, but when apply ConcurrentHashMap
> to maintain Dictionary in CarbonData currently, memory overhead of
> Driver is very huge since it has to load whole Dictionary to decode actual
> data value, especially column cardinality is a large number. and CarbonData
> will not do dictionary if cardinality > 1 million at default behavior.
>
> I propose using Trie in place of HashMap for the following three reasons:
> (1) Trie is a proper structure for Dictionary,
> (2) Reduce memory footprint,
> (3) Not impact retrieval performance
>
> The experimental results show that Trie is able to meet the requirement.
> a. ConcurrentHashMap vs Double Array Trie
> <https://linux.thai.net/~thep/datrie/datrie.html>(one implementation of
> Trie Structures)
> b. Dictionary size: 600K
> c. Memory footprint and query time
> - memory footprint (64-bit JVM) 500 million query time(ms)
> ConcurrentHashMap
> ~68MB 14543
> Double Array Trie
> ~104MB 12825
>
> Please share your suggestions about the proposed improvement of Dictionary.
>
> Regards
> He Xiaoqiao
>



-- 
Regards
Liang

Reply via email to