Hi xiaoqiao

For the example below (600K dictionary entries): is it correct to say that using "DAT" saves about 36 MB of memory compared with "ConcurrentHashMap", while retrieval performance degrades only slightly (by 1718 ms over the 500 million queries)?
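Just to make the arithmetic behind that question explicit, here is a minimal Java sketch. The figures are the ones reported below in this thread; the class and variable names are purely illustrative.

public class DictionaryDeltas {
    public static void main(String[] args) {
        // Figures reported in this thread (600K-entry dictionary, 64-bit JVM).
        double hashMapMb = 104, datMb = 68;      // approximate heap footprint
        long hashMapMs = 12825, datMs = 14543;   // total time for 500 million queries
        long queries = 500_000_000L;

        // DAT saves ~36 MB of heap.
        System.out.printf("memory saved: ~%.0f MB%n", hashMapMb - datMb);

        // DAT is 1718 ms slower in total, i.e. only a few nanoseconds per lookup.
        long slowerMs = datMs - hashMapMs;
        System.out.printf("extra time: %d ms total, ~%.2f ns per query%n",
                slowerMs, slowerMs * 1_000_000.0 / queries);
    }
}

So, based on the numbers in this thread, DAT trades roughly 36 MB of heap for about 3.4 ns of extra lookup cost per query.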
One more question: if the dictionary data size increases, what do the comparison results of "ConcurrentHashMap" vs "DAT" look like?

Regards
Liang

------------------------------------------------------------------------------------------------------
a. memory footprint (approximate quantity) in 64-bit JVM:
   ~104MB (*ConcurrentHashMap*) vs ~68MB (*DAT*)
b. retrieval performance: total time (ms) of 500 million queries:
   12825 ms (*ConcurrentHashMap*) vs 14543 ms (*DAT*)

hexiaoqiao wrote
> hi Liang,
>
> Thanks for your reply. I need to correct the experiment result, because
> the NO.1 column of the result data table was in the wrong order.
>
> In order to compare performance between Trie and HashMap, two different
> structures are constructed from the same dictionary data, whose size is
> 600K and whose items are each between 2 and 50 bytes long.
>
> ConcurrentHashMap (the structure currently used in CarbonData) vs Double
> Array Trie (one implementation of Trie structures)
>
> a. memory footprint (approximate quantity) in 64-bit JVM:
>    ~104MB (*ConcurrentHashMap*) vs ~68MB (*DAT*)
>
> b. retrieval performance: total time (ms) of 500 million queries:
>    12825 ms (*ConcurrentHashMap*) vs 14543 ms (*DAT*)
>
> Regards,
> He Xiaoqiao
>
>
> On Thu, Nov 24, 2016 at 7:48 AM, Liang Chen <chenliang6136@> wrote:
>
>> Hi xiaoqiao
>>
>> This improvement looks great!
>> Can you please explain the below data, what does it mean?
>> ----------
>> ConcurrentHashMap     ~68MB    14543
>> Double Array Trie     ~104MB   12825
>>
>> Regards
>> Liang
>>
>> 2016-11-24 2:04 GMT+08:00 Xiaoqiao He <xq.he2009@>:
>>
>> > Hi All,
>> >
>> > I would like to propose a Dictionary improvement which uses a Trie in
>> > place of a HashMap.
>> >
>> > In order to speed up aggregation, reduce the run-time memory footprint,
>> > enable fast distinct count, etc., CarbonData encodes data using a
>> > dictionary at file level or table level based on cardinality. This is a
>> > general and efficient approach in many big data systems, but with the
>> > ConcurrentHashMap that currently maintains the Dictionary in CarbonData,
>> > the memory overhead of the Driver is very large, since it has to load
>> > the whole Dictionary to decode actual data values, especially when the
>> > column cardinality is a large number. By default, CarbonData will not
>> > build a dictionary if cardinality > 1 million.
>> >
>> > I propose using a Trie in place of the HashMap for the following three
>> > reasons:
>> > (1) a Trie is a proper structure for a Dictionary,
>> > (2) it reduces the memory footprint,
>> > (3) it does not impact retrieval performance.
>> >
>> > The experimental results show that the Trie is able to meet these
>> > requirements.
>> > a. ConcurrentHashMap vs Double Array Trie
>> >    <https://linux.thai.net/~thep/datrie/datrie.html> (one
>> >    implementation of Trie structures)
>> > b. Dictionary size: 600K
>> > c. Memory footprint and query time
>> >                            memory footprint (64-bit JVM)   500 million query time (ms)
>> >      ConcurrentHashMap     ~68MB                           14543
>> >      Double Array Trie     ~104MB                          12825
>> >
>> > Please share your suggestions about the proposed improvement of the
>> > Dictionary.
>> >
>> > Regards
>> > He Xiaoqiao
>> >
>>
>>
>>
>> --
>> Regards
>> Liang
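To make the proposed change a bit more concrete for readers of this thread, below is a minimal, illustrative Java sketch of the lookup both structures have to answer (member value to surrogate key). The actual CarbonData dictionary API and the Double Array Trie implementation used in the experiment are not shown in this thread, so the trie here is a plain linked-node trie standing in for the DAT, and all class, method and sample names are assumptions for illustration only.

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustrative sketch only: maps dictionary member values to surrogate keys,
 * first with a ConcurrentHashMap and then with a simple linked-node trie
 * standing in for the Double Array Trie of the proposal.
 */
public class DictionaryLookupSketch {

    /** Simple linked-node trie; not a double array trie. */
    static final class Trie {
        private final Map<Character, Trie> children = new HashMap<>();
        private int surrogate = -1;   // -1 means "no entry ends here"

        void put(String value, int surrogate) {
            Trie node = this;
            for (char c : value.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Trie());
            }
            node.surrogate = surrogate;
        }

        int get(String value) {
            Trie node = this;
            for (char c : value.toCharArray()) {
                node = node.children.get(c);
                if (node == null) {
                    return -1;
                }
            }
            return node.surrogate;
        }
    }

    public static void main(String[] args) {
        // Current approach: member value -> surrogate key in a ConcurrentHashMap.
        Map<String, Integer> hashDict = new ConcurrentHashMap<>();
        // Proposed direction: the same mapping held in a trie.
        Trie trieDict = new Trie();

        String[] members = {"beijing", "shanghai", "shenzhen"};   // stand-in dictionary data
        for (int key = 0; key < members.length; key++) {
            hashDict.put(members[key], key);
            trieDict.put(members[key], key);
        }

        // Both structures answer the same lookup; the trade-off is heap size vs. lookup cost.
        System.out.println(hashDict.get("shanghai"));   // 1
        System.out.println(trieDict.get("shanghai"));   // 1
    }
}

A production double array trie keeps the same transitions in two parallel int arrays (commonly called base and check) rather than in per-node hash maps and String keys, which is where the smaller heap footprint reported above would come from.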
