Hi xiaoqiao

For the example below (600K dictionary entries): is it correct to say that using "DAT" saves about 36 MB of memory compared with "ConcurrentHashMap", while retrieval performance degrades only slightly (by 1718 ms over the 500 million queries)?
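Just to make the arithmetic behind that question explicit, here is a minimal Java sketch. The figures are the ones reported below in this thread; the class and variable names are purely illustrative.

public class DictionaryDeltas {
    public static void main(String[] args) {
        // Figures reported in this thread (600K-entry dictionary, 64-bit JVM).
        double hashMapMb = 104, datMb = 68;      // approximate heap footprint
        long hashMapMs = 12825, datMs = 14543;   // total time for 500 million queries
        long queries = 500_000_000L;

        // DAT saves ~36 MB of heap.
        System.out.printf("memory saved: ~%.0f MB%n", hashMapMb - datMb);

        // DAT is 1718 ms slower in total, i.e. only a few nanoseconds per lookup.
        long slowerMs = datMs - hashMapMs;
        System.out.printf("extra time: %d ms total, ~%.2f ns per query%n",
                slowerMs, slowerMs * 1_000_000.0 / queries);
    }
}

So, based on the numbers in this thread, DAT trades roughly 36 MB of heap for about 3.4 ns of extra lookup cost per query.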
One more question: if the dictionary data size increases, what do the comparison results of "ConcurrentHashMap" vs "DAT" look like?

Regards
Liang

------------------------------------------------------------------------------------------------------
a. memory footprint (approximate quantity) in 64-bit JVM:
   ~104MB (*ConcurrentHashMap*) vs ~68MB (*DAT*)
b. retrieval performance: total time (ms) of 500 million queries:
   12825 ms (*ConcurrentHashMap*) vs 14543 ms (*DAT*)

hexiaoqiao wrote
> hi Liang,
>
> Thanks for your reply. I need to correct the experiment result, because
> the NO.1 column of the result data table was in the wrong order.
>
> In order to compare performance between Trie and HashMap, two different
> structures are constructed from the same dictionary data, whose size is
> 600K and whose items are each between 2 and 50 bytes long.
>
> ConcurrentHashMap (the structure currently used in CarbonData) vs Double
> Array Trie (one implementation of Trie structures)
>
> a. memory footprint (approximate quantity) in 64-bit JVM:
>    ~104MB (*ConcurrentHashMap*) vs ~68MB (*DAT*)
>
> b. retrieval performance: total time (ms) of 500 million queries:
>    12825 ms (*ConcurrentHashMap*) vs 14543 ms (*DAT*)
>
> Regards,
> He Xiaoqiao
>
>
> On Thu, Nov 24, 2016 at 7:48 AM, Liang Chen <chenliang6136@> wrote:
>
>> Hi xiaoqiao
>>
>> This improvement looks great!
>> Can you please explain the below data, what does it mean?
>> ----------
>> ConcurrentHashMap     ~68MB    14543
>> Double Array Trie     ~104MB   12825
>>
>> Regards
>> Liang
>>
>> 2016-11-24 2:04 GMT+08:00 Xiaoqiao He <xq.he2009@>:
>>
>> > Hi All,
>> >
>> > I would like to propose a Dictionary improvement which uses a Trie in
>> > place of a HashMap.
>> >
>> > In order to speed up aggregation, reduce the run-time memory footprint,
>> > enable fast distinct count, etc., CarbonData encodes data using a
>> > dictionary at file level or table level based on cardinality. This is a
>> > general and efficient approach in many big data systems, but with the
>> > ConcurrentHashMap that currently maintains the Dictionary in CarbonData,
>> > the memory overhead of the Driver is very large, since it has to load
>> > the whole Dictionary to decode actual data values, especially when the
>> > column cardinality is a large number. By default, CarbonData will not
>> > build a dictionary if cardinality > 1 million.
>> >
>> > I propose using a Trie in place of the HashMap for the following three
>> > reasons:
>> > (1) a Trie is a proper structure for a Dictionary,
>> > (2) it reduces the memory footprint,
>> > (3) it does not impact retrieval performance.
>> >
>> > The experimental results show that the Trie is able to meet these
>> > requirements.
>> > a. ConcurrentHashMap vs Double Array Trie
>> >    <https://linux.thai.net/~thep/datrie/datrie.html> (one
>> >    implementation of Trie structures)
>> > b. Dictionary size: 600K
>> > c. Memory footprint and query time
>> >                            memory footprint (64-bit JVM)   500 million query time (ms)
>> >      ConcurrentHashMap     ~68MB                           14543
>> >      Double Array Trie     ~104MB                          12825
>> >
>> > Please share your suggestions about the proposed improvement of the
>> > Dictionary.
>> >
>> > Regards
>> > He Xiaoqiao
>> >
>>
>>
>>
>> --
>> Regards
>> Liang
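To make the proposed change a bit more concrete for readers of this thread, below is a minimal, illustrative Java sketch of the lookup both structures have to answer (member value to surrogate key). The actual CarbonData dictionary API and the Double Array Trie implementation used in the experiment are not shown in this thread, so the trie here is a plain linked-node trie standing in for the DAT, and all class, method and sample names are assumptions for illustration only.

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustrative sketch only: maps dictionary member values to surrogate keys,
 * first with a ConcurrentHashMap and then with a simple linked-node trie
 * standing in for the Double Array Trie of the proposal.
 */
public class DictionaryLookupSketch {

    /** Simple linked-node trie; not a double array trie. */
    static final class Trie {
        private final Map<Character, Trie> children = new HashMap<>();
        private int surrogate = -1;   // -1 means "no entry ends here"

        void put(String value, int surrogate) {
            Trie node = this;
            for (char c : value.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Trie());
            }
            node.surrogate = surrogate;
        }

        int get(String value) {
            Trie node = this;
            for (char c : value.toCharArray()) {
                node = node.children.get(c);
                if (node == null) {
                    return -1;
                }
            }
            return node.surrogate;
        }
    }

    public static void main(String[] args) {
        // Current approach: member value -> surrogate key in a ConcurrentHashMap.
        Map<String, Integer> hashDict = new ConcurrentHashMap<>();
        // Proposed direction: the same mapping held in a trie.
        Trie trieDict = new Trie();

        String[] members = {"beijing", "shanghai", "shenzhen"};   // stand-in dictionary data
        for (int key = 0; key < members.length; key++) {
            hashDict.put(members[key], key);
            trieDict.put(members[key], key);
        }

        // Both structures answer the same lookup; the trade-off is heap size vs. lookup cost.
        System.out.println(hashDict.get("shanghai"));   // 1
        System.out.println(trieDict.get("shanghai"));   // 1
    }
}

A production double array trie keeps the same transitions in two parallel int arrays (commonly called base and check) rather than in per-node hash maps and String keys, which is where the smaller heap footprint reported above would come from.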
