Re: Why is there a separate lookup HashMap in DataTree?

Andrew Finnell Thu, 08 Sep 2011 09:19:44 -0700

Thomas,

I did not introduce this feature or looked at the code. But from a Hashing 
standpoint, average-case of a properly implemented Hash map should be O(1). If 
HashMap<FullPath, Node> is statistically slower than HashMap<name, Node> it 
means the hasing algorithm used for the type FullPath is inefficient, causing 
inefficiency. The solution would be to analyze FullPath to see how hashCode() 
is implemented. In a perfect hashing scenario there are no collisions which 
means it becomes a simple index lookup. When hashes start colliding it requires 
an extra steps in a bucket to determine the correct object to return, through 
equals(). You can imagine this is horribly inefficient.


Run your tests again with a modified FullPath that returns a hashCode() that is 
simple to compute hash that is cached until data is changed and a equals() 
method to appropriately compute equality. 

Andrew

On Sep 8, 2011, at 11:35 AM, Thomas Koch wrote:

> Hi,
> 
> I'm studying the ZK DataTree and wondered, whether the code overhead of a 
> lookup HashMap for the nodes provides the expected speed benefit.
> 
> Therefor I found japex[1], a microbenchmarking tool and wrote two different 
> "tree" implementations: A simple "HashMap<FullPath, Node>" and a real tree 
> where each Node contains "HashMap<name, Node> children".
> 
> I ran benchmarks to lookup random nodes with trees of different depth ("l" = 
> levels) and children count for each node. (see attachment, 
> tps=transactions/sec) The benchmark was executed single threaded on my 
> laptop. 
> I don't dare to say, that it's accurate, but it may be good enough to make 
> one 
> wonder.
> 
> So the HashMap based lookup seems to be faster for all configurations, but at 
> most twice as fast. On the other hand the code is made more complicated, the 
> memory consumption of every path in the DataTree is doubled and the lookup 
> time may hardly be a bottleneck at all in a network application.
> 
> I scanned the VCS logs of ZK to find informations when the HashMap lookup was 
> introduced and how it was motivated, but it was already there when the 
> project 
> came out of Yahoo.
> 
> Could sbd. please give more information, why the HashMap lookup has been 
> introduced and whether benchmarks were made at that time?
> 
> A slightly related topic: Does anybody run regulary benchmarks against ZK? Is 
> there a benchmark suite? Would you recommend a tool to build benchmarks for 
> ZK?
> 
> [1] http://japex.java.net
> 
> Thank you,
> 
> Thomas Koch, http://www.koch.ro

Re: Why is there a separate lookup HashMap in DataTree?

Reply via email to