RPost wrote:
BackingStoreHashtable could be used to implement other operations such as INTERSECT and EXCEPT, and for removing duplicates. I originally noticed that BackingStoreHashtable did not spill to disk when researching the implementation of INTERSECT. Does either 'spill to disk' approach have any other possible future use? Perhaps for implementing other features or functionality on anyone's 'nice to have' list? Or would either the btree or hash implementation be useful for only this one purpose?
A dynamic disk hash table could be used for indexes. However, BackingStoreHashtables are transient, single-threaded data structures. A hash index must deal with multi-threading issues, while BackingStoreHashtable would not want to spend time on locking. Furthermore, a BackingStoreHashtable only grows until it is discarded entirely; a hash index must also deal with shrinking. So, with respect to BackingStoreHashtable and hash indexes, I would say close, but no cigar.
Jack
Jack Klebanoff wrote:
Derby uses the class BackingStoreHashtable to implement the hash table used in hash joins. Despite the class's name, it does not spill to disk; the hash table is always in memory.
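To make the mechanism concrete, here is a rough sketch of a hash join using plain java.util collections. It is only illustrative: the class name and the row/key types are simplified stand-ins of my own, not the actual BackingStoreHashtable API.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Schematic hash join: build a table over one input, probe with the other.
    // Row and key types are simplified stand-ins, not Derby's actual classes.
    class HashJoinSketch {
        static List<Object[]> hashJoin(List<Object[]> build, int buildKeyCol,
                                       List<Object[]> probe, int probeKeyCol) {
            // Build phase: every build-side row is held in memory at once,
            // which is exactly what blows up when the row count estimate is low.
            Map<Object, List<Object[]>> table = new HashMap<>();
            for (Object[] row : build) {
                table.computeIfAbsent(row[buildKeyCol], k -> new ArrayList<>()).add(row);
            }
            // Probe phase: one hash lookup per probe-side row.
            List<Object[]> result = new ArrayList<>();
            for (Object[] row : probe) {
                List<Object[]> matches = table.get(row[probeKeyCol]);
                if (matches == null) continue;
                for (Object[] match : matches) {
                    Object[] joined = new Object[match.length + row.length];
                    System.arraycopy(match, 0, joined, 0, match.length);
                    System.arraycopy(row, 0, joined, match.length, row.length);
                    result.add(joined);
                }
            }
            return result;
        }
    }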
The optimizer tries to avoid hash joins when the input is large because BackingStoreHashtable would use too much memory. The optimizer just works with row count estimates, so there is no guarantee that BackingStoreHashtable will not blow up. No customer has complained about this yet.
I would like to work on this, changing BackingStoreHashtable to spill to disk when the hash table gets large, and changing the optimizer to understand that large hash tables are allowed, but more costly.
The advantages of doing this are that
1. hash joins will not blow up when the compiler's row count estimate was low,
2. the optimizer can choose hash joins on large inputs when that is the least costly join implementation, and
3. we can use BackingStoreHashtable to implement features such as INTERSECT and GROUP BY, though I am not proposing to do so now.
I am not proposing to implement a hash table that can be used as an alternative to Btrees for a secondary index. BackingStoreHashtable is used for transient data structures that are only accessed by a single thread. A secondary index implementation must deal with locking and must implement hashCode methods that are JVM-independent. This is much more work and would yield a slower implementation.
I propose that BackingStoreHashtable should start off using an in-memory hash table even if the estimated row count is large. That way it will use an in-memory hash table whenever the actual row count is small enough for the table to fit in memory. When it finds that spilling to disk is necessary, BackingStoreHashtable will use the estimated row count to determine the initial number of buckets and move the in-memory entries to disk. The disk-based hash table will use a linear hashing algorithm; see "External Memory Algorithms and Data Structures: Dealing with Massive Data", Jeffrey Scott Vitter, ACM Computing Surveys, Vol. 33, No. 2, June 2001, pp. 209–271. Linear hashing grows the hash table one bucket at a time whenever the average number of entries per bucket exceeds a threshold. The disadvantage of growing by a larger number of buckets at a time is that the last expansion may be unnecessarily large, wasting time and space.
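To illustrate the growth step, here is a rough in-memory sketch of linear hashing in plain Java. The names and the load threshold are my own assumptions, and the buckets live in memory rather than on pages; it only shows the addressing and the one-bucket-at-a-time split.

    import java.util.ArrayList;
    import java.util.List;

    // In-memory sketch of linear hashing: the table grows ONE bucket at a time
    // whenever the average entries-per-bucket crosses a threshold. An on-disk
    // version would keep each bucket on a page; all names here are illustrative.
    class LinearHashSketch<K> {
        private static final int INITIAL_BUCKETS = 4;
        private static final double LOAD_THRESHOLD = 2.0; // assumed avg entries/bucket

        private final List<List<K>> buckets = new ArrayList<>();
        private int level = 0; // how many times the table has doubled
        private int next = 0;  // the next bucket to split in this round
        private int size = 0;

        LinearHashSketch() {
            for (int i = 0; i < INITIAL_BUCKETS; i++) buckets.add(new ArrayList<>());
        }

        private int bucketFor(K key) {
            int h = key.hashCode() & 0x7fffffff;
            int b = h % (INITIAL_BUCKETS << level);
            // Buckets before 'next' were already split this round, so they
            // are addressed with the next round's (doubled) modulus.
            if (b < next) b = h % (INITIAL_BUCKETS << (level + 1));
            return b;
        }

        void insert(K key) {
            buckets.get(bucketFor(key)).add(key);
            size++;
            if ((double) size / buckets.size() > LOAD_THRESHOLD) split();
        }

        private void split() {
            // Grow by exactly one bucket: rehash only the entries of bucket
            // 'next' between it and the new bucket, then advance the pointer.
            buckets.add(new ArrayList<>());
            List<K> old = buckets.set(next, new ArrayList<>());
            int doubledModulus = INITIAL_BUCKETS << (level + 1);
            for (K key : old) {
                int h = key.hashCode() & 0x7fffffff;
                buckets.get(h % doubledModulus).add(key);
            }
            next++;
            if (next == INITIAL_BUCKETS << level) { // round complete: table has doubled
                level++;
                next = 0;
            }
        }
    }

Because a split touches only the one bucket being divided, the work per expansion stays small and bounded, which is the property that avoids the oversized last expansion described above.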
I would appreciate it if anyone can point me to a better external hash table algorithm.
The disk hash table will use overflow pages because an imperfect hash function will cause some buckets to get more than their share of entries, and because we may have duplicate keys.
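For illustration, a bucket's on-disk layout might look something like the following. The field names, page capacity, and record format are all assumptions for the sake of the sketch, not an actual Derby page format.

    // Illustrative on-disk bucket layout: a primary page per bucket plus a
    // chain of overflow pages for buckets whose entries (or duplicate keys)
    // do not fit on one page. Everything here is assumed, not Derby's format.
    class HashBucketPage {
        static final int CAPACITY = 64;  // entries per page, an assumption
        long pageId;                     // this page's location on disk
        long nextOverflowPage = -1;      // -1 marks the end of the chain
        int entryCount;
        byte[][] entries = new byte[CAPACITY][]; // serialized key/row pairs

        // A probe scans the primary page and then follows the overflow chain,
        // so long chains make lookups proportionally more expensive.
        boolean hasOverflow() { return nextOverflowPage != -1; }
    }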
I have not yet looked at how the optimizer handles hash joins, so I do not yet have a proposal for how to change that.
Can anyone explain how to generate a cost for the Derby optimizer? How do you compute it from an estimated number of disk accesses? Does an estimate of 2 disk accesses mean a cost of 2.0? Should the cost estimate include CPU time? If so, how do you estimate it relative to disk access cost? Putting it another way, what does a cost estimate of 1.0 (or x) mean?
Comments, criticisms, questions?
Jack Klebanoff
