We recently had to solve that problem -- again.  Our table was 18 million 
entries.

In the past I used simple hashing:

adding the bytes of the "key" together and using the rightmost byte as 
index to a table of 256 string heads.

the most recent occasion had worst case scenario data -- only 10 hashes 
were formed.  The quick fix was to add the 1st, 3rd, 5th, ... bytes to the 
accumulator, but add the 2nd, 4th, 6th, ... bytes shifted left by 4 bits. 
Got the worst case up to 100+ hashes which seems to get us out of the 
woods.

If there is any chance the table might get paged (we did when I first 
worked on this here), allocate new entries a page at a time.  Make the 
free element table also have 256 entries.  This gives storage isolation 
for another performance boost.

We are still looking at the most recent worst case (it is part of a 100# 
in a 5# bag problem).  When I have time I am going to increase the table 
to 1024 enties (one page) to see if there is any benefit.  If it seems 
there is I will use a three step hash w/ 3 bit offset.

Note our emergency solution was to partition the whole problem into 9 
segments and then get each segment into 6 parallel processing runs.  We 
are trying to get a batch year-end process to finish before the end of the 
1st quarter.

IBM Mainframe Discussion List <[email protected]> wrote on 01/10/2008 
09:38:42 PM:

> Has anyone every seen any doc on using radix partition trees? I'm 
> thinking it may have been one of the "rainbow" books.

> I vaguely remember data tree structures and I've got a table search 
> problem that might be the perfect application for a tree-structured data 

> repository. The table might have up to 1,000,000 entries, all in 
> storage, and a balanced n-ary tree has GOT to be faster than using a 
> binary search. The nature of the data is such that a plain old-fashioned 

> list, in sorted order, isn't real amenable to a binary search, either.



-----------------------------------------
The information contained in this communication (including any
attachments hereto) is confidential and is intended solely for the
personal and confidential use of the individual or entity to whom
it is addressed. The information may also constitute a legally
privileged confidential communication. If the reader of this
message is not the intended recipient or an agent responsible for
delivering it to the intended recipient, you are hereby notified
that you have received this communication in error and that any
review, dissemination, copying, or unauthorized use of this
information, or the taking of any action in reliance on the
contents of this information is strictly prohibited. If you have
received this communication in error, please notify us immediately
by e-mail, and delete the original message. Thank you 

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html

Reply via email to