As I wrote, there is the issue of dynamically balancing the tree as random insertions come in.
On Tue, 27 Jul 2010 15:43:08 -0500 Hal Merritt <hmerr...@jackhenry.com> wrote: :>Am I missing something? Why not use a binary chop search? That is, each compare cuts the universe in half. For a million entries, you would need 20 or less compares. :> :>http://en.wikipedia.org/wiki/Binary_search_algorithm :> :> :>-----Original Message----- :>From: IBM Mainframe Discussion List [mailto:ibm-m...@bama.ua.edu] On Behalf Of Binyamin Dissen :>Sent: Tuesday, July 27, 2010 3:14 PM :>To: IBM-MAIN@bama.ua.edu :>Subject: Re: Hashing algorythm for text strings without embedded blanks :> :>On Tue, 27 Jul 2010 15:43:47 -0400 zMan <zedgarhoo...@gmail.com> wrote: :> :>:>OK, I have a splitting headache and may be dense anyway: how is it easier to :>:>look it up once it's hashed? What am I missing? :> :>Not easier, faster. :> :>Assume 100000 strings. A serial search will take on average 50000 compares. A :>mod64 hash should reduce it on average to under 800 compares, hash to slot and :>run the alias chain. :> :>A b-tree would be better but then there is the issue of balancing it while :>others are busy transversing it. :> :>:>On Tue, Jul 27, 2010 at 3:36 PM, Binyamin Dissen <bdis...@dissensoftware.com :>:>> wrote: :> :>:>> On Tue, 27 Jul 2010 15:32:24 -0400 zMan <zedgarhoo...@gmail.com> wrote: :> :>:>> :>What's the goal of the hashing -- obfuscation? Shortening the data? :> :>:>> Lookup. :> :>:>> Does the entry exist, and if so what are its attributes. If does not exist, :>:>> add. :> :>:>> :>On Tue, Jul 27, 2010 at 3:18 PM, Binyamin Dissen < :>:>> bdis...@dissensoftware.com :>:>> :>> wrote: :> :>:>> :>> Is there a preferred hashing algorithm for such strings? The strings :>:>> will :>:>> :>> be :>:>> :>> fixed length with trailing blanks. :> :>:>> :>> I was thinking of simply doing a MOD64 (or higher) of the sum of the :>:>> :>> non-blank :>:>> :>> part of the string by words, but as this is text with a limited :>:>> character :>:>> :>> set, :>:>> :>> would this lead to excessive alias chains? Has research been done in :>:>> this :>:>> :>> area? There will mostly be search activity but also add activity. -- Binyamin Dissen <bdis...@dissensoftware.com> http://www.dissensoftware.com Director, Dissen Software, Bar & Grill - Israel Should you use the mailblocks package and expect a response from me, you should preauthorize the dissensoftware.com domain. I very rarely bother responding to challenge/response systems, especially those from irresponsible companies. ---------------------------------------------------------------------- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@bama.ua.edu with the message: GET IBM-MAIN INFO Search the archives at http://bama.ua.edu/archives/ibm-main.html