Hi,

>Umm, the reason that the hash is slow is that it only hashes string type
>values. The test you were using had numeric values. Try your tests with
>strings and you should see a significant difference and I will add numeric
>values to the hash as well for the next release... However, I'm not
>planning on hashing block values...
>
>  - jim

Well, that clears some things up!

Still, the big thing missing for me in Rebol is a really fast associative
array. With Thompson AWK, the language I use most for data crunching, I can
build an associative array with 100,000 keys in about four seconds and
search it for 100,000 key values in five seconds. With Rebol it takes
longer, even with much smaller hashes. For example, using a hash with
just 10,000 key values (with /Core 2.3.0.3.1):

  z: make hash! 20000
  loop 10000 [
    x: random 2000000
    if not select z x [
      append z reduce [to string! x x]
    ]
  ]

takes over a minute, and

  loop 10000 [
    if select z to string! random 2000000 [x: x + 1]
  ]

takes eight or nine seconds.

Hashes seem to be particularly slow when searching for nonexistent key
values, and lookups also slow down the later a key was added. Times for
100,000 searches (minutes:seconds):

  loop 100000 [ if select z "1533695" [x: x + 1]] ; first key value,   0:02
  loop 100000 [ if select z "501730" [x: x + 1]]  ; 5000th key value,  0:53
  loop 100000 [ if select z "1533696" [x: x + 1]] ; non-key value,     1:41

I figure the main difference is that Rebol remembers the order in which
elements were added to the hash, whereas AWK discards that information.
Also, AWK allows each key value to be used only once, while a Rebol hash
can hold any number of identical values.
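A small illustration of the duplicate-key behaviour, assuming REBOL/Core 2.x hash! semantics (append never checks for an existing key, and select returns the value after the first match it finds):

```rebol
REBOL [Title: "Duplicate keys in a hash! (illustration)"]

h: make hash! []
append h reduce ["apple" 1]
append h reduce ["apple" 2]   ; a second "apple" key is simply appended
print length? h               ; 4: both key/value pairs are kept, in order
print select h "apple"        ; 1: select stops at the FIRST "apple"
```

An AWK array would instead end up holding a single "apple" entry whose value is 2.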

As I see it, the main reason for using a hash is to associate unique key
values with one other value. Trying to give Rebol hashes all the
properties of a block adds a lot of unnecessary overhead, and means a
lot more hand-coding to check for pre-existing key values.
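The hand-coding in question looks roughly like this. It is a sketch only: put-key is a hypothetical helper (not a Rebol built-in), and it assumes a REBOL version whose find supports the /skip refinement, so that only key positions (every other element) are compared.

```rebol
REBOL [Title: "Unique-key insert for hash! (sketch)"]

; put-key (hypothetical helper): treats H as alternating key/value pairs
; and keeps each key at most once, AWK-style.
put-key: func [h [hash!] key value /local pos] [
    pos: find/skip h key 2             ; search key slots only (assumes find/skip exists)
    either pos [
        change/only next pos value     ; key present: overwrite its value in place
    ] [
        append h reduce [key value]    ; new key: add a pair at the end
    ]
    h
]

h: make hash! 100
put-key h "1533695" 1
put-key h "501730" 2
put-key h "1533695" 3   ; updates the existing entry instead of adding a duplicate
print length? h         ; 4
print select h "1533695" ; 3
```

Every insert pays for an extra lookup, which is exactly the overhead a native AWK/Perl-style associative array would avoid.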

Another problem with Rebol hashes is that if you don't declare the size
of the hash to begin with, it crashes.

It would be really, really nice to have a fast native data structure that
worked more like associative arrays in AWK or Perl!

Thanks,
Eric
