|
In a message dated 2/20/2006 4:23:22 P.M. Eastern Standard Time,
[EMAIL PROTECTED] writes:
> my %h;
> print scalar(%h), "\n"; > > $h{a} = 1; > print scalar(%h), "\n"; > > keys %h = 256; > print scalar(%h), "\n"; > > $h{$_}=$_ for 'a'..'zzz'; > print scalar(%h), "\n"; > > and it prints out: > > 0 > 1/8 > 1/256 > 14056/32768 > > So what do those numbers mean? I thought I had it figured out until the > last one. That would've made sense, too, if it'd been 18278, the number > of strings between 'a' and 'zzz'. That would explain the 32768--the next > 2**x after 18278. But where'd the 14056 come from? > > Come to think of it, shouldn't the second line of output been 1/2? Or is > eight the minimum number of buckets you get for a hash, whether you use > 'em or not? > > Thanks to anyone who feels like answering. > > Deane deane --
i can't point to any code to prove it, but here's my understanding of how
hash access
works:
when accessing the value of a hash key, the key is
used to generate a hash index;
this hash index is an index to a particular ``bin'' of
the hash array;
however, there is no guarantee that each unique key will
generate a unique hash bin
index, so each bin is the head of a linked list of the
keys ``stored'' in that bin, all of
which generate the same hash bin index;
after a bin index is calculated, the (hopefully short)
list of keys mapped to that bin
is searched for the particular key being
accessed.
the difference between the number of keys in the hash (18278) and the
number
of bins actually used (14056) is the result of calculated hash bin index
collisions.
from the perldata doc:
``If you evaluate a hash in scalar context, it returns false if the hash is
empty. If there are any key/value pairs, it returns true; more precisely, the
value returned is a string consisting of the number of used buckets and the
number of allocated buckets, separated by a slash. This is pretty much useful
only to find out whether Perl's internal hashing algorithm is performing poorly
on your data set. For example, you stick 10,000 things in a hash, but evaluating
%HASH in scalar context reveals
"1/16", which means only one out of
sixteen buckets has been touched, and presumably contains all 10,000 of your
items. This isn't supposed to happen.''as to your second question, my guess is that your guess is right: 8 is the
number of bins
shown by experience to a practical minimum allocation.
hth -- bill walters
|
_______________________________________________ ActivePerl mailing list [email protected] To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
