In a message dated 2/20/2006 4:23:22 P.M. Eastern Standard Time, [EMAIL PROTECTED] writes:
 
> my %h;
> print scalar(%h), "\n";
>
> $h{a} = 1;
> print scalar(%h), "\n";
>
> keys %h = 256;
> print scalar(%h), "\n";
>
> $h{$_}=$_ for 'a'..'zzz';
> print scalar(%h), "\n";
>
> and it prints out:
>
> 0
> 1/8
> 1/256
> 14056/32768
>
> So what do those numbers mean? I thought I had it figured out until the
> last one. That would've made sense, too, if it'd been 18278, the number
> of strings between 'a' and 'zzz'. That would explain the 32768--the next
> 2**x after 18278. But where'd the 14056 come from?
>
> Come to think of it, shouldn't the second line of output been 1/2? Or is
> eight the minimum number of buckets you get for a hash, whether you use
> 'em or not?
>
> Thanks to anyone who feels like answering.
>
> Deane
 
deane --  
 
i can't point to any code to prove it, but here's my understanding of how hash access
works:  
 
    when accessing the value of a hash key, the key is used to generate a hash index;
 
    this hash index is an index to a particular ``bin'' of the hash array;
 
    however, there is no guarantee that each unique key will generate a unique hash bin
    index, so each bin is the head of a linked list of the keys ``stored'' in that bin, all of
    which generate the same hash bin index;
 
    after a bin index is calculated, the (hopefully short) list of keys mapped to that bin
    is searched for the particular key being accessed.  
 
the difference between the number of keys in the hash (18278) and the number
of bins actually used (14056) is the result of calculated hash bin index collisions.  
 
from the perldata doc:  
 
``If you evaluate a hash in scalar context, it returns false if the hash is empty. If there are any key/value pairs, it returns true; more precisely, the value returned is a string consisting of the number of used buckets and the number of allocated buckets, separated by a slash. This is pretty much useful only to find out whether Perl's internal hashing algorithm is performing poorly on your data set. For example, you stick 10,000 things in a hash, but evaluating %HASH in scalar context reveals "1/16", which means only one out of sixteen buckets has been touched, and presumably contains all 10,000 of your items. This isn't supposed to happen.''
 
as to your second question, my guess is that your guess is right: 8 is the number of bins
shown by experience to a practical minimum allocation.  
 
hth -- bill walters  
 
_______________________________________________
ActivePerl mailing list
[email protected]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs

Reply via email to