On Tue, 2004-12-28 at 13:46, Ian Langworth wrote:
> On 28.Dec.2004 01:14AM -0500, Tom Metro wrote:
> 
> > If you are concerned about the performance impact of long
> > keys, and your application fits a "write-once, read-many"
> > model, then you could always hash the hash keys. Say generate
> > an MD5 digest of the key string, and then use the digest as
> > the hash key.
> 
> This might make a nice Tie:: module, if there already isn't
> one. But then again, tie itself is allegedly slow...

No, that would defeat the point....

Or at least that's what I was going to say... I had a whole rationale
typed up, but then I went to benchmark my hypothesis and I get this:

        $ perl -MBenchmark -e 'my $long="a"x10_000;my %x;timethis(100_000,sub {$x{$long}++});print "Final: $x{$long}\n"'
        timethis 100000:  7 wallclock secs ( 6.54 usr +  0.00 sys =  6.54 CPU) @ 15290.52/s (n=100000)
        Final: 100000
        $ perl -MBenchmark -e 'my $long="a"x10_000;my %x;timethis(100_000,sub {my $tmp=unpack("%32C*",$long) % 65535;$x{$tmp}++});my $tmp=unpack("%32C*",$long) % 65535;print "Final: $x{$tmp}\n"'
        timethis 100000:  2 wallclock secs ( 2.16 usr +  0.00 sys =  2.16 CPU) @ 46296.30/s (n=100000)
        Final: 100000

Is there a bug in my code, or is there really that substantial a
savings?
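
For anyone who wants to rerun the comparison, here is a rough script
version of the two one-liners above, with Tom's MD5 idea added as a third
case. It is only a sketch: the variable names, the iteration count and
the use of cmpthese() and Digest::MD5 are my own choices for
illustration, and the timings will differ from machine to machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Digest::MD5 qw(md5);

# Same 10,000-character key as in the one-liners.
my $long = "a" x 10_000;

# One counter hash per strategy, so the final counts can be checked.
my (%by_key, %by_sum, %by_md5);

# Iteration count is an arbitrary choice; raise it for steadier numbers.
my $n = 20_000;

cmpthese($n, {
    # Use the long string directly as the hash key.
    long_key => sub { $by_key{$long}++ },
    # Reduce the key to a 16-bit checksum first (not collision-safe).
    checksum => sub { $by_sum{ unpack('%32C*', $long) % 65535 }++ },
    # Reduce the key to a 16-byte MD5 digest first.
    md5      => sub { $by_md5{ md5($long) }++ },
});

my $sum_key = unpack('%32C*', $long) % 65535;
print "long_key final: $by_key{$long}\n";
print "checksum final: $by_sum{$sum_key}\n";
print "md5 final:      $by_md5{ md5($long) }\n";
```

Note that the checksum and MD5 cases pay for the digest on every
iteration, just as the second one-liner does, so the comparison includes
that cost.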

Of course, there's a substantial problem with the above: checksums DO
collide. Your module would have to do the same collision resolution that
Perl's built-in hashing does, and that's probably where the extra
overhead comes in (though I admit I'm not seeing it... perhaps in
comparing the long key against the stored original?)

In a case where collisions wouldn't be a real problem, I guess that's a
non-issue, but those are rare cases.
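
To make the collision problem concrete, here is a small sketch using the
same checksum as in the benchmark. The checksum is just a sum of byte
values, so any two strings with the same bytes in a different order land
on the same key. The helper names (checksum, bump) and the
bucket-per-checksum layout are assumptions for illustration, not code
from any existing module.

```perl
use strict;
use warnings;

# Same 16-bit checksum as in the benchmark one-liner.
sub checksum { return unpack('%32C*', $_[0]) % 65535 }

# Naive version: distinct keys with equal checksums share one counter.
my %naive;
$naive{ checksum($_) }++ for qw(ab ba);
# "ab" and "ba" both sum to 97 + 98 = 195, so $naive{195} is now 2.

# Collision-safe version: each checksum slot holds a small inner hash
# keyed by the full original string.
my %safe;
sub bump {
    my ($key) = @_;
    return ++$safe{ checksum($key) }{$key};
}
bump($_) for qw(ab ba ab);

printf "naive slot %d holds %d\n", checksum('ab'), $naive{ checksum('ab') };
printf "safe: ab=%d ba=%d\n", $safe{195}{ab}, $safe{195}{ba};
```

The safe version keeps the counts apart, but the inner lookup makes Perl
hash and compare the full key again, so whatever the checksum saved is
likely to be spent there.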

-- 
781-324-3772
[EMAIL PROTECTED]
http://www.ajs.com/~ajs

_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm
