RE: encapsulated hash codes

Allen Wirfs-Brock Wed, 04 Nov 2009 18:43:43 -0800

 See below

From: Maciej Stachowiak [mailto:[email protected]]
Sent: Wednesday, November 04, 2009 4:50 PM
To: Allen Wirfs-Brock
Cc: es-discuss
Subject: Re: encapsulated hash codes



On Nov 4, 2009, at 2:49 PM, Allen Wirfs-Brock wrote:


A straw man proposal for  Harmony encapsulated hash codes  has been posted to 
the Wiki 
at.<http://wiki.ecmascript.org/doku.php?id=strawman:encapsulated_hashcodes>


1)     Wouldn't it be simpler to have a single Object.hash() function that 
returns a uniformly distributed number in the 0..2^32 range, and leave the mod 
operation up to code using the hash code? Making a new function that 
(essentially) encapsulates the hash operation and the mod does not seem to pull 
its weight in API complexity.

Yes, it would and that is the classic approach (perhaps excluding the 
uniformity of distribution)  used by most languages.  As an implementer I would 
be fairly happy with that approach.  However, some of our members have 
significant concerns about the possibility of hash codes being used as a covert 
communications canner.  This proposal tries to avoid that problem by giving 
each client  of identity hashes (most likely each hash table) their own hash 
function which yields a distinct sequences of hash values (for the same 
sequence of objects).  Since  the functions aren't shared the sequence of 
values they generate can be used as a channel.

Requiring uniform distribution over a 32-bit range may not be practical.  If a 
client only wants hashes in the 0.255 range the obvious function of a 32-bit 
value (mod 256) ignores most of the bits that actually distinguish the values.

The individual functions probably do a bit more than a simple mod.  Typically, 
the seed that is used to generate per object hash values  (for example, the 
allocation address of an object) has fewer than 32 variable bits (for example, 
only the size range of the allocation nursery).  If the hash function know 
range of hash values it is expected to generate it can more effectively use the 
actual variable bits to generate a good distribution.

2) What about non-objects? It would be nice if strings, numbers, and perhaps 
other primitive types could hash by value rather than by identity, so you could 
reasonably mix them as hash keys with objects.

Sure this can be added, but isn't an essential primitive. Hash values can be 
easily  generated today for all the built-in value types.  But object identity 
hash requires some sort of primitive support.


2)     Security considerations: it should be recommended that the hash code, if 
computed from the address of the object, is not reversible. (Of course, with a 
copying collector you probably can't compute the hash code from the address).
Yes, that would be a good requirement.   In practice, it's usually the case 
anyway.  Actually, the address is a good seed hash value even for copying 
collectors.  There are tricks that are known by lisp/smalltalk/etc. 
implementers for efficiently doing this.  Happy to share...  Ideally, identity 
hashes cost nothing to compute and require no additional per object storage.  
The ideal isn't obtainable, but you can get pretty close.

Regards,
Maciej

_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss

RE: encapsulated hash codes

Reply via email to