On Mon, Sep 1, 2014 at 10:43 AM, Nick Wellnhofer <[email protected]> wrote:
> On Sep 1, 2014, at 19:27 , Marvin Humphrey <[email protected]> wrote:

>> A related bug is that we do not enforce that Hash keys must be immutable.
>>
>>    Hash    *hash = Hash_new(0);
>>    ByteBuf *foo  = BB_new_bytes("foo", 4)
>>    Hash_Store(hash, (Obj*)foo, NULL);
>>    VArray  *keys = Hash_Keys(hash);
>>    ByteBuf *key  = (ByteBuf*)VA_Fetch(keys, 0);
>>    BB_Set_Size(key, 2);      // Uh oh.
>>
>> There's a straightforward solution, though.  99% of the time, Hash uses only
>> String keys.  String is immutable.  We should consider requiring Hash keys to
>> be Strings.  There are a lot of advantages, and few downsides.
>
> Or we clone the returned keys like we clone them in Hash_Make_Key.

Early implementations of Hash supported only ByteBuf and then CharBuf keys
back when those were the primary string types.  Enabling Obj keys happened
later, and I've come to believe it was a design mistake, counter to the spirit
of Clownfish.

Clownfish is conceived as least-common-denominator classical OO, because it
has to map as cleanly as possible onto multiple host languages.  Adding
features to a class like Hash gets in the way of that, making host bindings
harder to write and to reason about.

For example, consider that since Perl hashes only support string keys, a
Clownfish Hash with only String keys can round-trip cleanly to Perl
and back but a Clownfish Hash with non-string keys cannot.  By adopting a
similar restriction on our own hash type, that's a whole class of problems we
don't have to deal with.

(I think our round-trip model ought to be JSON.  Every popular programming
language ecosystem has evolved best practices for marshalling and
unmarshalling JSON; Clownfish ought to leverage the work that has been done in
that area.)

A second advantage is that changing Hash's method signatures to accept String
keys will allow us to eliminate a lot of casting, making our code less
cluttered and safer.

Changing to a single key type also allows us to eliminate the Hash_Sum()
method from Obj -- since hash sum generation becomes an internal
implementation detail of Hash -- thus making incremental progress on shrinking
the size of our generated code and on reducing the effort required to grok
Clownfish.

And of course, using String keys solves Hash's mutable-key bug simply and
efficiently.

The downside, of course, is simply that we remove some functionality.
But at them moment, the *only* place where non-string keys appear in either
Clownfish or Lucy is in the `test_illegal_keys` function in
Lucy/Test/Util/TestJson.c.  (We used to see them in SortFieldWriter when it
used a ZombieKeyedHash for uniquing values, but no more.)

If somebody really needs a hash set or hash table which supports non-string
keys, it's possible to implement one in a Clownfish extension.  Maybe in
time we'll introduce a "standard library" of additional parcels which allows
the PMC to take on official stewardship of such extensions.  (Starting with
JSON?)  But the main Clownfish runtime ought to stay as small and spare as
possible.

Marvin Humphrey

Reply via email to