Re: [lucy-dev] Clownfish Hash API

Nick Wellnhofer Mon, 20 Apr 2015 03:43:37 -0700

On 20/04/2015 06:36, Marvin Humphrey wrote:

> I missed my chance to work on those because of a busy week at ApacheCon
> Austin -- but now Lucy will need to be adapted for these changes, and I'm
> happy to pick up that task.


Lucy is already fixed:

    https://github.com/apache/lucy/commits/master

> Other techniques include Python throwing KeyErrors (blech), setting a non-null
> default value (Python defaultdict, Ruby Hash#default=) on the object, or
> adding an extra parameter to Fetch().
>
>     Obj *got = Hash_Fetch(hash, key, default_value);
>     if (got != default_value) {
>         // ...
>     }


C# has a TryGetValue method:

    public bool TryGetValue(TKey key, out TValue value)

This would map nicely to C but not to other languages. But if we keep mapping arrays, hashes, and strings to host language objects like we do in XSBind_cfish_to_perl, the API should only be visible from C anyway [1].

> I still like `Has_Key` the best.

+1

>> Using unsigned 32-bit values for hash size and capacity is OK in my
>> opinion.


> Is that best?  Java has had problems because of array indexes being 32-bit.
> Will a 32-bit integer always be large enough?  Will we want to use a size_t or
> uint64_t for Clownfish's String, and then use that same type elsewhere for
> consistency?

Since we only support arrays of objects (pointers), uint32_t indices allow for 32 GB arrays and 96 GB hash tables on 64-bit systems. On top of that, you'll typically need much more memory to store the objects themselves. So I don't see an immediate need for 64-bit indices. With strings, the limit of 32-bit indices can be hit much earlier in practice.

But it shouldn't be hard to implement 64-bit indices for all container types. Maybe we should just go for it.

> I think the reasoning in the docs for Python's `copy` module is compelling:
>
>     https://docs.python.org/3/library/copy.html
>
>     Two problems often exist with deep copy operations that don’t exist with
>     shallow copy operations:
>
>     *   Recursive objects (compound objects that, directly or indirectly,
>         contain a reference to themselves) may cause a recursive loop.
>     *   Because deep copy copies everything it may copy too much, e.g.,
>         administrative data structures that should be shared even between
>         copies.
>
> That would imply making Clone() shallow, and implementing deep copying as
> Deep_Clone().


Maybe we shouldn't offer deep cloning at all?

Nick


[1] Here are a couple of related things I always wondered about:

Shouldn't we hide String, Array, and Hash completely from Perl? For the most part, they already are invisible. Method return values and callback arguments will always be converted. Only the constructor returns a Clownfish object.

Why don't we implement more To_Host methods instead of the huge if/else chain in XSBind_cfish_to_perl?
