My time is rather limited but I'm slowly trying to get another SEP 200 in place.

Something that hit me, when I tried to make up my mind about whether to have (key, ptr) entries or (key, flags, ptr), is that the fast hash table entries can actually be arbitrary size. So we could make the table itself

void *table[n]

and then n would be a power of 2 (TBD: benchmark cost of allowing other sizes). Since we have the d[i] displacements, it's no problem at all to construct displacements to account for variable-size entries.

Proposal:

C-source for an un-initialized table (signature string is placeholder and not up for discussion now):

 { "3:method:foo:i4i4->i4", (void*)EXCEPT_STAR_FLAG, &foo_method,
   "2:numpy:SHAPE", &get_shape_method,
   "2:fieldoffset:barfield", (void*)5, 0 /*padding to n=2^k*/ }

I.e. all keys are prepended by the number of slots they use. So methods get to use 3 sizeof(void*) slots since they need the flags, but entries that don't need flags use only 2 slots. (In this case, "numpy:SHAPE" is a protocol defined by NumPy and so doesn't need any flags; or the flags are stored under "numpy:FLAGS" by that protocol.)

Then, PyExtensibleType_Ready parses this and rearranges the table to a perfect hash-table. As part of that, it parses the string literal keys and interns them, so that the number of slots becomes available in a more coder-friendly manner:

typedef {
    uint64_t hash; /* lower-64 bit of md5 */
    uint32_t strlen; /* we allow \0 in key */
    uint8_t nslots; /* set to 3 for first example */
    char *key; /* set to "method:foo:i4i4->i4" */
} fasttable_key_t;

Then, the interned keys for the table is the fasttable_key_t*.

Storing the hash inside the key has two pros:
- Caching the md5 work (provided the interner uses a faster hash function to go from string to

Lookup would happen like this:

typedef {
    fasttable_key_t *key;
    uintptr_t flags;
    void *funcptr
} method_t;

(method_t*)PyCustomSlots_Find(mykey, mykey->hash);
/* or, faster: */
(method_t*)PyCustomSlots_Find(mykey, 0x45343453453fabaULL);

If you want to scan the table linearly (to avoid having to bother with getting an interned key), you would scan a table of void*, and for every entry cast the key to fasttable_key_t* and check nslots for how much to skip to get to the next entry.

Too complicated?

Dag
_______________________________________________
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel

Reply via email to