Re: [Cython] Hash-based vtables

Dag Sverre Seljebotn Fri, 08 Jun 2012 23:02:19 -0700

On 06/09/2012 08:00 AM, Dag Sverre Seljebotn wrote:

On 06/09/2012 07:45 AM, Dag Sverre Seljebotn wrote:

On 06/09/2012 03:21 AM, Robert Bradshaw wrote:

On Fri, Jun 8, 2012 at 2:12 PM, Dag Sverre Seljebotn

There's still the indirection through SEP 200 (extensibletype slots).
We can
get rid of that very easily by just making that table and the
hash-vtable
one and the same. (It could still either have interned string keys
or ID
keys depending on the least significant bit.)


Or we can even forgo the interning for this table, and give up on
partitioning the space numerically and just use the dns-style
prefixing, e.g. "org.cython.X" belongs to us.


Huh? Isn't that when you *need* interning? Do you plan on key-encoding
those kind of strings into 64 bits?

(I think it would usually be "method:foo:ii->d" (or my current
preference is "method:foo:i4i8->f8"))


Well, I guess something like "org.cython.X" would happen often as well,
in addition. Just put it all in the same table :-)


Partitioning the space numerically you'd just hash the number; "SEP 260:
We use id 0x70040001, which has lower-64-md5 0xfa454a...ULL".


The real use-case I see for this now is in having the PyArray_DATA etc.
access pointers simply through compile-time constants the library can
define on both ends. It could just do

PyCustomSlots_Lookup(obj->ob_type, 0x70040001, 0xfa45323...ULL)

specifically to get a function retrieving the data-pointer.
PyArray_SHAPE would do

PyCustomSlots_Lookup(obj->ob_type, 0x70040002, 0xbbad423...ULL)


Argh. I meant 0x70040002 | 1, of course ;-)

DS


Also, I'd want PyExtensibleType_Object to have:

{
...
PyPerfectTable *tp_perfect_table;
Py_ssize_t tp_perfect_table_obj_offset;
}

i.e. we allow for getting quickly to a table on the object in addition
to the one on the type.

Callbacks look up the one on the object first (before potentially
checking for __call__ in the type); method-calling might ignore the one
on the object.

Dag

There is value in the double indirection if this (or any of the other)
lookup tables are meant to be modified over time.


This isn't impossible with a hash table either. You just need to
reallocate a little more often than what would be the case with a
regular hash table, but not dramatically so (you need to rehash whenever
the element to insert hashes into a "large" bin, which are rather few).

I want the table to have a pointer to it, so that you can atomically
swap it out.

To wrap up, I think this has grown in complexity beyond the "simple SEP
spec". It's at the point where you don't really want to have several
libraries implementing the same simple spec, but instead use the same
implementation.

But I think the advantages are simply too good to give up on.

So I think a viable route forward is to forget the
CEP/SEP/pre-PEP-approach
for now (which only works for semi-complicated ideas with simple
implementations) and instead simply work more directly on a library. It
would need to have a couple of different use modes:


I prefer an enhancement proposal with a spec over a library, even if
only a single library gets used in practice. I still think it's simple
enough. Basically, we have the "lookup spec" and then a CEP for
applying this to fast callable (agreeing on signatures, and what to do
with extern types) and extensible type slots.

OK.

- A Python perfect-hasher for use when generating code, with only the a
string interner based on CPython dicts and extensibletype metaclass as
runtime dependencies (for use in Cython). This would only add some
hundred
source file lines...

- A C implementation of the perfect hashing exposed through a
PyPerfectHashTable_Ready(), for use in libraries written in C like
NumPy/SciPy). This would need to bundle the md5 algorithm and a C
implementation of the perfect hashing.

And on the distribution axis:

- Small C header-style implementation of a string interner and the
extensibletype metaclass, rendezvousing through sys.modules

- As part of the rendezvous, one would always try to __import__ the
*real*
run-time library. So if it is available in sys.path it overrides
anything
bundled with other libraries. That would provide a way forward for
GIL-less
string interning, a Python-side library for working with these tables
and
inspecting them, etc.


Hmm, that's an interesting idea. I think we don't actually need
interning, in which case the "full" library is only needed for
introspection.


You don't believe the security concern is real then? Or do you want to
pay the cost for a 160-bit SHA1 compare everywhere?

I'd love to not do interning, but I see no way around it.

BTW, a GIL-less interning library isn't rocket science. I ran khash.h
through a preprocessor with

KHASH_MAP_INIT_STR(str_to_entry, entry_t)

and the result is 180 lines of code for the hash table. Then pythread.h
provides the thread lock, another 50 lines for the interning logic
(intern_literal, intern_heap_allocated, release_interned).

It just seems a little redundant to ship such a thing in every
Cython-generated file since we hold the GIL during module loading anyway.

Dag
_______________________________________________
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


_______________________________________________
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel

Re: [Cython] Hash-based vtables

Reply via email to