On 06/12/2012 08:12 PM, Robert Bradshaw wrote:
On Tue, Jun 12, 2012 at 10:21 AM, Dag Sverre Seljebotn
<d.s.seljeb...@astro.uio.no>  wrote:
On 06/12/2012 01:01 PM, Dag Sverre Seljebotn wrote:

On 06/10/2012 11:53 AM, Robert Bradshaw wrote:

On Sun, Jun 10, 2012 at 1:43 AM, Dag Sverre Seljebotn

About signatures, a problem I see with following the C typing is that
the signature "ill" wouldn't hash the same as "iii" on 32-bit Windows
and "iqq" on 32-bit Linux, and so on. I think that would be really bad.


This is why I suggested promotion for scalars (divide ints into
<= sizeof(long) and sizeof(long) < x <= sizeof(long long)), checked at
C compile time, though I guess you consider that evil. I don't
consider not matching really bad, just kind of bad.


Right. At least a convention for promotion of scalars would be good
anyway.

Even MSVC supports stdint.h these days; basing ourselves on the random
behaviour of "long" seems a bit outdated to me. "ssize_t" would be
better motivated I feel.

Many linear algebra libraries use 32-bit matrix indices by default, but
can be swapped to 64-bit indices (this holds for many LAPACK
implementations and most sparse linear algebra). So often there will at
least be one typedef that is either 32 bits or 64 bits without the
Cython compiler knowing.

Promoting to a single type "[u]int64" is the only option that removes a
possible combinatorial explosion if you have multiple external typedefs
whose sizes you don't know (although I guess that's rather theoretical).

Anyway, runtime table generation is quite fast, see below.


"l" must be banished -- but then one might as well do "i4i8i8".

Designing a signature hash where you select between these at compile
time is perhaps doable, but it generates a lot of code and makes
everything complicated.


It especially gets messy when you're trying to pre-compute tables.

I think we should just start off with hashing at module load time, when
sizes are known, and then work with heuristics and/or build system
integration to improve on that afterwards.


Finding 10,000 optimal tables at runtime had better be really cheap
then, for Sage's sake :).


The code is highly unpolished as I write this, but it works, so here
are some preliminary benchmarks.

Assuming the 64-bit pre-hashes are already computed, hashing a 64-slot
table varies between 5 and 10 us (microseconds) depending on the set of
hashes.

Computing MD5s with C code from ulib (not hashlib/OpenSSL) takes ~400 ns
per hash, so ~26 us for the 64-slot table => it dominates!

The crapwow64 hash takes ~10-20 ns, for ~1 us per 64-slot table.
Admittedly, that's with hand-written non-portable assembly for the
crapwow64.

Assuming 10 000 64-slot tables we're looking at something like 0.3-0.4
seconds for loading Sage using md5, or 0.1 seconds using crapwow64.


https://github.com/dagss/pyextensibletype/blob/master/include/perfecthash.h

http://www.team5150.com/~andrew/noncryptohashzoo/CrapWow64.html


Look: a big advantage of the hash-vtables is that subclasses stay
ABI-compatible with superclasses, and don't need recompilation when
superclasses add or remove methods.

=> Finding the hash table must happen at run-time in a lot of cases
anyway, so I feel Robert's chase for compile-time table building is moot.

I feel this would also need to automatically fall back to heap-allocated
tables when the statically allocated one doesn't suffice; that's good to
have anyway for the very few cases where a perfect table can't be found.

Finding the hash table at runtime should be supported, but the *vast*
majority of method sets are known at compile time. 0.4 seconds is a
huge overhead to just add to Sage (yes, it's an exception, but an
important one), and though crapwow64 helps I'd rather rely on a known,
good standard hash. I need to actually look at Sage to see what the
impact would be. Also, most tables would probably have 2 entries in
them (e.g. a typed one and an all-object one).

Hopefully 0.4 will turn out to be a severe overestimate once one actually looks at this.

What's loaded at startup -- is it the pyx files in sage/devel/sage? My count (just cloned from github.com/sagemath/sage):

$ find -name '*.pyx' -exec grep 'cdef class' {} \; | wc -l
641

And I doubt that *all* of that is loaded at Sage startup; you need to do some manual importing for at least some of those classes? So it's probably closer to 0.01-0.02 seconds than 0.4, even with md5.

About the *vast* majority of method sets being known: That may be the case for old code, but keep in mind that that situation might deteriorate. Once we have hash-based vtables, declaring methods of cdef classes in pxd files could become optional (and only be there to help callers, incl. subclasses, determine the signature). So any method that's only used in the superclass and is therefore not declared in the pxd file would consistently trigger a run-time build of the table of subclasses; the compile-time generated table would be useless then.

(OTOH, as duck-typing becomes the norm, more cdef classes will be without superclasses...)

long int will continue to be an important type as long as it's the
default for int literals and Python's "fast" ints (whether in type or
implementation), so we can't just move to stdint. I also don't like
the form of the table (and whether certain signatures match) being
platform-dependent: the less variance we have from one platform to the
next, the better.

Perhaps in Sage there's a lot of use of "long" and therefore this would make Sage code vary less between platforms.

But for much NumPy-using code you'd typically use int32 or int64, and since long is 32 bits on 32-bit Windows and 64 bits on Linux/Mac, choosing long sort of maximises inter-platform variation of signatures...

On an orthogonal note, sizeof(long)-sensitive tables need not be
entirely at odds with compile-time table generation, as most functions
will probably have 0 or 1 parameters of unknown size, so we could spit
out 1 or 2 statically compiled tables and generate the rest on the fly.
I still would rather have fixed Cython-compile-time tables, though.

Well, I'd "rather have" that as well if it worked every time.

But there's no use designing a feature which works great unless you use the fftw_complex type (can be 64 or 128 bits). Or works great unless you use 64-bit LAPACK. Or works great unless you have a superclass with a partially defined pxd file.

Since one implementation of a concept is simpler than two, as long as the run-time generation code must always be there (or at least be there in the common cases x, y, and z), the reasons for adding a compile-time implementation as well should be very good.

Sage taking 0.4 seconds extra would indeed be a very good reason, but I don't believe it. So when you can get around to it, it'd be great to have the actual number of classes (and ideally an estimate of the number of methods per class).

Dag
_______________________________________________
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel
