Re: [Cython] Optimising dict manipulation in extension types

Stefan Behnel Sun, 09 Nov 2008 02:06:30 -0800

Hi,

tav wrote:
>   robertwb> Short of writing your own sort algorithm, there's not
>   robertwb> much of a way to speed up sort(). If you're just
>   robertwb> sorting strings here, that could probably be done a
>   robertwb> lot faster.
> 
> Hmz, I am sorting strings. Is there a special subset/subroutine of
> timsort that I can use for just sorting strings?
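
One common string-specialised approach is radix sort, which is not a variant of timsort at all. A minimal sketch of an MSD (most-significant-digit) radix sort for Python 3 bytes objects is below; the function name is made up for illustration. In pure Python it will not beat the built-in sort, so it only shows the idea that would have to be written in C or Cython to pay off.

```python
def msd_radix_sort(strings):
    """Sort a list of bytes objects with an MSD radix sort; returns a new list."""

    def sort_range(items, depth):
        if len(items) <= 1:
            return items
        # Strings that end exactly at this depth are identical here and sort
        # before every longer string sharing the same prefix.
        done = [s for s in items if len(s) == depth]
        # One bucket per possible byte value at position `depth`.
        buckets = [[] for _ in range(256)]
        for s in items:
            if len(s) > depth:
                buckets[s[depth]].append(s)
        result = done
        for bucket in buckets:
            if bucket:
                result.extend(sort_range(bucket, depth + 1))
        return result

    return sort_range(list(strings), 0)


print(msd_radix_sort([b"banana", b"apple", b"", b"app", b"b"]))
```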
> 
>   stefan> If you change the "char* attr" into a plain "attr", this
>   stefan> will speed up things considerably. In your code,
>   stefan> Cython has to convert "attr" to a Python string on
>   stefan> each loop to convert it to the Python string "j" (which


I meant: "to compare it to j". String comparison necessarily happens in Python
space here.


>   stefan> is a really bad name for a Python string, BTW).
> 
> Ah thanks. I mistakenly thought that typing everything would help
> speed things up.
> 
> Not really understanding the subtleties of when types are converted
> between Python and C types...

Whenever it's necessary: for comparisons, on assignments, or when passing
parameters to a function, for example. You can see it in the generated C code,
which is actually designed to be human-readable.

http://behnel.de/cgi-bin/weblog_basic/index.php?p=17

A good way to use timeit with Cython code is pyximport, BTW. Something like

  python -m timeit -s 'import pyximport; pyximport.install()' \
        -s 'from mycythonmodule import myfunction'    'myfunction(somearg)'

will show you how fast your code is without the hassle of recompiling your
.pyx file manually after each change. It just compiles and imports the .pyx
file on the fly.
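
If you prefer to drive the measurement from a script, the timeit module offers the same thing programmatically. A minimal sketch follows; the helper name best_time_per_call is made up, and the example statement uses plain Python so it runs without Cython installed. With Cython available, the setup string would carry the same pyximport setup as the command line above.

```python
import timeit


def best_time_per_call(stmt, setup="pass", repeat=3, number=1000):
    """Return the fastest observed time per single execution of stmt, in seconds."""
    totals = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    # The minimum is the least noisy estimate; other runs include interference.
    return min(totals) / number


# With Cython, setup could be e.g.
# "import pyximport; pyximport.install(); from mycythonmodule import myfunction"
print(best_time_per_call("sorted(data)",
                         setup="data = [str(i) for i in range(1000)]",
                         number=100))
```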


>   stefan> I also don't quite understand the hasattr(v, "__get__").
>   stefan> Is that supposed to access a property?
> 
> My intention was to mimic the behaviour of type.__getattribute__ in
> order to support Python's descriptor protocols...

Try to restrict your code to the cases you really need to handle. More general
code should be kept out of the critical path, especially out of fast loops.
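
To illustrate: a descriptor-aware lookup pays for a hasattr() check on every attribute access, whereas a lookup restricted to the case you actually need is a single dict access. The Attrs container and both lookup functions below are hypothetical, made up for this comparison only.

```python
class Attrs:
    """Hypothetical attribute container, used only for this comparison."""

    def __init__(self, **attrs):
        self.attrs = attrs


def lookup_general(obj, name):
    # Descriptor-aware path: an extra hasattr() check on every access.
    # Note that plain functions define __get__ too, so stored closures
    # would be bound to obj here, which is rarely what you want.
    value = obj.attrs[name]
    if hasattr(value, "__get__"):
        return value.__get__(obj, type(obj))
    return value


def lookup_fast(obj, name):
    # Restricted path: plain dict lookup, no descriptor handling at all.
    return obj.attrs[name]


ns = Attrs(x=1, y=property(lambda self: self.attrs["x"] + 1))
print(lookup_fast(ns, "x"), lookup_general(ns, "y"))
```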


> Instead of using Python's class statements to define objects, I use nested 
> function definitions (and closures) to define objects, e.g.
> 
>   class This:
>     pass
> 
>   def Point(x, y):
>     this = This()
>     this.x = x
>     this.y = y
>     def getX():
>       return this.x
>     def setX(value):
>       this.x = value
>     return Namespace()
> 
> The advantage of this is that it lends itself to a secure way of
> programming. For example, untrusted code can be given access to getX:
> 
>   p = Point(1, 2)
>   p.setX(56)
>   untrusted_code(p.getX)
> 
> And, unlike with class-based objects, the untrusted code will not be
> able to call setX().
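
Since Namespace is not shown here, the following is a runnable variant of the example that substitutes types.SimpleNamespace for it and passes the closures explicitly. That substitution is an assumption for illustration, not tav's actual Namespace implementation.

```python
from types import SimpleNamespace


class This:
    pass


def Point(x, y):
    this = This()
    this.x = x
    this.y = y

    def getX():
        return this.x

    def setX(value):
        this.x = value

    # Stand-in for the Namespace() object: exposes only the two closures.
    return SimpleNamespace(getX=getX, setX=setX)


p = Point(1, 2)
p.setX(56)
print(p.getX())  # 56
```

Handing out p.getX alone passes only the reading closure, not the object that holds setX.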

Who needs setters when you can modify "this" directly?

        >>> def test():
        ...     a = []
        ...     def get(i):
        ...             return a[i]
        ...     return get
        >>> get = test()
        >>> get.func_closure[0].cell_contents.append(123)
        >>> get(0)
        123
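
func_closure is the Python 2 spelling; in Python 3 the same attribute is called __closure__, so the equivalent of the session above is:

```python
def test():
    a = []

    def get(i):
        return a[i]

    return get


get = test()
# __closure__ is a tuple of cells; cell_contents is the captured list itself,
# so code holding only `get` can still mutate the enclosed state.
get.__closure__[0].cell_contents.append(123)
print(get(0))  # 123
```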


> Unfortunately, this means that there will be millions of Namespace
> objects being created. As such I am trying to optimise:
> 
> * Initialisation speed
> * Memory footprint
> * Attribute access speed

Use a "cdef class" in Cython instead. It's implemented in C, so code would have
to modify the instance memory at the C level to change the object in ways
other than those you allow. That's not fully secure, since untrusted code may
still do that, but it's not trivial and therefore at least pretty unlikely.
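
In Cython, attributes declared on a cdef class without "public" or "readonly" are not accessible from Python at all. A rough pure-Python analogue, sketched below under that caveat, is a class with __slots__ and read-only properties: it avoids a per-instance dict, which helps the memory footprint, but its private slots remain writable from Python, so unlike a cdef class it is no barrier whatsoever.

```python
class Point:
    """Fixed-layout object: no per-instance __dict__, x and y exposed read-only."""

    __slots__ = ("_x", "_y")

    def __init__(self, x, y):
        self._x = x
        self._y = y

    @property
    def x(self):
        return self._x

    @property
    def y(self):
        return self._y


p = Point(1, 2)
print(p.x, p.y)
```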

Stefan

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
