Re: [Cython] Optimising dict manipulation in extension types

tav Sun, 09 Nov 2008 00:14:35 -0800

Thanks for the fast responses -- it's given me a lot of confidence in
using Cython! Are there plans to champion it to be included as part of
Python's standard lib and how can I help with that?


  aaron> I'm not sure if this solves the problem but it would help
  aaron> if you declared your variables with an actual type
  aaron> (list, tuple, dict, etc.) instead of just 'object'.

Thanks -- although the effects seem to be relatively minimal. But
every little helps! =)

  robertwb> Short of writing your own sort algorithm there's not
  robertwb> much of a way to speed up sort(). If you're just
  robertwb> sorting strings here, that could probably be done a
  robertwb> lot faster.

Hmz, I am sorting strings. Is there a special subset/subroutine of
timsort that I can use for just sorting strings?

  stefan> If you change the "char* attr" into a plain "attr", this
  stefan> will speed up things considerably. In your code,
  stefan> Cython has to convert "attr" to a Python string on
  stefan> each loop to convert it to the Python string "j" (which
  stefan> is a really bad name for a Python string, BTW).

Ah thanks. I mistakenly thought that typing everything would help
speed things up.

I don't really understand the subtleties of when values are converted
between Python and C types...

  stefan> I also don't quite understand the hasattr(v, "__get__").
  stefan> Is that supposed to access a property?

My intention was to mimic the behaviour of type.__getattribute__ in
order to support Python's descriptor protocols...
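
For what it's worth, here is a rough pure-Python sketch of the lookup I
am trying to mimic. The helper name bind_attr and the Holder class are
made up for illustration, and this simplifies what
type.__getattribute__ really does (no MRO walk, no data-descriptor
precedence):

```python
def bind_attr(env, name, instance, owner=None):
    """Hypothetical helper: fetch env[name] and apply the descriptor protocol."""
    value = env[name]
    # Descriptors are looked up on the type of the value, not on the value itself.
    getter = getattr(type(value), "__get__", None)
    if getter is None:
        return value  # plain value, returned unchanged
    if owner is None:
        owner = type(instance)
    return getter(value, instance, owner)


class Holder(object):
    pass


def greet(self):
    return "hello from %s" % type(self).__name__


env = {"greet": greet, "answer": 42}
holder = Holder()
bound = bind_attr(env, "greet", holder)   # functions are descriptors, so this binds
plain = bind_attr(env, "answer", holder)  # ints are not descriptors
```

Checking type(value) rather than hasattr(v, "__get__") on the value
itself is closer to how Python finds special methods.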

  stefan> I extended the list description a bit, so that it becomes
  stefan> clearer that this *is* the right list.

Thanks!

  tav> The sort() and looping (for key in keys) seem to take up
  tav> most of the time... how can I do this better so that it takes
  tav> less time?

  stefan> Why is this important? Do you create them very often,
  stefan> or is the size of env the problem? We need to know
  stefan> what you want to optimise in order to help you.

Sorry, I should have been more explicit in my original email. Instead
of using Python's class statements to define objects, I use nested
function definitions (and closures) to define objects, e.g.

  class This:
    pass

  def Point(x, y):
    this = This()
    this.x = x
    this.y = y
    def getX():
      return this.x
    def setX(value):
      this.x = value
    return Namespace()

The advantage of this is that it lends itself to a secure way of
programming. For example, untrusted code can be given access to getX:

  p = Point(1, 2)
  p.setX(56)
  untrusted_code(p.getX)

And, unlike with class-based objects, the untrusted code will not be
able to call setX().
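
Put together as something runnable, the pattern looks roughly like
this. Note that types.SimpleNamespace is only a stand-in for my
Namespace here, and untrusted_code is a hypothetical callee:

```python
from types import SimpleNamespace  # stand-in for Namespace (an assumption)


class This(object):
    pass


def Point(x, y):
    this = This()
    this.x = x
    this.y = y

    def getX():
        return this.x

    def setX(value):
        this.x = value

    # Only the two functions are exposed; `this` stays inside the closure.
    return SimpleNamespace(getX=getX, setX=setX)


def untrusted_code(get_x):
    # Hypothetical untrusted callee: it only ever receives the getter.
    return get_x()


p = Point(1, 2)
p.setX(56)
seen = untrusted_code(p.getX)
```

In plain CPython a caller can still reach `this` through
getX.__closure__, so the above shows the shape of the pattern rather
than real isolation.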

This is all inspired by
http://www.erights.org/elib/capability/ode/ode-capabilities.html

Now, the idea behind Namespace() is to return an object that mimics
the semantics of a traditional class, so that you can access
attributes and call methods using the traditional dot syntax.

Unfortunately, this means that there will be millions of Namespace
objects being created. As such I am trying to optimise:

* Initialisation speed
* Memory footprint
* Attribute access speed

Right now, in comparison to normal Python class-style objects, the
Cython-based Namespace-style objects cost 7x, 3x and 2x respectively
on those three metrics. Ideally, I'd like to bring all three down to
just 1.5-2x.

Fundamentally, Namespace just needs to be immutable and mimic the behaviour of:

  class Namespace:
    def __init__(self, **env):
      self.__dict__ = env # should also bind methods appropriately
      freeze(self) # make this instance immutable
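
Since freeze() above is only pseudocode, here is one way the same
behaviour could be sketched in pure Python. This is an approximation
under my own assumptions (the _env slot name is invented, and method
binding is left out), not the Cython implementation:

```python
class Namespace(object):
    """Sketch: attributes are set once, then the instance is read-only."""

    __slots__ = ("_env",)

    def __init__(self, **env):
        # Bypass our own __setattr__ to store the mapping exactly once.
        object.__setattr__(self, "_env", dict(env))

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for names held in _env.
        try:
            return object.__getattribute__(self, "_env")[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        raise AttributeError("Namespace instances are immutable")

    def __delattr__(self, name):
        raise AttributeError("Namespace instances are immutable")


ns = Namespace(x=1, y=2)
try:
    ns.x = 10
    frozen = False
except AttributeError:
    frozen = True
```

The dict is still reachable as ns._env from Python code, so this only
blocks ordinary assignment; in a cdef class the attribute would not be
visible from Python unless declared public or readonly.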

As an additional memory optimisation I decided to try and borrow a
trick from the PyPy guys:
http://codespeak.net/pypy/dist/pypy/doc/interpreter-optimizations.html#sharing-dicts

Namespace() objects created from the same constructor are likely to
have the same scope env dict. As such, space could be saved by sharing
the keys amongst them and then using an array lookup to find the
specific values. This is also why I was trying to sort() the keys --
so as to minimise the number of different combinations of keys()
being stored.

Sorry for going into so much detail -- hope it lends a bit of clarity.

Any further insight and help would be greatly appreciated -- thanks!

-- 
love, tav

plex:espians/tav | [EMAIL PROTECTED] | +44 (0) 7809 569 369
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
