John Arbash Meinel wrote:
>> I recommend using a dedicated dict instead, and put your byte strings
>> there. This will not change the performance in any way, given that intern()
>> on a char* has always been creating a Python byte string before interning
>> (and possibly dropping) it. But it will make it clearer in the code what is
>> actually happening.
>
> So I can't intern() a char* because it has NULLs in the array.
You have to use PyString_FromStringAndSize() to build a Python byte string
manually, which then supports being interned in Python 2 (and that will be
fixed in Cython 0.12).

> I don't want to use a dedicated dict, because then the strings become
> immortal.

Except that a dedicated dict allows you to control whether the strings
/really/ become immortal or not. Once a string is interned in CPython, there
is no way to get it out of the dict of interned strings. Your own dict is
under your control.

> I do understand that interning in python is really meant for internal
> use. Because attributes, etc are all managed via py strings (becoming
> Unicode in Py3), and thus lookups in dicts, etc are better if you intern
> everything.

Interning is not required. Any dict will work just fine, as long as you make
sure that the strings you use come from that dict.

> However, there is no way to implement de-duping without immortality in
> python

Unless you have a way of keeping track of the usage of a value. Depending on
your use case, it might work to just clear (and maybe rebuild) the whole dict
when it reaches a given size or after the 1000000-th insertion, or when
memory gets tight, or whatever.

> other than something like weakrefs (which strings and tuples
> don't support, and really exacerbates the memory problems w/ interning,

Plus, weakrefs are pretty slow. I did a little benchmarking in lxml lately to
find out if a cached object reference (that I had added for performance
reasons, but that introduced a cyclic reference) could be replaced by a weak
reference. It turned out that it was actually faster to just recreate the
object than to keep a weak reference to a live object. So I just dropped the
cached reference, and with it all sorts of memory issues that were due to
requiring a GC cleanup run.

Stefan

_______________________________________________
Cython-dev mailing list
http://codespeak.net/mailman/listinfo/cython-dev
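
To make the dedicated-dict idea above a bit more concrete, here is a rough sketch in plain Python. The names DedupCache, dedupe and max_entries are made up for illustration, and clearing the whole dict once it reaches a fixed size is just one of the eviction policies mentioned above, so take it as an assumption and not as what any particular code base does.

```python
class DedupCache(object):
    """De-duplicate equal byte strings through a private dict.

    Hypothetical helper: unlike intern(), the dict is owned by the
    caller, so the cached strings can be dropped again at any time.
    """

    def __init__(self, max_entries=1000000):
        self._strings = {}
        self._max_entries = max_entries  # assumed size limit

    def dedupe(self, value):
        # Return the object already stored for an equal value, if any.
        cached = self._strings.get(value)
        if cached is not None:
            return cached
        # Crude eviction policy: throw everything away once the dict
        # is full, then start collecting again.
        if len(self._strings) >= self._max_entries:
            self._strings.clear()
        self._strings[value] = value
        return value

    def clear(self):
        # Explicit release, e.g. when memory gets tight.
        self._strings.clear()

    def __len__(self):
        return len(self._strings)


cache = DedupCache(max_entries=1000)
# Two equal byte strings with an embedded NUL byte, built separately.
first = bytes(bytearray(b"key\x00one"))
second = bytes(bytearray(b"key\x00one"))
shared = cache.dedupe(first)
same = cache.dedupe(second)  # hands back the object stored for `first`
```

Callers have to keep using the object that dedupe() returns, otherwise nothing is shared. After a clear(), strings that are still referenced elsewhere stay alive, they just stop being shared with newly inserted ones.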
