Why doesn't Python remember the initial directory?

As far as I've been able to determine, Python does not remember
(immutably, that is) the working directory at the program's start-up,
or, if it does, it does not officially expose this information.

Does anyone know why this is?  Is there a PEP stating the rationale
for it?


Re: Why doesn't Python remember the initial directory?

> Why would you expect that it would?  What would it (or you) do with this
> information?
>
> More to the point, doing a chdir() is not something any library code
> would do (at least not that I'm aware of), so if the directory changed,
> it's because some application code did it.  In which case, you could
> have just stored the working directory yourself.

This means that no library code can ever count on, for example,
being able to reliably find the path to the file that contains the
definition of __main__.  That's a weakness, IMO.  One manifestation
of this weakness is that os.chdir breaks inspect.getmodule, at
least on Unix.  If you have some Unix system handy, you can try
the following.  First change the argument to os.chdir below to some
valid directory other than your working directory.  Then, run the
script, making sure that you refer to it using a relative path.
When I do this on my system (OS X + Python 2.7.3), the script bombs
at the last print statement, because the second call to inspect.getmodule
(though not the first one) returns None.

import inspect
import os

frame = inspect.currentframe()

print inspect.getmodule(frame).__name__

os.chdir('/some/other/directory') # where '/some/other/directory' is
  # different from the initial directory

print inspect.getmodule(frame).__name__


% python demo.py
__main__
Traceback (most recent call last):
  File "demo.py", line 11, in <module>
    print inspect.getmodule(frame).__name__
AttributeError: 'NoneType' object has no attribute '__name__'

I don't know of any way to fix inspect.getmodule that does not
involve, directly or indirectly, keeping a stable record of the
starting directory.
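Such a record can only be kept by code that is guaranteed to run before any os.chdir(), typically the first lines of the main script. A library module imported later cannot take it for itself, which is the limitation described above. A minimal Python 3 sketch under that assumption (INITIAL_CWD and resolve_from_start are hypothetical names, not standard-library ones):

```python
import os

# Snapshot taken once, at import time. It is only meaningful if this runs
# before anything calls os.chdir() (e.g. the first lines of the main script).
INITIAL_CWD = os.getcwd()


def resolve_from_start(path):
    """Resolve a possibly relative path against the start-up directory.

    Unlike os.path.abspath(), which consults the *current* directory, this
    always uses the snapshot, so later chdir() calls do not change the
    result. Absolute paths come back normalized but otherwise unchanged,
    because os.path.join() discards INITIAL_CWD for them.
    """
    return os.path.normpath(os.path.join(INITIAL_CWD, path))
```

After an os.chdir(), resolve_from_start("demo.py") still points at the original directory, whereas os.path.abspath("demo.py") follows the new one.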

But, who am I kidding?  What needs fixing, right?  That's not a
bug, that's a feature!  Etc.

By now I have learned to expect that 99.99% of Python programmers
will find that there's nothing wrong with behavior like the one
described above, that it is in fact exactly As It Should Be, because,
you see, since Python is the epitome of perfection, it follows
inexorably that any flaw or shortcoming one may *perceive* in Python
is only an *illusion*: the flaw or shortcoming is really in the
benighted programmer, for having stupid ideas about programming
(i.e. any idea that may entail that Python is not *gasp* perfect).
Pardon my cynicism, but the general vibe from the replies I've
gotten to my post (i.e. if Python ain't got it, it means you don't
need it) is entirely in line with these expectations.


How to get initial absolute working dir reliably?

What's the most reliable way for module code to determine the
absolute path of the working directory at the start of execution?

(By module code I mean code that lives in a file that is not
meant to be run as a script, but rather it is meant to be loaded
as the result of some import statement.  In other words, module
code is code that must operate under the assumption that it can
be loaded at any time after the start of execution.)

Functions like os.path.abspath produce wrong results if the working
directory is changed, e.g. through os.chdir, so they are not terribly
reliable for determining the initial working directory.

Basically, I'm looking for a read-only variable (or variables)
initialized by Python at the start of execution, and from which
the initial working directory may be read or computed.
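Python itself offers no such variable. One partial workaround for module code, sketched below for Python 3, rests on an assumption outside Python's control: that the interpreter was launched from a POSIX shell that exports PWD. os.environ is populated when the interpreter starts, and os.chdir() does not update it, so PWD keeps naming the launch directory. It is unset on Windows, may be stale when the parent process is not a shell, and may name a symlinked path that differs from os.getcwd(). The function name initial_cwd_guess is hypothetical.

```python
import os


def initial_cwd_guess():
    """Best-effort guess at the directory the process was started in.

    Assumes the launching shell exported PWD. Returns None when PWD is
    missing or does not name an existing absolute directory.
    """
    pwd = os.environ.get("PWD")
    if pwd and os.path.isabs(pwd) and os.path.isdir(pwd):
        return pwd
    return None
```

Because it never consults os.getcwd(), the result is the same whenever the module happens to be imported, which is the property the question asks for, within the limits of the assumption.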


Re: Official reason for omitting inspect.currentcallable() ?

> I'm not familiar with it by that name, but Pike's this_function is
> what the OP's describing.

You got it.

> It's a useful construct in theory when you want to write in recursion,
> which was part of the rationale behind PEP 3130.

Thank you!


Official reason for omitting inspect.currentcallable() ?

Is there an *explicitly stated* reason (e.g. in a PEP, or in some
python dev list message) for why the inspect module (at least for
Python 2.7) does not include anything like a currentcallable()
function that would *stably*[1] return the currently executing
callable object?

(It seems unlikely that the absence in the inspect module of anything
even remotely like such a currentcallable is merely an oversight,
considering how many introspection facilities the inspect module
provides.  It seems far more likely that this absence is either
due to some fundamental limitation of Python that makes it impossible
to fully specify such a function, or it is the result of a deliberate
policy against including such a function in inspect.)


[1] By stably above I mean, e.g., that the value returned by the
top-level function (object) defined by

def spam():
    return inspect.currentcallable()

is *invariant*, in contrast to the value returned by the top-level
function (object) defined by

def ham():
    return ham

which is whatever the current value of the 'ham' global happens to
be.

sick of distribute, setup, and all the rest...

it's an all-out disgrace.

when is python going to get a decent module distribution system???

and don't tell me to do it myself: it's clear that the sorry
situation we have now is precisely the result of too many programmers
without the requisite expertise or policy-making authority deciding to
pitch in.  This is something for GvR and his top Python core library
team to do, because the problems are as much policy and institutional
ones as they are technical (programming) ones.


Java is killing me! (AKA: Java for Pythonheads?)

*Please* forgive me for asking a Java question in a Python forum.
My only excuse for this no-no is that a Python forum is more likely
than a Java one to have among its readers those who have had to
deal with the same problems I'm wrestling with.

Due to my job, I have to port some Python code to Java, and write
tests for the ported code.  (Yes, I've considered finding myself
another job, but this is not an option in the immediate future.)

What's giving me the hardest time is that the original Python code
uses a lot of functions with optional arguments (as is natural to
do in Python).  

As far as I can tell (admittedly I'm no Java expert, and have not
programmed in it since 2001), to implement a Java method with n
optional arguments, one needs at least 2**n method definitions.
Even if all but one of these definitions are simple wrappers that
call the one that does all the work, it's still a lot of code to
wade through, for nothing.

That's bad enough, but even worse is writing the unit tests for
the resulting mountain of fluffCode.  I find myself writing test
classes whose constructors also require 2**n definitions, one for
each form of the function to be tested...

I ask myself, how does the journeyman Python programmer cope with
such nonsense?

For the sake of concreteness, consider the following run-of-the-mill
Python function of 3 arguments (the first argument, xs, is expected
to be either a float or a sequence of floats; the second and third
arguments, an int and a float, are optional):

def quant(xs, nlevels=MAXN, xlim=MAXX):
    if not hasattr(xs, '__iter__'):
        return quant((xs,), nlevels, xlim)[0]

    if _bad_quant_args(xs, nlevels, xlim):
        raise TypeError("invalid arguments")

    retval = []
    for x in xs:
        # ...
        # elaborate acrobatics that set y
        # ...
        retval.append(y)

    return retval

My Java implementation of it already requires at least 8 method
definitions, with signatures:

short[] quant (float[], int, float)
short[] quant (float[], int)
short[] quant (float[], float)
short[] quant (float[])

short   quant (float, int, float)
short   quant (float, int)
short   quant (float, float)
short   quant (float)

Actually, for additional reasons, too arcane to go into, I also
need four more:

short   quant (Float, Integer, Float)
short   quant (Float, Integer)
short   quant (Float, Float)
short   quant (Float)

Writing JUnit tests for these methods is literally driving me

Some advice on implementing and testing functions with optional
arguments in Java would be appreciated.



Python app dev tools for Gnome?

There are a zillion utility apps that I've had kicking around in my
head for years, but have never implemented because I absolutely
hate GUI programming.

But I'm increasingly impressed by the quality, stability, and sheer
number of Gnome apps that I keep coming across that use Python
under the hood.

This gives me hope that maybe programming GUI Python apps for Gnome
these days is no longer the traumatizing experience it used to be
when I last tried it.

Can someone recommend some good tools to speed up the development
of Python apps[1] for Gnome?  E.g. is there anything like Xcode
for Gnome+Python?



[1] Needless to say, when I write apps I mean full-blown GUI
apps: windows, menus, events, threads, clickable icon, the whole
ball of wax.  As opposed to cli apps, panel widgets, etc.

Re: __delitem__ feature

> We know it because it explains the observable facts.

So does Monday-night quarterbacking...

__delitem__ feature

When I execute this file:

def nodelfactory(klass):
    class nodel(klass):
        def _delitem(self, _):
            raise TypeError("can't delete")

        # __delitem__ = _delitem

        def __init__(self, *a, **k):
            klass.__init__(self, *a, **k)
            self.__delitem__ = self._delitem

    nodel.__name__ = 'nodel%s' % klass.__name__
    return nodel

if __name__ == '__main__':
    import traceback as tb

    d = nodelfactory(dict)([('k1', 'v1'), ('k2', 'v2')])

    try: d.__delitem__('k1')
    except TypeError: tb.print_exc()
    print d

    try: del d['k1']
    except TypeError: tb.print_exc()
    print d

    l = nodelfactory(list)([1, 2, 3, 4])

    try: l.__delitem__(0)
    except TypeError: tb.print_exc()
    print l

    try: del l[0]
    except TypeError: tb.print_exc()
    print l

...the output I get is:

Traceback (most recent call last):
  File "/tmp/delbug.py", line 20, in <module>
    try: d.__delitem__('k1')
  File "/tmp/delbug.py", line 4, in _delitem
    raise TypeError("can't delete")
TypeError: can't delete
{'k2': 'v2', 'k1': 'v1'}
{'k2': 'v2'}
Traceback (most recent call last):
  File "/tmp/delbug.py", line 30, in <module>
    try: l.__delitem__(0)
  File "/tmp/delbug.py", line 4, in _delitem
    raise TypeError("can't delete")
TypeError: can't delete
[1, 2, 3, 4]
[2, 3, 4]

It means that, for both subclasses, del fails to trigger the
dynamically installed instance method __delitem__.

If I replace dict with UserDict, *both* deletion attempts lead to
a call to the dynamic __delitem__ method, and are thus blocked.
This is the behavior I expected of dict (and will help me hold on
to my belief that I'm not going insane when inevitably I'm told
that there's no bug in dict or list).

Interestingly enough, if I replace list with UserList, I see no
change in behavior.  So maybe I am going insane after all.


P.S. If you uncomment the commented-out line, and comment out the
last line of the __init__ method (which installs self._delitem as
self.__delitem__) then *all* the deletion attempts invoke the
__delitem__ method, and are therefore blocked.  FWIW.
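This matches a documented rule for new-style classes (Python language reference, "Special method lookup"): implicit invocations such as del d[k] look up __delitem__ on the type, bypassing the instance dictionary, so an instance attribute of that name is reached only by an explicit d.__delitem__(k) call. Old-style 2.x classes such as UserDict.UserDict still consult the instance, which is consistent with the UserDict result above. A Python 3 sketch of the class-level approach from the P.S. (deletion_blocked is a hypothetical helper):

```python
def nodelfactory(klass):
    """Return a subclass of klass whose instances refuse item deletion."""

    class nodel(klass):
        # Defined on the class, which is where "del obj[key]" looks.
        def __delitem__(self, key):
            raise TypeError("can't delete")

    nodel.__name__ = "nodel" + klass.__name__
    return nodel


def deletion_blocked(container, key):
    """Return True if 'del container[key]' raises TypeError."""
    try:
        del container[key]
    except TypeError:
        return True
    return False


d = nodelfactory(dict)([("k1", "v1"), ("k2", "v2")])
print(deletion_blocked(d, "k1"), d)           # True {'k1': 'v1', 'k2': 'v2'}

# For contrast, the instance-attribute approach: the del statement ignores it.
plain = type("plaindict", (dict,), {})(k1="v1")
plain.__delitem__ = lambda key: None          # set on the instance only
print(deletion_blocked(plain, "k1"), plain)   # False {}
```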

Re: type(d) != type(d.copy()) when type(d).issubclass(dict)

John O'Hagan <resea...@johnohagan.com> writes:

> IMO one of the benefits of subclassing is that you can just bolt on
> additional behaviour without having to know all the inner workings of the
> superclass, a benefit that is somewhat defeated by this behaviour of builtins.

I agree.  I've read the old posts/articles by GvR and others about
how great it is that one can now subclass Python builtin types
like any other class (GvR even gives explicit examples of this
luscious possibility in his paper on type/class unification).  But
now I'm discovering so many caveats, exceptions, and gotchas about
subclassing builtins that I have to conclude that this much-celebrated
new capability is basically useless...  Just as "readability
counts", it is also true that conceptual clarity counts, and
treating builtins as classes in Python is the most obfuscated design
I've ever seen.

UserDict, come back, all is forgotten!


Re: type(d) != type(d.copy()) when type(d).issubclass(dict)

> kj <no.em...@please.post> wrote:
>
> > Watch this:
> > >>> class neodict(dict): pass
> > >>> d = neodict()
> > >>> type(d)
> > <class '__main__.neodict'>
> > >>> type(d.copy())
> > <type 'dict'>
> > Bug?  Feature?  Genius beyond the grasp of schlubs like me?
>
> In (almost?) all cases any objects constructed by a subclass of a builtin
> class will be of the original builtin class.

What I *really* would like to know is: how do *you* know this (and
the same question goes for the other responders who see this behavior
of dict as par for the course).  Can you show me where it is in
the documentation?  I'd really appreciate it.  TIA!


Re: __delitem__ feature

> On 12/26/2010 10:53 AM, kj wrote:
> > P.S. If you uncomment the commented-out line, and comment out the
> > last line of the __init__ method (which installs self._delitem as
> > self.__delitem__) then *all* the deletion attempts invoke the
> > __delitem__ method, and are therefore blocked.  FWIW.
>
> Because subclasses of builtins only check the class __dict__ for special
> method overrides, not the instance __dict__.

How do you know this?  Is this documented?  Or is this a case of
Monday-night quarterbacking?


Re: How to pop the interpreter's stack?

> Except that the *caller* never gets the traceback (unless if it deliberately
> inspects the stack for some metaprogramming reason). It gets the exception,
> that is the same no matter what you do. The developer/user gets the traceback,
> and those implementation details *are* often important to them.

Just look at what Python shows you if you pass the wrong number of
arguments to a function:

>>> def spam(x, y, z): pass
>>> spam(1, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: spam() takes exactly 3 arguments (2 given)

That's it.  The traceback stops at the point of the error.  Python
doesn't show you all the underlying C-coded machinery that went
into detecting the error and emitting the error message.  *No one*
needs this information at this point.  All I'm saying is that I
want to do the same thing with my argument validation code as Python
does with its argument validation code: keep it out of sight.  When
my argument validation code fires an exception ***there's no bug
in **my** code***.  It's doing exactly what it's supposed to do.
Therefore, there's no need for me to debug anything, and certainly
no need for me to inspect the traceback all the way to the exception.
The bug is in the code that called my function with the wrong
arguments.  The developer of that code has no more use for seeing
the traceback all the way to where my code raises the exception
than I have for seeing the traceback of Python's underlying C code
when I get an error like the one shown above.


Re: How to pop the interpreter's stack?

> You failed to mention that cleverness is not a prime requisite of the
> python programmer -- in fact, it's usually frowned upon.

That's the party line, anyway.  I no longer believe it.  I've been
crashing against one bit of cleverness after another in Python's
unification of types and classes...

How can a function find the function that called it?

2010-12-24 Thread kj

I want to implement a frozen and ordered dict.

I thought I'd implement it as a subclass of collections.OrderedDict
that prohibits all modifications to the dictionary after it has
been initialized.

In particular, calling this frozen subclass's update method should,
in general, trigger an exception (object is not mutable).

But OrderedDict's functionality *requires* that its __init__ be
run, and this __init__, in turn, does part of its initialization
by calling the update method.

Therefore, the update method of the new subclass needs to be able
to identify the calling function in order to make a special allowance
for calls coming from OrderedDict.__init__.  (Better yet, it should
be able to allow calls coming from its own class's __init__, via

The best I've been able to do is to use inspect to get the name of
the calling function.  For the case I'm trying to identify, this
name is simply __init__.

But Python code is awash in __init__'s... 

Is it possible to achieve a more precise identification?  Specifically,
I want to know the *class* (not the file) where this '__init__' is
defined.
(BTW, I don't understand why inspect doesn't provide something as
basic as the *class* that the method belongs to, whenever applicable.
I imagine there's a good reason for this coyness, but I can't figure
it out.)
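A way around caller inspection, sketched for Python 3 (FrozenOrderedDict and _initializing are hypothetical names): raise a flag in the subclass's own __init__ while the base initializer runs, and let the mutators consult it. In CPython 3, OrderedDict.__init__ appears to insert its initial items through __setitem__ rather than through the public update method; the sketch does not depend on that, since update checks the flag as well.

```python
from collections import OrderedDict


class FrozenOrderedDict(OrderedDict):
    """Sketch of an OrderedDict that refuses changes after __init__."""

    def __init__(self, *args, **kwargs):
        self._initializing = True          # mutation allowed while True
        try:
            super().__init__(*args, **kwargs)
        finally:
            self._initializing = False

    def _check(self):
        # The getattr default also covers calls made before __init__ ran.
        if not getattr(self, "_initializing", False):
            raise TypeError("object is not mutable")

    def __setitem__(self, key, value):
        self._check()
        super().__setitem__(key, value)

    def __delitem__(self, key):
        self._check()
        super().__delitem__(key)

    def update(self, *args, **kwargs):
        self._check()
        super().update(*args, **kwargs)

    def _blocked(self, *args, **kwargs):
        raise TypeError("object is not mutable")

    # Mutators that initialization never needs are refused unconditionally.
    pop = popitem = clear = setdefault = move_to_end = _blocked
```

The flag answers "am I still being initialized?" directly, so it does not matter which __init__ in the hierarchy happens to call update. Copying and unpickling were not considered here and may need extra overrides.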




Re: How can a function find the function that called it?

> One function object can belong to (be in the namespace of) more than
> one class, so there is no "the class".

There are many other properties that inspect reports on (e.g.
filename) that may not apply to an individual case.  For 99.9% of
methods, the class in which it was lexically defined would be good
enough.

type(d) != type(d.copy()) when type(d).issubclass(dict)

2010-12-24 Thread kj

Watch this:

>>> class neodict(dict): pass
>>> d = neodict()
>>> type(d)
<class '__main__.neodict'>
>>> type(d.copy())
<type 'dict'>

Bug?  Feature?  Genius beyond the grasp of schlubs like me? 
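Built-in methods such as dict.copy build a plain dict directly instead of calling type(self), so a subclass that wants its own type back has to override each such method itself. A minimal Python 3 sketch for copy alone:

```python
class neodict(dict):
    """dict subclass whose copy() preserves the subclass."""

    def copy(self):
        # dict.copy() always returns a plain dict; rebuild via our own type.
        return type(self)(self)
```

type(self)(self) assumes the subclass constructor accepts a mapping the way dict's does; a subclass with a different __init__ signature would need a different rebuild step.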



How to order base classes?

2010-12-23 Thread kj

Suppose that I want to write a subclass C of base classes A and B.
What considerations should go into choosing the ordering of A and
B in C's base class list?

Since any order one chooses can be overridden on a per-method basis,
by assigning the desired parent's method to the appropriate class
attribute, like this:

class C(B, A):
    # override methods spam, ham, and eggs from B
    spam = A.spam
    ham = A.ham
    eggs = A.eggs

...it is difficult for me to see a strong, compelling reason for picking
one ordering over another.  But this may be just ignorance on my part.

How should one go about deciding the ordering of base classes?
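One consideration the per-method assignment does not capture: the base-class order fixes the method resolution order (MRO), and every cooperative super() call follows that order. In the Python 3 sketch below (all class names hypothetical), reordering the bases changes the whole call chain, and copying A.setup into C silently skips B.setup:

```python
class Base:
    def setup(self):
        return ["Base"]


class A(Base):
    def setup(self):
        # super() means "the next class after A in type(self).__mro__".
        return ["A"] + super().setup()


class B(Base):
    def setup(self):
        return ["B"] + super().setup()


class AB(A, B):
    pass


class BA(B, A):
    pass


class C(B, A):
    setup = A.setup   # per-method override, as in the post above


print(AB().setup())   # ['A', 'B', 'Base']
print(BA().setup())   # ['B', 'A', 'Base']
print(C().setup())    # ['A', 'Base']  (B.setup never runs)
```

C's MRO is C, B, A, Base, so the super() call inside A.setup continues with Base and never reaches B.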



Re: How to order base classes?

> The question you ask can only be answered in reference to a specific
> class with specific methods. There is no general principle, it depends
> entirely on the problem being solved.



Re: Partition Recursive

> url =
>
> So I want convert to
>
> myList =
>
> The reserved char are:
>
> specialMeaning = ["//", ";", "/", "?", ":", "@", "=", ",", "#"]

You forgot '.'.

>>> import re # sorry
>>> sp = re.compile('(//?|[;?:@=#.])')
>>> filter(len, sp.split(url))
['http', ':', '//', 'docs', '.', 'python', '.', 'org', '/', 'dev', '/',
'library', '/', 'stdtypes', '.', 'html', '?', 'highlight', '=',
'partition', '#', 'str', '.', 'partition']
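In Python 3, filter() returns an iterator rather than a list, so the same idea is usually written with a list comprehension. A sketch using the same pattern, with the URL that the pieces above reassemble into as sample input (split_url is a hypothetical name):

```python
import re

# Delimiters: "//" or "/", plus ; ? : @ = # and "."
# The capturing group makes re.split() keep the delimiters in the result.
SPLITTER = re.compile(r"(//?|[;?:@=#.])")


def split_url(url):
    """Split url into text and delimiter pieces, dropping empty strings."""
    return [piece for piece in SPLITTER.split(url) if piece]


print(split_url("http://docs.python.org/dev/library/stdtypes.html"
                "?highlight=partition#str.partition"))
```

Because nothing but empty strings is dropped, "".join(split_url(url)) gives back the original URL.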


issubclass(dict, Mapping)

2010-12-22 Thread kj

In a message (4cf97c94$0$30003$c3e8da3$54964...@news.astraweb.com)
on a different thread, Steven D'Aprano tells me:

> I suspect you're trying to make this more complicated than it actually
> is. You keep finding little corner cases that expose implementation
> details (such as the heap-types issue above) and leaping to the erroneous
> conclusion that because you didn't understand this tiny little corner of
> Python's class model, you didn't understand any of it. Python's object
> model is relatively simple, but it does occasionally expose a few messy
I disagree with your assessment.  What you call "little corner
cases" I call fundamental, as in: you can't really call yourself
competent with Python if you're ignorant about them.

To use a term I first saw in an article by Joel Spolsky
(http://is.gd/je42O), Python's object model is a rather leaky
abstraction.  This refers to the situation in which a user is not
shielded from the implementation details.  When an abstraction
leaks, implementation details are no longer negligible, they cease
to be little corner cases.

Here's another example, fresh from today's crop of wonders:

(v. 2.7.0)
>>> from collections import Mapping
>>> issubclass(dict, Mapping)
True
>>> dict.__bases__
(<type 'object'>,)
>>> [issubclass(b, Mapping) for b in dict.__bases__]
[False]
So dict is a subclass of Mapping, even though none of the bases of
dict is either Mapping or a subclass of Mapping.  Great.

I suspect this is another abstraction leak (dict is *supposed* to
be a Python class like all others, but in fact it's not *really*.
You see, once upon a time...).

I conclude that, for me to understand Python's (rather leaky) object
model abstraction, I have to understand its underlying implementation.
Unfortunately, as far as I know, there's no other choice but to
study the source code, since there's no other more readable
description of this implementation.

Maybe there are fewer abstraction leaks in 3.0... 
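The mechanism behind this is documented, though not where one might look first: Mapping is an abstract base class, and issubclass() against an ABC also accepts classes registered with it or with one of its subclasses (see the abc and collections.abc documentation). dict is registered with MutableMapping, which makes it a "virtual subclass" of Mapping without appearing in its __bases__. A Python 3 sketch, where these ABCs live in collections.abc; Table is a hypothetical class:

```python
from collections.abc import Mapping


class Table:
    """A hypothetical read-only mapping that does not inherit from Mapping."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


print(issubclass(Table, Mapping))   # False: no inheritance, not registered
Mapping.register(Table)             # make Table a virtual subclass
print(issubclass(Table, Mapping))   # True
print(Table.__bases__)              # (<class 'object'>,): bases unchanged
```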


Re: How to pop the interpreter's stack?

> Obfuscating the location that an exception gets raised prevents a lot of

The Python interpreter does a lot of that obfuscation already, and I
find the resulting tracebacks more useful for it.

An error message is only useful to a given audience if that audience
can use the information in the message to modify what they are
doing to avoid the error.  It is of no use (certainly no *immediate*
use) to this audience to see tracebacks that go deep into code that
they don't know anything about and cannot change.

For example, consider this:

def foo(x, **k): pass

def bar(*a, **k):
    if len(a) > 1: raise TypeError('too many args')

def baz(*a, **k): _pre_baz(*a, **k)

def _pre_baz(*a, **k):
    if len(a) > 1: raise TypeError('too many args')

if __name__ == '__main__':
    from traceback import print_exc
    try: foo(1, 2)
    except: print_exc()
    try: bar(1, 2)
    except: print_exc()
    try: baz(1, 2)
    except: print_exc()

(The code in the if __name__ == '__main__' section is meant to
simulate the general case in which the functions defined in this file
are called by third-party code.)  When you run this code the output is
this (a few blank lines added for clarity):

Traceback (most recent call last):
  File "/tmp/ex2.py", line 5, in <module>
    try: foo(1, 2)
TypeError: foo() takes exactly 1 argument (2 given)

Traceback (most recent call last):
  File "/tmp/ex2.py", line 7, in <module>
    try: bar(1, 2)
  File "/tmp/example.py", line 4, in bar
    if len(a) > 1: raise TypeError('too many args')
TypeError: too many args

Traceback (most recent call last):
  File "/tmp/ex2.py", line 9, in <module>
    try: baz(1, 2)
  File "/tmp/example.py", line 6, in baz
    def baz(*a, **k): _pre_baz(*a, **k)
  File "/tmp/example.py", line 9, in _pre_baz
    if len(a) > 1: raise TypeError('too many args')
TypeError: too many args

In all cases, the *programming* errors are identical: functions called
with the wrong arguments.  The traceback from foo(1, 2) tells me this
very clearly, and I'm glad that Python is not also giving me the
traceback down to where the underlying C code throws the exception: I
don't need to see all this machinery.

In contrast, the tracebacks from bar(1, 2) and baz(1, 2) obscure the
fundamental problem with useless detail.  From the point of view of
the programmer that is using these functions, it is of no use to know
that the error resulted from some raise TypeError statement
somewhere, let alone that this happened in some obscure, private
function _pre_baz.

Perhaps I should have made it clearer in my original post that the
tracebacks I want to clean up are those from exceptions *explicitly*
raised by my argument-validating helper function, analogous to
_pre_baz above.  I.e. when my spam function is called (by code
written by someone else) with the wrong arguments, I want the
traceback to look more like this

Traceback (most recent call last):
  File "/some/python/code.py", line 123, in <module>
    spam(some, bad, args)
TypeError: the second argument is bad

than like this:

Traceback (most recent call last):
  File "/some/python/code.py", line 123, in <module>
    spam(some, bad, args)
  File "/my/niftymodule.py", line 456, in spam
    _pre_spam(*a, **k)
  File "/my/niftymodule.py", line 789, in _pre_spam
    raise TypeError('the second argument is bad')
TypeError: the second argument is bad

In my opinion, the idea that more is always better in a traceback
is flat out wrong.  As the example above illustrates, the most
useful traceback is the one that stops at the deepest point where
the *intended audience* for the traceback can take action, and goes
no further.  The intended audience for the errors generated by my
argument-checking functions should see no further than the point
where they called a function incorrectly.
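On the audience's side, the traceback module can already trim what is displayed: format_exception() accepts a limit, and a positive limit counts entries from the outermost frame, so limit=1 keeps only the line where the failing call was made. This changes only the display, not the exception. A Python 3 sketch; _check_spam_args and caller_view are hypothetical stand-ins for the helpers discussed above:

```python
import traceback


def _check_spam_args(args):
    # Hypothetical validator standing in for __pre_spam / _pre_baz above.
    if len(args) != 3:
        raise TypeError("spam() takes exactly 3 arguments (%d given)" % len(args))


def spam(*args):
    _check_spam_args(args)


def caller_view(exc, depth=1):
    """Format exc, keeping only the `depth` outermost traceback entries.

    depth=None formats the full traceback.
    """
    return "".join(traceback.format_exception(
        type(exc), exc, exc.__traceback__, limit=depth))


try:
    spam(1, 2)
except TypeError as exc:
    print(caller_view(exc))   # the calling line and the message only
```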


Re: How to pop the interpreter's stack?

> On Dec 22, 8:52 am, kj <no.em...@please.post> wrote:
> > In <mailman.65.1292517591.6505.python-l...@python.org> Robert Kern <robert.k...@gmail.com> writes:
> >
> > > Obfuscating the location that an exception gets raised prevents a lot of
> >
> > The Python interpreter does a lot of that obfuscation already, and I
> > find the resulting tracebacks more useful for it.
> >
> > An error message is only useful to a given audience if that audience
> > can use the information in the message to modify what they are
> > doing to avoid the error.
> >
> > It is of no use (certainly no *immediate*
> > use) to this audience to see tracebacks that go deep into code that
> > they don't know anything about and cannot change.
>
> So when the audience files a bug report it's not useful for them to
> include the whole traceback?

Learn to read, buster.  I wrote *immediate* use.


general problem when subclassing a built-in class

2010-12-22 Thread kj

Suppose that you want to implement a subclass of built-in class, to
meet some specific design requirements.

Where in the Python documentation can one find the information
required to determine the minimal[1] set of methods that one would
need to override to achieve this goal?

In my experience, educated guesswork doesn't get one very far with
this question.

Here's a *toy example*, just to illustrate this last point.

Suppose that one wants to implement a subclass of dict, call it
TSDict, to meet these two design requirements:

1. for each one of its keys, an instance of TSDict should keep a
timestamp (as given by time.time, and accessible via the new method
get_timestamp(key)) of the last time that the key had a value
assigned to it;

2. other than the added capability described in (1), an instance
of TSDict should behave *exactly* like a built-in dictionary.

In particular, we should be able to observe behavior like this:

>>> d = TSDict((('uno', 1), ('dos', 2)), tres=3, cuatro=4)
>>> d['cinco'] = 5
>>> d
{'cuatro': 4, 'dos': 2, 'tres': 3, 'cinco': 5, 'uno': 1}

OK, here's one strategy, right out of OOP 101:

from time import time

class TSDict(dict):
    def __setitem__(self, key, value):
        # save the value and timestamp for key as a tuple;
        # see footnote [2]
        dict.__setitem__(self, key, (value, time()))

    def __getitem__(self, key):
        # extract the value from the value-timestamp pair and return it
        return dict.__getitem__(self, key)[0]

    def get_timestamp(self, key):
        # extract the timestamp from the value-timestamp pair and return it
        return dict.__getitem__(self, key)[1]

This implementation *should* work (again, at least according to
OOP 101), but, in fact, it doesn't come *even close*:

>>> d = TSDict((('uno', 1), ('dos', 2)), tres=3, cuatro=4)
>>> d['cinco'] = 5
>>> d
{'cuatro': 4, 'dos': 2, 'tres': 3, 'cinco': (5, 1293059516.942985), 'uno': 1}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/tsdict.py", line 23, in get_timestamp
    return dict.__getitem__(self, key)[1]
TypeError: 'int' object is not subscriptable

From the above you can see that TSDict fails at *both* of the design
requirements listed above: it fails to add a timestamp to all keys in
the dictionary (e.g. 'uno', ..., 'cuatro' didn't get a timestamp), and
get_timestamp bombs; and it also fails to behave in every other
respect exactly like a built-in dict (e.g., repr(d) reveals the
timestamps and how they are kept).

So back to the general problem: to implement a subclass of a built-in
class to meet a given set of design specifications.  Where is the
documentation needed to do this without guesswork?

Given results like the one illustrated above, I can think of only two
approaches (other than scrapping the whole idea of subclassing a
built-in class in the first place):

1) Study the Python C source code for the built-in class in the hopes
   of somehow figuring out what API methods need to be overridden;

2) Through blind trial-and-error, keep trying different implementation
   strategies and/or keep overriding additional built-in class methods
   until the behavior of the resulting subclass approximates
   sufficiently the design specs.

IMO, both of these approaches suck.  Approach (1) would take *me*
forever, since I don't know the first thing about Python's internals,
and even if I did, going behind the documented API like that would
make whatever I implement very likely to break with future releases of
Python.  Approach (2) could also take a very long time (probably much
longer than the implementation would take if no guesswork was
involved), but worse than that, one would have little assurance that
one's experimentation has truly uncovered all the necessary details;
IME, programming-by-guesswork leads to numerous and often nasty bugs.

Is there any other way?



[1] The "minimal" bit in the question statement is just another way of
specifying a maximal reuse of the built-in class's code.

[2] For this example, I've accessed the parent's methods directly
through dict rather than through super(TSDict, self), just to keep
the code as uncluttered as possible, but the results are the same
if one uses super.
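One documented route, sketched below for Python 3, is to build on collections.abc.MutableMapping instead of dict. Its documentation lists the abstract methods a subclass must supply and the mixin methods (update, setdefault, pop, and so on) it derives from them, so every write goes through __setitem__. The price is that instances are not dict instances, and __repr__ has to be written by hand. Storing values and timestamps in two plain dicts is just one choice:

```python
import time
from collections.abc import MutableMapping


class TSDict(MutableMapping):
    """Dict-like mapping that timestamps every assignment to a key."""

    def __init__(self, *args, **kwargs):
        self._values = {}
        self._stamps = {}
        # MutableMapping.update() routes every pair through __setitem__.
        self.update(*args, **kwargs)

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        self._values[key] = value
        self._stamps[key] = time.time()

    def __delitem__(self, key):
        del self._values[key]
        del self._stamps[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)

    def __repr__(self):
        # Show only the values, as a plain dict would.
        return repr(self._values)

    def get_timestamp(self, key):
        return self._stamps[key]
```

With this class, the constructor call from the example above timestamps 'uno' through 'cuatro' as well as 'cinco', and repr(d) shows only the values.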


Re: issubclass(dict, Mapping)

> On Wed, 22 Dec 2010 14:20:51 +, kj wrote:
>
> > Here's another example, fresh from today's crop of wonders:
> > (v. 2.7.0)
> > >>> from collections import Mapping
> > >>> issubclass(dict, Mapping)
> > True
> > >>> dict.__bases__
> > (<type 'object'>,)
> > >>> [issubclass(b, Mapping) for b in dict.__bases__]
> > [False]
> > So dict is a subclass of Mapping, even though none of the bases of dict
> > is either Mapping or a subclass of Mapping.  Great.
>
> Yes. So what?

That's being deliberately obtuse.  The situation described goes
smack against standard OOP semantics, which would be fine if all
this stuff were documented clearly and reasonably, i.e. in one
(preferably official) place rather than scattered over a bazillion
separate documents, PEP this, module that, GvR musing #42, etc.

Let's just say that I'm looking forward to the end of these surprises.


How to pop the interpreter's stack?

2010-12-14 Thread kj

Consider this code:

def spam(*args, **kwargs):
    args, kwargs = __pre_spam(*args, **kwargs)

    # args & kwargs are OK: proceed
    # ...

def __pre_spam(*args, **kwargs):
    # validate args & kwargs;
    # return canonicalized versions of args & kwargs;
    # on failure, raise some *informative* exception
    # ...
    return canonicalized_args, canonicalized_kwargs

I write functions like __pre_spam for one reason only: to remove
clutter from a corresponding spam function that has a particularly
complex argument-validation/canonicalization stage.  In effect,
spam outsources to __pre_spam the messy business of checking and
conditioning its arguments.

The one thing I don't like about this strategy is that the tracebacks
of exceptions raised during the execution of __pre_spam include one
unwanted stack level (namely, the one corresponding to __pre_spam
itself).

__pre_spam should be completely invisible and unobtrusive, as if
it had been textually inlined into spam prior to the code's
interpretation.  And I want to achieve this without in any way
cluttering spam with try/catches, decorators, and whatnot.  (After
all, the whole point of introducing __pre_spam is to declutter
spam.)
It occurs to me, in my innocence (since I don't know the first
thing about the Python internals), that one way to achieve this
would be to have __pre_spam trap any exceptions (with a try/catch
around its entire body), and somehow pop its frame from the
interpreter stack before re-raising the exception.  (Or some
clueful/non-oxymoronic version of this.)  How feasible is this?
And, if it is quite unfeasible, is there some other way to achieve
the same overall design goals described above?
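Python code cannot remove its own frame from a traceback, but one low-clutter workaround is to have the helper return the exception instead of raising it. An exception's traceback is built from the frame where it is raised, so the helper's frame never appears; spam's own frame still does, as for any Python-level raise, and spam gains a two-line check. A Python 3 sketch; the validation rule is only a placeholder:

```python
import traceback


def _pre_spam(*args, **kwargs):
    """Validate spam's arguments; return (args, kwargs, error).

    error is an exception instance (created, not raised) or None. The rule
    below, exactly two positional arguments, is only a placeholder.
    """
    if len(args) != 2:
        return args, kwargs, TypeError(
            "spam() takes exactly 2 arguments (%d given)" % len(args))
    return args, kwargs, None


def spam(*args, **kwargs):
    args, kwargs, error = _pre_spam(*args, **kwargs)
    if error is not None:
        raise error          # innermost traceback entry is this line
    return args, kwargs
```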




Assigning to __class__ attribute

2010-12-03 Thread kj

I have a couple of questions regarding assigning to an instance's
__class__ attribute.

The first is illustrated by the following interaction.  First I
define an empty class:

>>> class Spam(object): pass

Now I define an instance of Spam and an instance of Spam's superclass:
>>> x = Spam()
>>> y = Spam.__mro__[1]() # (btw, is there a less uncouth way to do this???)
>>> [z.__class__.__name__ for z in x, y]
['Spam', 'object']

Now I define a second empty class:
>>> class Ham(object): pass

Next, I attempt to assign the value Ham to x.__class__:

>>> x.__class__ = Ham
>>> [isinstance(x, z) for z in Spam, Ham]
[False, True]

This was the first surprise for me: assigning to the __class__
attribute not only isn't vetoed, but in fact changes the instance's class.


First question: how kosher is this sort of class transmutation
through assignment to __class__?  I've never seen it done.  Is this
because it is considered something to do only as a last resort, or is
it simply because the technique is not needed often, but it is
otherwise perfectly ok?

The second, and much bigger, surprise comes when I attempt to do
the same class-switching with y:

>>> y.__class__ = Ham
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __class__ assignment: only for heap types

(If you recall, y's class is object, the superclass of x.) Apparently
Spam is a heap type (whatever that is) but its superclass, object,
isn't.  This definitely rattles my notions of inheritance: since
the definition of Spam was empty, I didn't expect it to have any
significant properties that are not already present in its superclass.
What's going on here? Is this a bug, or a feature? I can see no
logical justification for allowing such class switching for only
some classes and not others.
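For what it's worth, the distinction CPython draws is roughly between classes created at run time by a class statement, whose type objects are allocated on the heap, and built-in types such as object, which are defined statically in C. `__class__` assignment is only allowed when both the old and the new class are of the first kind and have compatible instance layouts. A minimal sketch in Python 3. The exact TypeError message varies between versions, so it is only printed here, not relied on.

```python
class Spam(object):
    pass


class Ham(object):
    pass


x = Spam()
x.__class__ = Ham  # allowed: Spam and Ham are both user-defined classes
print(type(x).__name__, isinstance(x, Spam), isinstance(x, Ham))  # Ham False True

y = object()  # an instance of Spam's superclass, i.e. Spam.__mro__[1]
try:
    y.__class__ = Ham  # refused: object is a static built-in type
    refused = False
except TypeError as err:
    refused = True
    print("refused:", err)
```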

One last question:  as the questions above make clear, I have a
colossal muddle inside my head regarding Python's model of classes
and inheritance.  This is not for lack of trying to understand it,
but, rather, for exactly the opposite reason: in my zeal to gain
the fullest understanding of this topic, I think I have read too
much that is incorrect, or obsolete, or incomplete...

What is the most complete, definitive, excruciatingly detailed
exposition of Python's class and inheritance model?  I'm expressly
avoiding Google to answer this question, and instead asking it
here, because I suspect that there's some connection between my
state of utter confusion on this topic and the ease with which the
complete/definitive/highest-quality information can get lost among
a huge number of Google hits to popular-but-only-partially-correct
sources of information.  (In fact, I *may* have already read the
source I'm seeking, but subsequent readings of incorrect stuff may
have overwritten the correct information in my brain.)




Comparing floats

2010-11-27 Thread kj

I'm defining a class (Spam) of objects that are characterized by
three parameters, A, B, C, where A and C are n-tuples of floats
and B is an n*n tuple-of-tuples of floats.  I'm defining a limited
multiplication for these objects, like this:

Spam(A, B, C) * Spam(D, E, F) = Spam(A, dot(B, E), F)
  if and only if C == D.

(Here dot(B, E) represents the matrix multiplication of B and E).

In other words, this multiplication is defined only for the case
where the last parameter of the first object is equal to first
parameter of the second object.  An exception should be thrown if
one attempts to multiply two objects that fail to meet this condition.

Therefore, to implement this multiplication operation I need to
have a way to verify that the float tuples C and D are equal.
Certainly, I want to do this in a way that takes into account
the fact that machine computations with floats can produce small
differences between numbers that are notionally the same.  E.g.
(in Python 2.6.1 at least):

>>> 49.0 * (1.0/49.0)
0.99999999999999989
>>> 1.0 == 49.0 * (1.0/49.0)
False

The only approach I know of is to pick some arbitrary tolerance
epsilon (e.g. 1e-6) and declare floats x and y equal iff the absolute
value of x - y is less than epsilon.

Does the Python standard library provide any better methods for
performing such comparisons?

I understand that, in Python 2.7 and 3.x >= 3.1, when the interactive
shell displays a float it shows the shortest decimal fraction that
rounds correctly back to the true binary value.  Is there a way
to access this rounding functionality from code that must be able
to run under version 2.6? (The idea would be to implement float
comparison as a comparison of the rounded versions of floats.)

Absent these possibilities, does Python provide any standard value
of epsilon for this purpose?
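Here is a minimal sketch of a tolerance-based comparison for tuples of floats. It uses a relative tolerance scaled by the larger operand's magnitude, plus an optional absolute floor for values near zero. This is the same rule Python 3.5 later standardized as math.isclose. It is written without math.isclose so it also runs on 2.6. The default tolerances are arbitrary assumptions, not standard-library values. The closest thing to a standard epsilon is sys.float_info.epsilon (available since 2.6), the gap between 1.0 and the next larger float. That value is usually far too tight to use directly as an absolute tolerance.

```python
import sys


def floats_close(x, y, rel_tol=1e-9, abs_tol=0.0):
    """True if x and y differ by at most rel_tol times the larger
    magnitude, or by at most abs_tol (useful near zero)."""
    return abs(x - y) <= max(rel_tol * max(abs(x), abs(y)), abs_tol)


def tuples_close(a, b, rel_tol=1e-9, abs_tol=0.0):
    """Element-wise floats_close for two tuples; unequal lengths never match."""
    return len(a) == len(b) and all(
        floats_close(x, y, rel_tol, abs_tol) for x, y in zip(a, b))


print(1.0 == 49.0 * (1.0 / 49.0))              # False: exact comparison fails
print(floats_close(1.0, 49.0 * (1.0 / 49.0)))  # True: tolerant comparison succeeds
print(sys.float_info.epsilon)                  # about 2.2e-16 for IEEE 754 doubles
```

With C and D as float tuples, the multiplication check would then be `tuples_close(C, D)`. Whether a relative or an absolute tolerance is appropriate depends on the magnitudes involved.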



Re: Why flat is better than nested?

On 10/26/2010 2:44 PM, kj wrote:
 In mailman.258.1288104186.2218.python-l...@python.org Steve Holden 
 st...@holdenweb.com writes:
 The answer is probably the same as you will see if you try
  from __future__ import braces
 That feature *is* available in Python 2.6 ;-)
 Now, that's hilarious.
See, there *is* a place for humor :)

I have nothing against humor.  The reason why I find import braces
funny is that it is so obviously a joke.  But I do find it mildly
annoying (and just mildly) that a joke/hoax/farce like ZoP/this.py
is built into the standard lib, because a lot of people (not just
me) don't realize it's a joke.  (In fact, the reason I learned
about ZoP/this.py was that in a reply to some post of mine in some
Python forum [maybe c.l.py], the responder simply told me to run
import this, with the implication that it would answer whatever
it was that I was asking about.  Either this person took ZoP
seriously, or was just having fun at a noob's expense.  Either way,
I don't like it.)  Learning a new programming language (which
entails becoming familiar not only with some new syntax, but new
libraries, new general ideas, new ways of doing stuff), is already
disorienting enough as it is.  I don't see the point of making the
task any harder than it already is by injecting additional *gratuitous
confusion* in the form of pseudo-programming advice that apparently no
experienced Python programmer really believes/takes seriously anyway.
I just don't understand the need for having this.py in the std lib,
of all places.  It's not like there's any risk of losing the ZoP
if it were removed from it.  Zillions of copies of it would still be
floating around on the web.  But they would not be mistaken for
something that is somehow endorsed by those who put together the
standard library.

I know.  Not a chance. :)



Re: Why flat is better than nested?

On 10/25/2010 3:11 PM, kj wrote:

 Well, it's pretty *enshrined*, wouldn't you say?


   After all, it is part of the standard distribution,

So is 'import antigravity'

Are you playing with my feelings?

% python
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
>>> import antigravity
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named antigravity

Too bad, I was looking forward to that.

  has an easy-to-remember invocation,
 etc.  *Someone* must have taken it seriously enough to go through
 all this bother.  If it is as trivial as you suggest (and for all
 I know you're absolutely right), then let's knock it off its pedestal
 once and for all, and remove it from the standard distribution.

If you are being serious, you are being too serious (as in humorless).

Guilty as charged, both in the too serious and the humorless
counts. :/  Blame it on the Asperger's.

My only defense is that, while learning Python over the past year,
I've had *many* "you've got to be joking" moments while reading
what are ostensibly serious Python documents (e.g. PEP 8, PEP 257)
as well as assorted threads featuring GvR and others involved in
the design of Python, to the point that sometimes I do have a hard
time gauging the seriousness of what's considered good programming
/ best practice in the Python world.

Plus, I think it's fair to say that the Python community as a whole
(or at least its more vocal members) are more concerned with
correctness (for lack of a better term) and code aesthetics
than, say, the Perl community.  E.g., only in Python-related threads
have I seen the adjective "icky" used routinely to indicate that some
code is unacceptable on (more or less) aesthetic grounds.

My point is that, even if one detects some levity in ZoP, given
everything else one runs into in the Python world, for the uninitiated
like me it is still hard to distinguish between what's in jest and
what's in earnest.

Perhaps the disconnect here is that you're seeing the whole thing
from an insider's point of view, while I'm still enough of an
outsider not to share this point of view.  (I happen to think that
one of the hallmarks of being an initiate to a discipline is an almost
complete loss of any memory of what that discipline looked like
when the person was a complete novice.  If this is true, then it's
easy to understand the difference in our perceptions.)

Anyway, thanks for letting me in on the joke.  I'll pass it on.

(Though, humorless as it is of me, I still would prefer the ZoP
out of the standard library, to save myself having to tell those
who are even newer to Python than me not to take it seriously.)


Re: Why flat is better than nested?

The answer is probably the same as you will see if you try

  from __future__ import braces

That feature *is* available in Python 2.6 ;-)

Now, that's hilarious.


Why flat is better than nested?

2010-10-25 Thread kj

In "The Zen of Python", one of the maxims is "flat is better than
nested".  Why?  Can anyone give me a concrete example that illustrates
this point?



PS: My question should not be construed as a defense of "nested".
I have no particular preference for either "flat" or "nested"; it all
depends on the situation; I would have asked the same question if
the maxim had been "nested is better than flat".
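Here is one common reading of the maxim, offered as an illustration rather than an official interpretation. Prefer control flow that does not pile up indentation levels. For example, dismiss special cases with early exits (guard clauses) instead of nesting if blocks. The sketch below writes a made-up validation function (parse_port is hypothetical) both ways. The two versions behave identically.

```python
def parse_port_nested(text):
    """Nested version: each check opens a new indentation level."""
    if text is not None:
        text = text.strip()
        if text.isdigit():
            port = int(text)
            if 0 < port < 65536:
                return port
            else:
                raise ValueError("port out of range: %d" % port)
        else:
            raise ValueError("not a number: %r" % text)
    else:
        raise ValueError("no port given")


def parse_port_flat(text):
    """Flat version: each special case is dealt with and dismissed up front."""
    if text is None:
        raise ValueError("no port given")
    text = text.strip()
    if not text.isdigit():
        raise ValueError("not a number: %r" % text)
    port = int(text)
    if not 0 < port < 65536:
        raise ValueError("port out of range: %d" % port)
    return port
```

In the flat version, the "happy path" reads straight down the left margin. Each error is handled right next to the check that detects it, instead of in an else clause several lines away.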

Re: Why flat is better than nested?

On Oct 25, 5:07 am, kj no.em...@please.post wrote:
 In The Zen of Python, one of the maxims is flat is better than
 nested?  Why?  Can anyone give me a concrete example that illustrates
 this point?

Simple. This commandment (endowed by the anointed one, GvR) is
directed directly at lisp and those filthy lispers. If you don't know
what lisp is then Google it. Then try to program with it for one hour.
Very soon after your head will explode from the nested bracket plague
and then you shall be enlightened!

Some of the earliest programming I ever did was in Lisp.  Scheme
actually.  In good ol' 6.001, back in '82.  I loved it.  I have no
problem whatsoever with it.

Benightedly yours,


Re: Why flat is better than nested?

On 10/25/2010 10:47 AM, rantingrick wrote:
 On Oct 25, 5:07 am, kj no.em...@please.post wrote:
 In The Zen of Python, one of the maxims is flat is better than
 nested?  Why?  Can anyone give me a concrete example that illustrates
 this point?
 Simple. This commandment (endowed by the anointed one, GvR) is
 directed directly at lisp and those filthy lispers. If you don't know
 what lisp is then Google it. Then try to program with it for one hour.
 Very soon after your head will explode from the nested bracket plague
 and then you shall be enlightened!
And everyone taking the Zen too seriously should remember that it was
written by Tim Peters one night during the commercial breaks between
rounds of wrestling on television. So while it can give useful guidance,
it's neither prescriptive nor a bible ...

Well, it's pretty *enshrined*, wouldn't you say?  After all, it is
part of the standard distribution, has an easy-to-remember invocation,
etc.  *Someone* must have taken it seriously enough to go through
all this bother.  If it is as trivial as you suggest (and for all
I know you're absolutely right), then let's knock it off its pedestal
once and for all, and remove it from the standard distribution.


How to optimize and monitor garbage collection?

2010-10-24 Thread kj

I'm designing a system that will be very memory hungry unless it
is garbage-collected very aggressively.

In the past I have had disappointing results with the gc module:
I noticed practically no difference in memory usage with and without
it.  It is possible, however, that I was not measuring memory
consumption adequately.

What's the most accurate way to monitor memory consumption in a
Python program, and thereby ensure that gc is working properly?

Also, are there programming techniques that will result in better 
garbage collection?  For example, would it help to explicitly call
del on objects that should be gc'd?
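One point may explain the disappointing results. CPython frees most objects immediately, as soon as their reference count drops to zero, with or without the gc module. The gc module exists to reclaim reference cycles, which reference counting alone cannot free. Likewise, `del` only removes a name, and the object goes away only when no other references remain. The sketch below assumes Python 3.4+ for tracemalloc, which counts bytes allocated through Python's allocators (not total process size). It builds cycles on purpose, then measures how much traced memory an explicit gc.collect() releases. The Node class and the sizes are made up for illustration.

```python
import gc
import tracemalloc


class Node(object):
    """Hypothetical object, large enough to show up in the measurements."""

    def __init__(self):
        self.partner = None
        self.payload = bytearray(10000)


def make_cycles(n):
    """Create n pairs of Nodes that refer to each other, then drop them."""
    for _ in range(n):
        a, b = Node(), Node()
        a.partner, b.partner = b, a  # a reference cycle


gc.disable()  # keep automatic collections out of the measurement
tracemalloc.start()
make_cycles(200)
before, _ = tracemalloc.get_traced_memory()  # (current, peak) in bytes
unreachable = gc.collect()                   # number of unreachable objects found
after, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()
gc.enable()

print("unreachable objects collected:", unreachable)
print("traced bytes before/after collect:", before, after)
```

If a program creates few cycles, gc.collect() will find little to do and memory will hardly change, which matches what was observed. In that case, breaking cycles (for example with the weakref module) or dropping references sooner is what actually reduces memory use.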




2010-10-23 Thread kj

Is there anything that does for Mathematica what matplotlib does for MATLAB?

matplotlib, even in its underlying so-called OO mode, follows
MATLAB's graphics model, which, in my very subjective opinion, is
vastly inferior to Mathematica's.

The latter allows for a clean separation between the textual
specification of a graphic object (which can be very complex), and
its graphic representation.  Furthermore, it is general enough to
allow for the composition of graphic objects within other graphic
objects, to arbitrary depth levels.  This readily allows for the
representation of complex composite figures that are common in
scientific publishing today, where figures not only routinely
consist of several subfigures, but the subfigures themselves contain
multiple sub-subfigures, and so on.  (In contrast, matplotlib supports
at most two levels of composition [a two-dimensional array of
sub-plots], which is both too inflexible and too limited.)

More generally, despite its usefulness, I find MATLAB in the end
to be one big ugly hack, so, as a developer, I would prefer to stay
clear of anything that is modeled after MATLAB, however loosely.

Any pointers to something more Mathematica-like in Python would be
appreciated.



Re: python shell silently ignores termios.tcsetattr()

In message i9n4ph$d7...@reader1.panix.com, kj wrote:

 I tried to fix the problem by applying the equivalent of stty
 -echo within a python interactive session, but discovered that
 this setting is immediately (and silently) overwritten.

That seems reasonable behaviour; the command loop is resetting the terminal 
to a reasonable state to allow it to read the next command. But the command 
you execute can do anything it likes in-between. What’s wrong with that?

What's wrong with it is that what python thinks is a reasonable
state is actually wrong in this case (it differs from the default
setting established by the Emacs shell).  I had hoped that there
was a way to tell python what to regard as a reasonable state.
I gather that there is no way to do this?


Re: annoying CL echo in interactive python / ipython

On Tue, Oct 19, 2010 at 2:35 PM, kj no.em...@please.post wrote:
 In mailman.24.1287510296.2218.python-l...@python.org Jed Smith j...@jedsmith.org writes:

On Tue, Oct 19, 2010 at 1:37 PM, kj no.em...@please.post wrote:

 % stty -echo

That doesn't do what you think it does.

 Gee, thanks.  That really helped.  I'll go talk to my guru now,
 and meditate over this.

You're right, I could have been more clear. I was nudging you to go
read the man page of stty(1), but since you won't and want to get
snarky instead, I will for you:

 echo (-echo)
 Echo back (do not echo back) every character typed.

I read that, and it did not add anything new to what I already knew
about stty -echo.

I'm going to guess that the percent sign in your prompt indicates that
you're using zsh(1).  With my minimally-customized zsh, the echo
option is reset every time the prompt is displayed. That means you can
type stty -echo, push CR, the echo option is cleared, then zsh
immediately sets it before you get to type again.

Wrong guess.  After I run stty -echo, the echoing stays disabled:

% stty -echo
% date
Wed Oct 20 10:01:46 EDT 2010
% date
Wed Oct 20 10:01:47 EDT 2010
% date
Wed Oct 20 10:01:48 EDT 2010
% date
Wed Oct 20 10:01:49 EDT 2010

As to the guess about readline, I only observe this problem with
python (interactive) and ipython, but not with, say, the Perl
debugger, which uses readline as well.  FWIW.


python shell silently ignores termios.tcsetattr()

2010-10-20 Thread kj

This post is a continuation of an earlier thread called

annoying CL echo in interactive python / ipython

I found some more clues to the problem, although no solution yet.
First, I found a post from 2009.05.09 that describes exactly the
same situation I've observed (although it got no responses):


I tried to fix the problem by applying the equivalent of stty
-echo within a python interactive session, but discovered that
this setting is immediately (and silently) overwritten.  The
following interaction illustrates what I mean.  (I've added my
comments, preceded by ###.)

First I start the barest possible instance of Emacs (only the bare
minimum of environment settings, and loading no configurations from
an ~/.emacs file):

% env -i HOME=$HOME DISPLAY=$DISPLAY TERM=$TERM /opt/local/bin/emacs -Q

Within Emacs, the first command I execute is M-x shell, to bring
up an Emacs shell.  What follows is the interaction within this shell:

sh-3.2$ stty -a  ### first I get the initial settings for the terminal
speed 9600 baud; 0 rows; 0 columns;
lflags: icanon isig iexten -echo echoe -echok echoke -echonl echoctl
-echoprt -altwerase -noflsh -tostop -flusho -pendin -nokerninfo
iflags: -istrip icrnl -inlcr -igncr ixon -ixoff ixany imaxbel -iutf8
-ignbrk brkint -inpck -ignpar -parmrk
oflags: opost -onlcr -oxtabs -onocr -onlret
cflags: cread cs8 -parenb -parodd hupcl -clocal -cstopb -crtscts -dsrflow
-dtrflow -mdmbuf
cchars: discard = ^O; dsusp = ^Y; eof = ^D; eol = undef;
eol2 = undef; erase = undef; intr = ^C; kill = undef;
lnext = ^V; min = 1; quit = ^\; reprint = ^R; start = ^Q;
status = ^T; stop = ^S; susp = ^Z; time = 0; werase = ^W;
### note the echo setting under lflags and the onlcr setting under
### oflags; also note that the stty -a command above was not echoed
sh-3.2$ python  ### next I start an interactive python session
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type help, copyright, credits or license for more information.
>>> import termios
import termios   ### note the echoing 
>>> old = termios.tcgetattr(1)
old = termios.tcgetattr(1)
>>> new = termios.tcgetattr(1)
new = termios.tcgetattr(1)
>>> new[3] = new[3] & ~termios.ECHO & ~termios.ONLCR
new[3] = new[3] & ~termios.ECHO
>>> old[3], new[3]
old[3], new[3]
(536872395, 536872385)
>>> (termios.tcsetattr(1, termios.TCSANOW, new), termios.tcgetattr(1)[3], old[3], new[3])
(termios.tcsetattr(1, termios.TCSANOW, new), termios.tcgetattr(1)[3], old[3], 
(None, 536872385, 536872395, 536872385)
### The output above shows that the setting attempted through
### tcsetattr took momentarily...
>>> termios.tcgetattr(1) == old
termios.tcgetattr(1) == old
### ...but by this point it has already been reset back to its original value. 
>>> termios.tcgetattr(1)[3], old[3], new[3]
termios.tcgetattr(1)[3], old[3], new[3]
(536872395, 536872395, 536872385)
sh-3.2$ stty -a  ### after quitting python, the echo and onlcr settings have been reversed
stty -a
speed 9600 baud; 0 rows; 0 columns;
lflags: icanon isig iexten echo echoe -echok echoke -echonl echoctl
-echoprt -altwerase -noflsh -tostop -flusho -pendin -nokerninfo
iflags: -istrip icrnl -inlcr -igncr ixon -ixoff ixany imaxbel -iutf8
-ignbrk brkint -inpck -ignpar -parmrk
oflags: opost onlcr -oxtabs -onocr -onlret
cflags: cread cs8 -parenb -parodd hupcl -clocal -cstopb -crtscts -dsrflow
-dtrflow -mdmbuf
cchars: discard = ^O; dsusp = ^Y; eof = ^D; eol = undef;
eol2 = undef; erase = ^?; intr = ^C; kill = ^U; lnext = ^V;
min = 1; quit = ^\; reprint = ^R; start = ^Q; status = ^T;
stop = ^S; susp = ^Z; time = 0; werase = ^W;

Does anyone understand what's going on?  Is there any way to prevent
python from resetting the settings made with tcsetattr?  (I realize
that this is relatively low-level stuff, so it is unlikely that
there's a clean solution; I'm hoping, however, that there may be
a way to fool python into doing the right thing; after all, this
strange behavior only happens under the Emacs shell; I don't observe
it under, e.g., Terminal or xterm.)
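Separately from why python resets the terminal, one detail in the session above is worth checking. The list returned by termios.tcgetattr is [iflag, oflag, cflag, lflag, ispeed, ospeed, cc], and ONLCR is an output flag, so it belongs in new[1], not new[3]. Clearing it in new[3] clears whatever local-mode bit shares its value. Assuming the usual OS X header values, ECHO is 0x8 and ONLCR has the same value (0x2) as the local flag ECHOE. That would explain why old[3] and new[3] differ by 10 rather than 8. Below is a minimal Unix-only sketch that clears each flag in its own slot. It is written as a pure function so it can be tried without a real terminal, and the name without_echo is hypothetical.

```python
import termios

# Indices into the list returned by termios.tcgetattr(fd).
IFLAG, OFLAG, CFLAG, LFLAG = 0, 1, 2, 3


def without_echo(attrs):
    """Return a shallow copy of a tcgetattr-style list with local echo
    (lflag ECHO) and NL -> CR-NL output mapping (oflag ONLCR) turned off."""
    new = list(attrs)
    new[LFLAG] = new[LFLAG] & ~termios.ECHO
    new[OFLAG] = new[OFLAG] & ~termios.ONLCR
    return new


# Usage on a real terminal (fd 1 must be a tty):
#     new = without_echo(termios.tcgetattr(1))
#     termios.tcsetattr(1, termios.TCSANOW, new)
```

This only fixes the flag bookkeeping. On its own, it would not stop the interactive loop from restoring its own terminal settings before the next prompt.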



annoying CL echo in interactive python / ipython

2010-10-19 Thread kj

Under some parent shells, both my interactive python and my
ipython produce an unwanted echoing of the input line.  E.g.

>>> 1 + 1
1 + 1

What's worse, upon exiting the interactive python/ipython session,
the terminal is left in echo mode:

% date 
Tue Oct 19 13:27:47 EDT 2010
% stty -echo
% date
Tue Oct 19 13:27:50 EDT 2010

It's basically the same story for ipython.

(If I run stty -echo before running either python or ipython, I
still get the echo when I'm in them.  So the problem is not a
pre-existing terminal setting.)

(As I said, this happens only under some shells (e.g. emacs shell),
so YMMV.)

Does anyone know how I can suppress this annoying feature?



Re: annoying CL echo in interactive python / ipython

On Tue, Oct 19, 2010 at 1:37 PM, kj no.em...@please.post wrote:

 % stty -echo

That doesn't do what you think it does.

Gee, thanks.  That really helped.  I'll go talk to my guru now,
and meditate over this.

docstring that use globals?

2010-10-16 Thread kj

Consider the following Python snippet:

SEPARATOR = '+'

def spam(ham, eggs):
    """Return a hash of ham and eggs.

    The variables ham and eggs are tuples of strings.  The returned
    hash is a dict made from the pairs returned by zip(ham, eggs).
    If ham contains repeated keys, the corresponding values in eggs
    are concatenated using the string SEPARATOR.

    For example, spam(('a', 'b', 'a'), ('x', 'y', 'z')) returns
    {'a': 'xSEPARATORz', 'b': 'y'}.
    """
    # implementation follows...

Of course, as written, the docstring above is no good.  All
occurrences of the string SEPARATOR in it should be replaced with
the actual value of the global variable SEPARATOR, which in this
case is the string '+'.

My first attempt at achieving this effect was the following:

SEPARATOR = '+'

def spam(ham, eggs):
    """Return a hash of ham and eggs.

    The variables ham and eggs are tuples of strings.  The returned
    hash is a dict made from the pairs returned by zip(ham, eggs).
    If ham contains repeated keys, the corresponding values in eggs
    are concatenated using the string %(SEPARATOR)s.

    For example, spam(('a', 'b', 'a'), ('x', 'y', 'z')) returns
    {'a': 'x%(SEPARATOR)sz', 'b': 'y'}.
    """ % globals()

    # implementation follows...

...which, of course (in retrospect), does not work, since only a
string literal, and not any ol' string-valued expression, at the
beginning of a function's body can serve as a docstring.

What's the best way to achieve what I'm trying to do?

(Of course, all these acrobatics are hardly worth the effort for
the simple example I give above.  In the actual situation I'm
dealing with, however, there are a lot more docstrings and global
variables in the mix, and at this early stage of the development,
the values of the globals are still fluid.  I'm trying to minimize
the effort required to keep the docstrings current, while still
retaining the freedom to adjust the values of the globals.  Also,
FWIW, most of these globals are merely default values that can be
overridden at runtime.)
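Here is a decorator-based sketch. The thread later thanks MRAB and Peter for a decorator idea, but this is not their code. A function's __doc__ is an ordinary writable attribute, so a decorator can %-format it after the def statement has run. The name expand_doc is hypothetical. Any literal % in such a docstring must be written as %%. Under python -OO docstrings are stripped, hence the check for None.

```python
SEPARATOR = '+'


def expand_doc(namespace):
    """Decorator factory: %-format the decorated function's docstring
    with the mapping `namespace` (e.g. globals())."""
    def decorate(func):
        if func.__doc__ is not None:  # docstrings are None under -OO
            func.__doc__ = func.__doc__ % namespace
        return func
    return decorate


@expand_doc(globals())
def spam(ham, eggs):
    """Return a hash of ham and eggs.

    The variables ham and eggs are tuples of strings.  The returned
    hash is a dict made from the pairs returned by zip(ham, eggs).
    If ham contains repeated keys, the corresponding values in eggs
    are concatenated using the string %(SEPARATOR)s.

    For example, spam(('a', 'b', 'a'), ('x', 'y', 'z')) returns
    {'a': 'x%(SEPARATOR)sz', 'b': 'y'}.
    """
    result = {}
    for key, value in zip(ham, eggs):
        if key in result:
            result[key] = result[key] + SEPARATOR + value
        else:
            result[key] = value
    return result
```

The docstring is expanded once, when the decorator runs at import time. So it documents the default value of SEPARATOR, not any value it is later overridden with at runtime, which fits the "default values" use case described above.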



Re: docstring that use globals?

MRAB, Peter: thanks for the decorator idea!

As an afterthought, is there any way to extend this general idea
to other docstrings beyond function docstrings?

I imagine that the decorator idea works well for method docstrings
too (though I have not tried any of this yet).

But, IIRC, decorators don't work for classes, so class docstrings
would need to be expanded expressly.

The hardest case is module docstrings.  In particular, the
docstring for a script.  Is there any way to apply the globals
expansion idea to a script's toplevel docstring?

(I imagine the answer to this last question is no, but I thought
I'd ask. :) )



dynamic loading error (Symbol not found)

2010-10-16 Thread kj

The following interaction (in OS X) summarizes the situation:

% grep -r _engClose $DYLD_LIBRARY_PATH
Binary file /Applications/MATLAB_R2010a.app/bin/maci64/libeng.dylib matches
% python
Python 2.6.5 (r265:79063, May 22 2010, 18:34:46) 
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type help, copyright, credits or license for more information.
>>> from mlabwrap import mlab
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "mlabwrap.py", line 188, in <module>
    import mlabraw
 2): Symbol not found: _engClose
  Referenced from: 
  Expected in: dynamic lookup


In summary,

1) dlopen produces a Symbol not found: _engClose error;
2) according to grep, a file accessible through the variable
DYLD_LIBRARY_PATH matches the string _engClose;

The permissions of the file in question are all OK (0555); likewise,
the permissions of all the prefix subpaths leading to this file
are fine.

(For all I know, it is possible that, even though the libeng.dylib
file matches _engClose, this is only a fragment of a longer symbol.)

Can anyone suggest a way to fix this error?
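Here are a couple of hedged diagnostics. First, grep only shows that the bytes _engClose occur somewhere in the file. To check that the library really exports that symbol, one can ask the dynamic loader directly, for example via ctypes. On OS X the C-level name engClose appears as _engClose in the binary, and ctypes wants the C-level name. Second, "Expected in: dynamic lookup" suggests that mlabraw was linked without a reference to libeng. If so, DYLD_LIBRARY_PATH alone may never cause libeng.dylib to be loaded. Preloading it with RTLD_GLOBAL before the import is one thing worth trying. The function name exports_symbol is hypothetical.

```python
import ctypes


def exports_symbol(path, name):
    """True if the shared library at `path` exports the C symbol `name`.
    path=None means the running process and the libraries already loaded."""
    lib = ctypes.CDLL(path)
    return hasattr(lib, name)  # attribute lookup performs a dlsym() call


# Hypothetical usage on the system in question (path from the grep output):
#     lib = "/Applications/MATLAB_R2010a.app/bin/maci64/libeng.dylib"
#     print(exports_symbol(lib, "engClose"))
#     # Load it globally so later extension modules can resolve its symbols:
#     ctypes.CDLL(lib, mode=ctypes.RTLD_GLOBAL)
#     from mlabwrap import mlab
```

From the shell, `nm -g` on the dylib lists its exported symbols and gives much the same information without Python.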



Re: dynamic loading error (Symbol not found)

On 10/16/2010 2:15 PM kj said...

 The following interaction (in OS X) summarizes the situation:

 % grep -r _engClose $DYLD_LIBRARY_PATH
 Binary file /Applications/MATLAB_R2010a.app/bin/maci64/libeng.dylib matches
 % python
 Python 2.6.5 (r265:79063, May 22 2010, 18:34:46)
 [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
 Type help, copyright, credits or license for more information.
 from mlabwrap import mlab

I'd start by checking the version numbers and compatibility 
specifications on mlabwrap which, per their site, claims mlabwrap 
should work with python>=2.4.

I don't get your point.  As the fragment you quoted shows, I'm
using Python 2.6.5.


Re: docstring that use globals?

kj wrote:

 The hardest case is module docstrings.

Actually, that one's quite easy, just assign to __doc__.

__doc__ = "This is a %s docstring" % "made-up"

D'oh!  Thanks.


Re: hashkey/digest for a complex object

Steven D'Aprano st...@remove-this-cybersource.com.au writes:

 On Sat, 09 Oct 2010 21:39:51 +0100, Arnaud Delobelle wrote:

 1. hash() is an idempotent function, i.e. hash(hash(x)) == hash(x) holds
 for any hashable x (this is a simple consequence of the fact that
 hash(x) == x for any int x (by 'int' I mean 2.X int)).

 It's a beautiful theory, but, alas, it is not the case.

 >>> hash(-1) == -1
 False
 >>> hash(2**64) == 2**64
 False

 to give only two of an infinite number of counter-examples.


And, in fact,
(-1) is the only int such that hash(x) != x.

Arnaud, how did you determine that -1 is the only such int?  I
can't imagine any approach other than a brute-force check of
all ints...  When I tried this I ran into unforeseen limitations
in xrange, etc.  It's all very doable, but still, it would take at
least about 3 hours on my laptop.

I can only guess that (-1) is a value that has a special meaning when
hashing.  Try this (Python 2.6):

>>> class A(object):
...     def __hash__(self): return -1
>>> a = A()

Very cool.

BTW, thank you for the explanation in your previous post.  It makes
a lot of sense.  I find it interesting (as Hrvoje pointed out) that
the hash function is (or appears to be) idempotent on integers
(long or not), even though it is not the identity on the integers.
Thanks to Steven for the counterexamples to show the latter.  I've
learned tons from this exchange.
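For the record, the special status of -1 comes from CPython's C API. There, a hash function returns -1 to signal an error, so a legitimate hash of -1 is replaced by -2. The same substitution applies to a user-defined __hash__ that returns -1, as with class A above. The sketch below models CPython 3's hash for ints, which differs from the 2.x behaviour discussed in this thread. In Python 3, ints are reduced modulo sys.hash_info.modulus (2**61 - 1 on typical 64-bit builds), so hash(x) == x also fails for ints at least that large. int_hash_model is a hypothetical name. The model is an assumption, checked against hash() in the loop below, not a copy of the C code.

```python
import sys

P = sys.hash_info.modulus  # 2**61 - 1 on typical 64-bit CPython 3 builds


def int_hash_model(n):
    """Model of CPython 3's hash() for ints: reduce |n| modulo P, restore
    the sign, and replace -1 (the C-level error value) with -2."""
    h = abs(n) % P
    if n < 0:
        h = -h
    return -2 if h == -1 else h


class A(object):
    def __hash__(self):
        return -1


for n in (0, 1, -1, -2, 12345, P - 1, P, 2**64, -(2**64)):
    assert int_hash_model(n) == hash(n), n
    assert hash(hash(n)) == hash(n)  # int hashes already lie strictly inside (-P, P)

print(hash(-1), hash(A()))  # -2 -2
```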


Re: hashkey/digest for a complex object

Reading the source code is also a good approach.

Every time I have attempted to answer a question by looking at the
Python C source, all I've achieved was wasting time, sometimes a
*lot* of time, so by now I've developed what can only be described
as a phobia of it.  I probably need professional help at this point.


Re: hashkey/digest for a complex object

2010-10-09 Thread kj
deep_methods = {
    list: lambda f, l: tuple(map(f, l)),
    dict: lambda f, d: frozenset((k, f(v)) for k, v in d.items()),
    set: lambda f, s: frozenset(map(f, s)),
    # Add more if needed
}

def apply_method(f, obj):
    try:
        method = deep_methods[type(obj)]
    except KeyError:
        return obj
    return method(f, obj)

def deepfreeze(obj):
    """Return a 'hashable version' of an object"""
    return apply_method(deepfreeze, obj)

def deephash(obj):
    """Return hash(deepfreeze(obj)) without deepfreezing"""
    return hash(apply_method(deephash, obj))

# Example of deepfreezable object:
obj = [1, foo, {(2, 4): {7, 5, 4}, bar: baz}]
   ^   ^
   |   |
   `---`--- what's this?

(1, 'foo', frozenset({('bar', 'baz'), ((2, 4), frozenset({4, 5, 7}))}))

After fixing the missing  in deepfreeze this code works as
advertised, but I'm mystified by the identity between hash(deepfreeze(...))
and deephash(...).  Without some knowledge of the Python internals,
I don't see how this follows.

More specifically, it is not obvious to me that, for example,


would be identical to


but this identity has held every time I've checked it.  Similarly
for other more complicated variations on this theme.

Anyway, thanks for the code.  It's very useful.
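Here is a hedged explanation of why the identity holds. In CPython, the hash of a tuple (or frozenset) is computed only from the hashes of its elements. deephash replaces each element x by the int hash(x), so the outer hashes agree exactly when hash(hash(x)) == hash(x). That is the idempotence discussed elsewhere in this thread. On 2.x it holds, since int hashing is the identity apart from -1, and -1 is never returned as a hash. In Python 3 it can fail. Ints are hashed modulo sys.hash_info.modulus, while other objects (str, tuple, or a custom __hash__) may return larger hash values, so the identity should not be relied on there. The sketch below assumes Python 3 on a 64-bit build. The Fixed class is made up.

```python
import sys

P = sys.hash_info.modulus  # 2**61 - 1 on 64-bit builds


class Fixed(object):
    """Made-up object whose hash is whatever we say it is."""

    def __init__(self, h):
        self.h = h

    def __hash__(self):
        return self.h


# 1. Only element hashes matter: Fixed(5) and 5 hash alike, so the tuples do too.
only_hashes_matter = hash((Fixed(5), Fixed(7))) == hash((5, 7))

# 2. Idempotence is what lets deephash swap an element for its hash ...
small = Fixed(12345)
ok = hash((small,)) == hash((hash(small),))      # hash(12345) == 12345

# ... and it breaks for a (legal) hash value of at least P.
big = Fixed(P + 5)
broken = hash((big,)) == hash((hash(big),))      # hash(P + 5) == 5, not P + 5

print(only_hashes_matter, ok, broken)  # True True False
```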


Re: frozendict (v0.1)

On Fri, 08 Oct 2010 00:23:30 +0000, kj wrote:

Because it's always better to use a well-written, fast, efficient, 
correct, well-tested wheel than to invent your own slow, incorrect 
wheel :)

IOW, don't you worry your little head about why.


Re: frozendict (v0.1)

E.g., try with {1:'a', 1j:'b'}

I see.  Thanks for this clarification.  I learned a lot from it.

I guess that frozenset must have some way of canonicalizing the
order of its elements that is dependent on their Python values but
not on their comparability.  My first thought was that they are
ordered according to their hash values, but this theory doesn't
hold up:

>>> abc = ('a', 'b', 'c')
>>> sorted(map(hash, abc))
[-468864544, -340864157, -212863774]
>>> map(hash, frozenset(abc))
[-468864544, -212863774, -340864157]

I.e. the ordering of the elements in the frozenset does not correspond
to the ordering of their hashes in either direction.  Hmmm. 

I tried to understand this by looking at the C source but I gave
up after 10 fruitless minutes.  (This has been invariably the
outcome of all my attempts at finding my way through the Python C
source.)

I guess the take-home message is that frozenset is a more general
way to canonicalize an iterable object than sorting, even though
the reasons for this still remain a mystery to me...  Then again,
just looking at the voodoo that goes into algorithms for computing
hashes fills me with despair.  As much as I dislike it, sooner or
later I'll have to go on faith.
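Roughly speaking, a set's iteration order is the order of slots in its internal hash table. That order depends on the hash values reduced modulo the table size and on collisions during insertion, so it need not follow the hashes' numeric order. Also, Python 3.3+ randomizes string hashes per process by default, so for strings the order can change from run to run. For building a hashable key, though, no canonical order is needed at all. Equal frozensets compare equal and hash equal regardless of how they were built, whereas sorting fails outright on the {1:'a', 1j:'b'} example in Python 3. Here is a small sketch in Python 3:

```python
d1 = {1: 'a', 1j: 'b'}
d2 = {1j: 'b', 1: 'a'}  # same items, inserted in the opposite order

# Sorting needs an ordering between keys; int vs complex has none in Python 3.
try:
    sorted(d1.items())
    sortable = True
except TypeError:
    sortable = False

# A frozenset only needs hashing and equality, so no ordering is required.
key1, key2 = frozenset(d1.items()), frozenset(d2.items())
print(sortable, key1 == key2, hash(key1) == hash(key2))  # False True True
```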


Re: frozendict (v0.1)

At any rate, using your [i.e. Arnaud's] suggestions in this and
your other post, the current implementation of frozendict stands
as follows:

class frozendict(dict):
    for method in ('__delitem__ __setitem__ clear pop popitem setdefault '
                   'update').split():
        exec("""
def %s(self, *a, **k):
    cn = self.__class__.__name__
    raise TypeError("'%%s' object is not mutable" %% cn)
""" % method)

    def __hash__(self):
        return hash(frozenset(self.items()))

...which is a lot nicer!

As a side comment on my own post, this is the second time in less
than a week that I find myself having to resort to exec'ing some
code generated from a template.  This one is worse, because there's
nothing runtime-dependent about the code being exec'd.  It sticks
in my craw somehow.  It just doesn't seem quite right that at the
time of doing something as bread-and-butter as implementing a
subclass, one has to choose between explicitly writing a whole
bunch of identical methods, and exec-based hacks like what I'm
doing above.  There's got to be a better to tell Python: override
all these superclass methods with this one.

Or maybe the problem here is that, in a perfect world, frozendict
would be in the standard library, as a superclass of dict lacking
all of dict's destructive methods.  :)
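One exec-free way to say "override all these superclass methods with this one" is to define the replacement once. Then attach it under each name with setattr after the class statement. The sketch below is in Python 3 syntax. The helper _immutable and the exact method list are assumptions, not taken from the thread. __ior__ only exists on dict from Python 3.9, but defining it on the subclass is harmless on older versions.

```python
def _immutable(self, *args, **kwargs):
    """Shared replacement for every mutating dict method."""
    raise TypeError("%r object is immutable" % type(self).__name__)


class frozendict(dict):
    def __hash__(self):
        # Order-independent, so no canonical ordering of the items is needed.
        return hash(frozenset(self.items()))


for _name in ("__setitem__", "__delitem__", "__ior__", "clear", "pop",
              "popitem", "setdefault", "update"):
    setattr(frozendict, _name, _immutable)
del _name
```

Construction still works because dict's own initializer fills the mapping without going through __setitem__. As discussed elsewhere in the thread, this is "immutable among consenting adults": calling dict.__setitem__ directly on an instance would still succeed.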



Re: frozendict (v0.1)

2010-10-08 Thread kj
On 10/08/2010 02:23 AM, kj wrote:

Here's my implementation suggestion:

class frozendict(dict):
    def _immutable_error(self, *args, **kwargs):
        raise TypeError("%r object is immutable" % self.__class__.__name__)

    __setitem__ = __delitem__ = clear = pop \
        = popitem = setdefault = update = _immutable_error

    def __hash__(self):
        return hash(frozenset(self.iteritems()))

Only 9 lines :-)

Thanks, you just answered the question I just posted, while I was
still writing it!



Re: frozendict (v0.1)

jo...@lophus.org writes:

Hope this helps :-)

It did!  Thanks!  For one thing now I see that I was barking up
the wrong tree in focusing on a canonical order, when, as the code
you posted shows, it is actually not required for hashing.

In fact, I'd come to the conclusion that frozensets had a consistent
order (i.e. frozensets that were equal according to '==' would be
iterated over in the same order), but now I'm not sure that this
is the case.  (Granted, semantically, there's nothing in the
definition of a frozenset that would imply a consistent iteration
order.)

Thanks again!


Re: hashkey/digest for a complex object

2010-10-07 Thread kj
If these two attributes, and hence the dicts, are public, then your 
instances are mutable.

I guess I should have written "immutable among consenting adults".

As far as I know, Python does not support private attributes, so
I guess the dicts are public no matter what I do.  I suppose that
I can implement frozendict, but I can't think of any way to
enforce the immutability of these frozendicts that would not be
trivial to circumvent (it better be, in fact, otherwise I wouldn't
be able to initialize the damn things).

If you had something else in mind, please let me know.


Re: hashkey/digest for a complex object

2010-10-07 Thread kj
 The short version of this question is: where can I find the algorithm
 used by the tuple class's __hash__ method?

Surprisingly, in the source:


Thanks to you, and to all who pointed me to the place in the source
where this is.

How exactly did you search for this?  Taking a hint from the url
above, I went to Google Code Search and searched for "python tuple
hash" (minus the quotes, of course), which produced a ton of
irrelevant stuff (almost 80K hits).  Searching for "python tuple
hash lang:c" cut down the number of hits to ~8K, but still too much
to wade through.  Clearly I need a smarter search strategy (one
that does not include foreknowledge of the name of the actual
function in the C source, of course).


Re: hashkey/digest for a complex object

2010-10-07 Thread kj
results, I did the most straightforward thing - I downloaded and
unpacked the python source... and took a look.

From that I learned the tuplehash function name.

You must be at least somewhat familiar with the Python source.
Every time I've peeked into it I just feel lost, but it's clearly
something I need to master sooner or later...  I can't wait for
the next one of those occasional introductions to the Python
internals at our local Python club.



how to test for atomicity/mutability/hashability?

2010-10-07 Thread kj

I want to implement a test t() that will return True if its two
arguments are completely different.  By this I mean that they
don't share any non-atomic component.  E.g., if

a = [0, 1]
b = [0, 1]
c = [2, 3]
d = [2, 3]

A = (a, c, 0)
B = (a, d, 1)
C = (b, d, 0)

The desired test t() would yield:

t(A, B) -> False (A and B share the mutable component a)
t(A, C) -> True (a =!= b, c =!= d, and 0 is not mutable)
t(B, C) -> False (B and C share the mutable component d)

(=!= is shorthand for "is not identical to".)

It would facilitate the implementation of t() to have a simple test
for mutability.  Is there one?
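Python has no built-in mutability test, so any t() has to rest on a convention. The sketch below (Python 3) assumes that a fixed set of built-in scalar types is atomic, that tuples and frozensets are immutable containers whose contents still need checking, and that everything else is potentially mutable. It collects the id() of every potentially mutable object reachable from each argument and checks that the two collections are disjoint. ATOMIC_TYPES and mutable_ids are illustrative names; the sketch does not look inside the attributes of arbitrary objects.

```python
# Assumption: instances of these types are treated as atomic (immutable leaves).
ATOMIC_TYPES = (bool, int, float, complex, str, bytes, type(None))


def mutable_ids(obj, acc=None):
    """Return the set of id()s of potentially mutable objects reachable from obj."""
    if acc is None:
        acc = set()
    if isinstance(obj, ATOMIC_TYPES):
        return acc
    if isinstance(obj, (tuple, frozenset)):
        # Immutable containers are not shared state themselves,
        # but their contents may be, so look inside.
        for item in obj:
            mutable_ids(item, acc)
        return acc
    if id(obj) in acc:          # already seen; also stops recursion on cycles
        return acc
    acc.add(id(obj))            # anything else is conservatively treated as mutable
    if isinstance(obj, dict):
        for key, value in obj.items():
            mutable_ids(key, acc)
            mutable_ids(value, acc)
    elif isinstance(obj, (list, set)):
        for item in obj:
            mutable_ids(item, acc)
    return acc


def t(x, y):
    """True if x and y share no potentially mutable component."""
    return mutable_ids(x).isdisjoint(mutable_ids(y))
```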



frozendict (v0.1)

2010-10-07 Thread kj

Following a suggestion from MRAB, I attempted to implement a
frozendict class.  My implementation took a lot more work than
something this simple should take, and it still sucks.  So I'm
hoping someone can show me a better way.  Specifically, I'm hoping
that there is a recipe for building off standard classes that
cover all the bases with a minimum of tedious repetitive work.

Here's my little monster:

class frozendict():
    _DESTRUCTIVE = set(('__delitem__ __setitem__ clear pop popitem setdefault '
                        'update'.split()))

    _NON_DESTRUCTIVE = set(('__contains__ __format__ __getitem__ __hash__ '
                            '__init__ __iter__ __len__ __repr__ __sizeof__ '
                            '__str__ copy fromkeys get has_key items iteritems '
                            'iterkeys itervalues keys values'.split()))

    _COMPARISONS = set(('__cmp__ __eq__ __ge__ __gt__ __le__ __lt__ '
                        '__ne__'.split()))

    def __init__(self, iterable=(), **kwargs):
        self._dict = dict(iterable, **kwargs)

    def __hash__(self):
        return hash(tuple(self.items()))

    def __getattr__(self, attrib):
        class_ = self.__class__
        dict_ = self._dict
        if attrib in class_._COMPARISONS:
            return lambda x: dict_.__getattribute__(attrib)(x._dict)
        elif attrib in class_._NON_DESTRUCTIVE:
            return dict_.__getattribute__(attrib)
        if attrib in class_._DESTRUCTIVE:
            raise TypeError("'%s' object is not mutable" % class_.__name__)
        raise AttributeError("'%s' object has no attribute '%s'" %
                             (class_.__name__, attrib))

I didn't implement this as a subclass of dict to avoid having to
write a dumb little blocking method for every destructive dict
method.  (I couldn't figure out how to write a loop to define these
overriding methods programmatically, because their signatures are
all over the place.)

I didn't implement it as a subclass of object with an internal dict
delegate, because I couldn't figure a reasonable way to pass certain
object methods to the delegate (since in this case frozendict.__getattr__
wouldn't be called).

The handling of comparison methods is particularly horrific and

If "Beautiful is better than ugly", I sure hope there's another way
that is a lot more beautiful than this one.
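For the delegation approach, the collections.abc.Mapping base class (collections.Mapping in Python 2.6/2.7) does most of the work: given __getitem__, __iter__ and __len__, it supplies __contains__, keys, items, values, get, __eq__ and __ne__, and it defines no mutating methods at all. A minimal Python 3 sketch, with the class name FrozenDict chosen for illustration; the wrapped dict is private only by convention.

```python
from collections.abc import Mapping


class FrozenDict(Mapping):
    """Read-only mapping that delegates to a private dict."""

    def __init__(self, *args, **kwargs):
        self._dict = dict(*args, **kwargs)
        self._hash = None                 # computed lazily, then cached

    def __getitem__(self, key):
        return self._dict[key]

    def __iter__(self):
        return iter(self._dict)

    def __len__(self):
        return len(self._dict)

    def __hash__(self):
        if self._hash is None:
            self._hash = hash(frozenset(self._dict.items()))
        return self._hash

    def __repr__(self):
        return '%s(%r)' % (type(self).__name__, self._dict)
```

Comparisons need no special handling here: Mapping.__eq__ compares against any other mapping, including a plain dict.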




Re: frozendict (v0.1)

A simple fix is to use hash(frozenset(self.items())) instead.

Thanks for pointing out the hash bug.  It was an oversight: I meant
to write

def __hash__(self):
    return hash(tuple(sorted(self.items())))

I imagine that frozenset is better than tuple(sorted(...)) here,
but it's not obvious to me why.
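Two concrete differences, shown in a small Python 3 sketch (the helper names are illustrative). Sorting requires the items to be mutually orderable, which fails for keys of mixed types in Python 3 (and for keys such as complex numbers even in Python 2), whereas a frozenset only needs the items to be hashable, which both approaches require anyway. Building a frozenset is also linear in the number of items on average, while sorting is O(n log n).

```python
def hash_by_sorting(d):
    return hash(tuple(sorted(d.items())))


def hash_by_frozenset(d):
    return hash(frozenset(d.items()))


def sorting_works(d):
    """Report whether hash_by_sorting can handle d at all."""
    try:
        hash_by_sorting(d)
    except TypeError:      # raised when two items cannot be ordered
        return False
    return True
```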

At any rate, using your suggestions in this and your other post,
the current implementation of frozendict stands at:

class frozendict(dict):
    for method in ('__delitem__ __setitem__ clear pop popitem setdefault '
                   'update'.split()):
        exec("""
def %s(self, *a, **k):
    cn = self.__class__.__name__
    raise TypeError("'%%s' object is not mutable" %% cn)
""" % method)

    def __hash__(self):
        return hash(frozenset(self.items()))

...which is a lot nicer!



hashkey/digest for a complex object

2010-10-06 Thread kj

The short version of this question is: where can I find the algorithm
used by the tuple class's __hash__ method?

Now, for the long version of this question, I'm working with some
complex Python objects that I want to be able to compare for
equality easily.

These objects are non-mutable once they are created, so I would
like to use a two-step comparison for equality, based on the
assumption that I can compute (either at creation time, or as needed
and memoized) a hashkey/digest for each object.  The test for
equality of two of these objects would first compare their hashkeys.
If they are different, the two objects are declared different; if
they match, then a more stringent test for equality is performed.

So the problem is to come up with a reasonable hashkey for each of
these objects.  The objects have two significant attributes, and
two of these objects should be regarded as equal if these attributes
are the same in both.  The first attribute is a simple dictionary
whose keys are integers and values are strings.  The second attribute
is more complicated.  It is a tree structure, represented as a
dictionary of dictionaries of dictionaries... until we get to the
leaf elements, which are frozensets of strings.  The keys at every
level of this structure are strings.  E.g. a simple example of such
an attribute would look like:

{'A': {'a': set(['1', '2', '3']),
   'b': set(['4', '5'])},
 'B': set(['6', '7', '8'])}

I'm looking for a good algorithm for computing a hash key for
something like this.  (Basically I'm looking for a good way to
combine hashkeys.)
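A common way to combine hashes for a nested structure like this is to convert it, recursively, into hashable equivalents and let the built-in hashing of tuple and frozenset do the combining: dicts become frozensets of (key, frozen value) pairs and sets become frozensets. A minimal Python 3 sketch; freeze and hashkey are illustrative names. Since the hash only serves as a quick pre-test, the occasional collision (for example, a dict and a set of pairs that freeze to the same value) simply falls through to the stricter equality test.

```python
def freeze(obj):
    """Recursively convert dicts, sets and lists into hashable equivalents."""
    if isinstance(obj, dict):
        return frozenset((key, freeze(value)) for key, value in obj.items())
    if isinstance(obj, (set, frozenset)):
        return frozenset(freeze(item) for item in obj)
    if isinstance(obj, list):
        return tuple(freeze(item) for item in obj)
    return obj   # assumed to be hashable already (str, int, ...)


def hashkey(simple_dict, tree):
    """Combine the two significant attributes into one hash value."""
    return hash((freeze(simple_dict), freeze(tree)))
```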




How parametrize classes by class data?

2010-10-04 Thread kj

I want to implement a class of classes, so that, instead of the

spam = MyClass(eggs)

...I can write

spam = MyClass(ham)(eggs)

...where ham is a parameter that will end up as the value of a class
variable of the class returned by MyClass(ham).

In other words, MyClass is a metaclass: a class whose instances
are themselves classes.

In the immediate use I have for such a metaclass, the parameter is
going to be a list of lists of headers, which is used by the __init__
of the generated class to interpret its inputs.

The standard library's collections.namedtuple needs to do something
similar to this, so I thought I could learn how to do it in Python
by studying its source code.  I was surprised to discover that
collections.namedtuple achieves this class parametrization by
generating some source code on the fly, from a template, and exec'ing
it.

This looked to me like a rather un-Pythonic hack, but seeing it there
in the venerable collections module suggested to me that maybe this
is actually the best way to achieve this effect in Python.  Is this
so?  If not, please let me know of a better way.
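For a parameter like this, exec is not strictly needed: a function can build and return a new class, either with a nested class statement or with the three-argument form type(name, bases, namespace). The Python 3 sketch below uses type(), stores the parameter as a class attribute, and has the generated __init__ read it to interpret its inputs. make_record_class, headers and fields are hypothetical names, and headers is a flat list here rather than a list of lists. Strictly speaking this is a class factory rather than a metaclass, but it gives the MyClass(ham)(eggs) calling pattern.

```python
def make_record_class(headers, name='Record'):
    """Return a new class whose instances map `headers` onto their arguments."""

    def __init__(self, *values):
        # self.headers is found on the generated class.
        if len(values) != len(self.headers):
            raise TypeError('expected %d values, got %d'
                            % (len(self.headers), len(values)))
        self.fields = dict(zip(self.headers, values))

    namespace = {'headers': tuple(headers), '__init__': __init__}
    return type(name, (object,), namespace)
```

Usage: Point = make_record_class(['x', 'y'], 'Point') followed by Point(3, 4), or in one go make_record_class(['a'])(1).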



namespace hacking question

2010-09-30 Thread kj

This is a recurrent situation: I want to initialize a whole bunch
of local variables in a uniform way, but after initialization, I
need to do different things with the various variables.

What I end up doing is using a dict:

d = dict()
for v in ('spam', 'ham', 'eggs'):
    d[v] = init(v)


This is fine, but I'd like to get rid of the tedium of typing all
those extra d['...']s.

I.e., what I would *like* to do is something closer to this:

d = locals()
for v in ('spam', 'ham', 'eggs'):
    d[v] = init(v)


...but this results in errors like NameError: global name 'spam' is
not defined.

But the problem is deeper than the fact that the error above would
suggest, because even this fails:

spam = ham = eggs = None
d = locals()
for v in ('spam', 'ham', 'eggs'):
    d[v] = init(v)

foo(spam) # calls foo(None)
bar(ham)  # calls bar(None)
baz(eggs) # calls baz(None)

In other words, setting the value of locals()['x'] does not set
the value of the local variable x.

I also tried a hack using eval:

for v in ('spam', 'ham', 'eggs'):
    eval("%s = init('%s')" % (v, v))

but the = sign in the eval string resulted in a SyntaxError:
invalid syntax.

Is there any way to use a loop to set a whole bunch of local
variables (and later refer to these variables by their individual
names)?
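Writing to the dict returned by locals() inside a function is documented as unreliable (changes need not affect the function's actual local variables), so the usual answers are tuple unpacking, which binds real local names in one statement, or an attribute-style namespace object, which at least replaces d['spam'] with ns.spam. A Python 3 sketch of both; init here is a hypothetical stand-in for the poster's init() function, and types.SimpleNamespace requires Python 3.3 or later.

```python
from types import SimpleNamespace

NAMES = ('spam', 'ham', 'eggs')


def init(name):
    """Hypothetical stand-in for the real initializer."""
    return name.upper()


def with_unpacking():
    # One statement binds three ordinary local variables.
    spam, ham, eggs = (init(v) for v in NAMES)
    return spam, ham, eggs


def with_namespace():
    # Attribute access instead of d['...'] lookups.
    ns = SimpleNamespace(**{v: init(v) for v in NAMES})
    return ns.spam, ns.ham, ns.eggs
```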




Supplementing the std lib (Was: partial sums problem)

2010-09-29 Thread kj
In mailman.1142.1285722789.29448.python-l...@python.org Terry Reedy 
tjre...@udel.edu writes:

Do not try to do a reduction with a comprehension. Just write clear, 
straightforward code that obviously works.

def cusum(s):
    t = 0
    for i in s:
        t += i
        yield t

[1, 3, 6, 10, 15, 21]

Actually, this is just fine.  Thank you!

But it brings up a new question, of an entirely different nature.

It's a recurrent conundrum, actually, and I have not found a good

Your cusum function is one that I would like to see somewhere in
the standard library (in itertools maybe?).  Maybe some future
version of the standard library will have it, or something like it
(I'm thinking of a generalized form which, like reduce, takes a
function and an initial value as arguments).

But in the immediate term, cusum is not part of the standard library.

Where would you put it if you wanted to reuse it?  Do you create
a module just for it?  Or do you create a general stdlib2 module
with all those workhorse functions that have not made it to the
standard library?  Or something else entirely?

(I'm not expecting to get *the* solution from any one reply; rather,
I'm interested in reading people's take on the question and their
way of dealing with those functions they consider worthy of the
standard library.)
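For the record, something very close to the generalized cusum did later land in the standard library: itertools.accumulate, added in Python 3.2, yields running totals; it accepts an optional binary function since Python 3.3 and an initial value since Python 3.8. A short example:

```python
import operator
from itertools import accumulate

running_totals = list(accumulate([1, 2, 3, 4, 5, 6]))
# [1, 3, 6, 10, 15, 21], the same as list(cusum([1, 2, 3, 4, 5, 6]))

running_products = list(accumulate([1, 2, 3, 4], operator.mul, initial=1))
# [1, 1, 2, 6, 24]: the initial value is emitted first, like a seeded reduce
```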



partial sums problem

2010-09-28 Thread kj

The following attempt to get a list of partial sums fails:

>>> s = 0
>>> [((s += t) and s) for t in range(1, 10)]
  File "<stdin>", line 1
    [((s += t) and s) for t in range(1, 10)]
SyntaxError: invalid syntax

What's the best way to get a list of partial sums?
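The attempt fails because augmented assignment is a statement, and statements cannot appear inside a comprehension. Python 3.8 added the assignment expression (the := operator, PEP 572), which can; a sketch assuming Python 3.8 or later:

```python
def partial_sums(values):
    s = 0
    # s := s + t rebinds s in this function's scope on every iteration
    return [s := s + t for t in values]
```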



audio time-stretching?

2010-09-07 Thread kj

Does anyone know of a Python module for *moderate* time-stretching[1]
an MP3 (or AIFF) file?

FWIW, the audio I want to time-stretch is human speech.



[1] By moderate time stretching I mean, for example, taking an
audio that would normally play in 5 seconds, and stretch it so that
it plays in 7.5 seconds, keeping the pitch unchanged.  A lot of
software out there does this badly; e.g. the time-stretched audio
springs extraneous beats of intensity that are very obtrusive
and annoying; I guess it's some weird wave self-interference effect.
Also, I stress *moderate* time stretching to explicitly rule out
the extreme (~50X) time-stretching that software like PaulStretch
is designed to accomplish. 

Re: How to convert (unicode) text to image?

On Fri, Aug 27, 2010 at 8:01 PM, kj no.em...@please.post wrote:

Hi!  Does anyone know of an easy way to convert a Unicode string into an image
file (either jpg or png)?

Do you mean you have some text and you want an image containing that
text? PIL's ImageDraw module can do that.

Thanks for the pointer, but...

The documentation I have found for PIL (at
http://www.pythonware.com/library/pil/handbook) is beyond atrocious.
If this is the only way to learn how to use this library, then I
really don't understand how anyone who is not clairvoyant can do it.

Example: I went to the docs page for ImageDraw.  There I find that
the constructor for an ImageDraw.Draw object takes an argument,
but *what* this argument should be (integer? object? string?) is
left entirely undefined.  From the examples given I *guessed* that
it was an object of class Image, so I repeated the exercise: I
consulted the docs for the Image module.  There I learn that the
constructor for the Image class takes among its parameters one
called mode and one called color, but, here again, what these
parameters are is left completely undefined.  (mode is left both
syntactically and semantically undefined; color is left syntactically
undefined, though the documentation includes a bit by way of semantic
definition of this parameter.)
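For what it's worth, here is a small sketch using Pillow, the maintained fork of PIL. In Image.new(mode, size, color), mode is a string such as 'RGB' or 'L' (8-bit greyscale), size is a (width, height) tuple in pixels, and color may be a colour name, a hex string or a tuple; ImageDraw.Draw takes the Image to draw on. The default font is used only so the sketch runs anywhere; Japanese or other non-Latin text would need ImageFont.truetype() with a path to a font file that covers those characters (that path is system-specific). text_to_image is a hypothetical helper name.

```python
from PIL import Image, ImageDraw, ImageFont


def text_to_image(text, path, size=(300, 60)):
    """Render `text` in black on a white canvas and save it to `path`."""
    image = Image.new('RGB', size, 'white')   # mode, (width, height), background
    draw = ImageDraw.Draw(image)              # drawing context bound to `image`
    font = ImageFont.load_default()           # swap for ImageFont.truetype(...) for CJK text
    draw.text((10, 10), text, fill='black', font=font)
    image.save(path)                          # format inferred from the file extension
    return image
```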

What's up with this practice of leaving parameters undefined like
this???  Wasn't it obvious to the person writing the Image module
docs that without explaining what these parameters should be the
documentation is nearly useless?  Is such poor documentation an
unintended consequence of duck typing???

Sorry for the outburst, but unfortunately, PIL is not alone in
this.  Python is awash in poor documentation.

The number two complaint I've heard from those who dislike Python
is the poor quality of its documentation, and in particular the
fact that function parameters are typically left undefined, as is
the case in the PIL docs.  I like Python a lot, but I have to agree
with this criticism.  (The number one complaint has to do with the
syntactic significance of whitespace; of course, I find *this*
criticism silly.)

What is most frustrating about such poor documentation is that it
is exactly the opposite from what one would expect from the
carefulness and thoroughness found in the PEPs...

I have been using Python as my primary scripting language for about
one year, after many years of programming in Perl, and now Python
is my language of choice.  But I must say that the documentation
standards I found in the Perl world are *well above* those in the
Python world.  This is not to say that Perl documentation is always
excellent; it certainly has its gaps, as one would expect from
volunteer-contributed software.  But I don't recall being frustrated
by Perl module docs anywhere nearly as often as I am by Python
module docs.  I have to conclude that the problem with Python docs
is somehow systemic...



How to convert (unicode) text to image?

2010-08-27 Thread kj

Hi!  Does anyone know of an easy way to convert a Unicode string into an image 
file (either jpg or png)?




Re: shelf-like list?

On Sat, Aug 14, 2010 at 5:13 PM, kj no.em...@please.post wrote:
 In af7fdb85-8c87-434e-94f3-18d8729bf...@l25g2000prn.googlegroups.com Raymond
 Hettinger pyt...@rcn.com writes:
On Aug 12, 1:37 pm, Thomas Jollans tho...@jollybox.de wrote:
 On Tuesday 10 August 2010, it occurred to kj to exclaim:

  I'm looking for a module that implements persistent lists: objects
  that behave like lists except that all their elements are stored
  on disk.  IOW, the equivalent of shelves, but for lists rather
  than dictionaries.
 . . .
 You could simply use pickle to save the data every once in a while.

That is a very reasonable solution.

 Sorry I don't follow.  Some sample code would be helpful.

I would assume something along the lines of (untested):

from pickle import dump

MOD_THRESHOLD = 100  # assumed value: save after this many modifications


class PersistentList(list):
    def __init__(self, filepath):
        super(PersistentList, self).__init__()
        self.filepath = filepath
        self._mod_count = 0

    def save(self):
        with open(self.filepath, 'wb') as f:  # pickles are binary data
            dump(self, f)
        self._mod_count = 0

    def append(self, *args, **kwds):
        super(PersistentList, self).append(*args, **kwds)
        self._mod_count += 1
        if self._mod_count >= MOD_THRESHOLD:
            # obviously time-since-last-dump or other
            #   more sophisticated metrics might be used instead
            self.save()

Even though it is saved periodically to disk, it looks like the
whole list remains in memory all the time?  (If so, it's not what
I'm looking for; the whole point of saving stuff to disk is to keep
the list's memory footprint low.)


Re: shelf-like list?

 Does anyone know of such a module? 

ZODB supports persistent lists.

Thanks; I'll check it out.


Re: shelf-like list?

2010-08-14 Thread kj
In af7fdb85-8c87-434e-94f3-18d8729bf...@l25g2000prn.googlegroups.com Raymond 
Hettinger pyt...@rcn.com writes:

On Aug 12, 1:37 pm, Thomas Jollans tho...@jollybox.de wrote:
 On Tuesday 10 August 2010, it occurred to kj to exclaim:

  I'm looking for a module that implements persistent lists: objects
  that behave like lists except that all their elements are stored
  on disk.  IOW, the equivalent of shelves, but for lists rather
  than dictionaries.
 . . .
 You could simply use pickle to save the data every once in a while.

That is a very reasonable solution.

Sorry I don't follow.  Some sample code would be helpful.


How to add silent stretches to MP3 using Python?

2010-08-14 Thread kj

Here's the problem: I have about 25,000 mp3 files, each lasting,
*on average*, only a few seconds, though the variance is wide (the
longest one lasts around 20 seconds).  (These files correspond to
sample sentences for foreign language training.)

The problem is that there is basically no padding before and after
the sound signal.  I want to prepend about 2 seconds of silence to
each file, and append another silent stretch at the end lasting
either 2 seconds or some multiplier of the duration of the original
file, whichever is greater.

I know that manipulating MP3 audio programmatically is usually not
easy, but this has got to be one of the simplest manipulations
possible, so I'm hoping I'll be able to pull it off with Python.

But I have not had much luck finding a Python library to do this.
If someone knows how to do this, and could give me some pointers,
I'd appreciate it.
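MP3 data cannot simply be padded byte for byte, so one approach (an assumption here, not something the standard library does for MP3) is to decode each file to WAV with an external tool, pad the PCM data, and re-encode. The padding step itself needs only the standard wave module. A Python 3 sketch; pad_wav and its tail_factor parameter are hypothetical names, with tail_factor standing in for the multiplier of the original duration.

```python
import wave


def pad_wav(src_path, dst_path, lead_s=2.0, tail_min_s=2.0, tail_factor=1.0):
    """Copy a PCM WAV file, adding silence before and after the audio.

    The trailing silence lasts max(tail_min_s, tail_factor * duration).
    """
    with wave.open(src_path, 'rb') as src:
        params = src.getparams()
        audio = src.readframes(params.nframes)

    rate = params.framerate
    frame_bytes = params.sampwidth * params.nchannels
    duration = params.nframes / rate
    tail_s = max(tail_min_s, tail_factor * duration)

    # 8-bit WAV samples are unsigned (silence = 0x80); wider ones are signed (silence = 0).
    silent = b'\x80' if params.sampwidth == 1 else b'\x00'
    lead = silent * (int(round(lead_s * rate)) * frame_bytes)
    tail = silent * (int(round(tail_s * rate)) * frame_bytes)

    with wave.open(dst_path, 'wb') as dst:
        dst.setparams(params)       # the header's frame count is corrected on close
        dst.writeframes(lead + audio + tail)
```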



shelf-like list?

2010-08-10 Thread kj

I'm looking for a module that implements persistent lists: objects
that behave like lists except that all their elements are stored
on disk.  IOW, the equivalent of shelves, but for lists rather
than dictionaries.

Does anyone know of such a module? 

(I suppose that I could slap together a crude implementation of
such a thing by wrapping a shelf with suitable methods to simulate
the list interface.  But I'd rather not roll my own if a tested
implementation already exists.)
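Such a wrapper can be fairly short if it only needs appending, integer indexing and iteration. A Python 3 sketch, assuming string keys "0", "1", ... in a shelve file plus one bookkeeping key for the length; ShelfList is a hypothetical name, and insertion or deletion in the middle (which would require renumbering keys) is deliberately left out. Only the requested element is unpickled on each access, so the contents never have to be in memory all at once. Subclassing collections.abc.Sequence supplies iteration, in, index() and count().

```python
import shelve
from collections.abc import Sequence


class ShelfList(Sequence):
    """Append-only list whose elements are stored in a shelve file."""

    _LENGTH_KEY = 'length'   # cannot clash with the numeric element keys

    def __init__(self, filename):
        self._shelf = shelve.open(filename)
        self._len = self._shelf.get(self._LENGTH_KEY, 0)

    def __len__(self):
        return self._len

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(self._len))]
        if index < 0:
            index += self._len
        if not 0 <= index < self._len:
            raise IndexError('ShelfList index out of range')
        return self._shelf[str(index)]

    def __setitem__(self, index, value):
        # Integer indices only; mutating a fetched element in place is not saved.
        if index < 0:
            index += self._len
        if not 0 <= index < self._len:
            raise IndexError('ShelfList assignment index out of range')
        self._shelf[str(index)] = value

    def append(self, value):
        self._shelf[str(self._len)] = value
        self._len += 1
        self._shelf[self._LENGTH_KEY] = self._len

    def close(self):
        self._shelf.close()
```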



Re: Unicode error

On Sat, 07 Aug 2010 19:28:56 +1200, Gregory Ewing wrote:

 Steven D'Aprano wrote:
 No memory?  No disk space?  No problem! Just a flesh wound!  What's
 the point of that?
 +1 QOTW

While I'm always happy to be nominated for QOTW, in this case I didn't 
say it, and the nomination should go to KJ.

(The ol' insert Monty Python reference move: it never fails...) 

Re: Unicode error

2010-08-06 Thread kj
In pan.2010. Nobody nob...@nowhere.com 

On Fri, 23 Jul 2010 10:42:26 +, Steven D'Aprano wrote:

 Don't write bare excepts, always catch the error you want and nothing
 else.

That advice would make more sense if it was possible to know which
exceptions could be raised. In practice, that isn't possible, as the
documentation seldom provides this information. Even for the built-in
classes, the documentation is weak in this regard; for less important
modules and third-party libraries, it's entirely absent.

I don't get your point.  Even when I *know* that a certain exception
may happen, I don't necessarily catch it.  I catch only those
exceptions for which I can think of a suitable response that is
*different* from just letting the program fail.  (After all, my
own code raises its own exceptions with the precise intention of
making the program fail.)  If an unexpected exception occurs, then
by definition, I had no better response in mind for that situation
than just letting the program fail, so I'm happy to let that happen.
If, afterwards, I think of a different response for a previously
uncaught exception, I'll modify the code accordingly.

I find this approach far preferable to the alternative of knowing
a long list of possible exceptions (some of which may never happen
in actual practice), and thinking of ways to keep the program still
alive no-matter-what.  No memory?  No disk space?  No problem!
Just a flesh wound!  What's the point of that?

(If I want the final error message to be something other than a
bare stack trace, I may wrap the whole execution in a global/top-level
try/except block so that I can fashion a suitable error message
right before calling exit, but that's just softening the fall:
the program still will go down.)
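That top-level "soften the fall" wrapper can be a few lines. A Python 3 sketch (run is a hypothetical helper, not a library function): it catches Exception, which deliberately excludes SystemExit and KeyboardInterrupt, prints a one-line message instead of the traceback, and returns a non-zero status so the program still goes down.

```python
import sys


def run(main, stream=None):
    """Call main(); turn an uncaught exception into a message and exit status 1."""
    stream = sys.stderr if stream is None else stream
    try:
        main()
    except Exception as exc:
        stream.write('%s: %s: %s\n' % (sys.argv[0], type(exc).__name__, exc))
        return 1
    return 0

# At the bottom of a script (main being its entry point):
#     if __name__ == '__main__':
#         sys.exit(run(main))
```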


XML parsing: SAX/expat yield

2010-08-04 Thread kj

I want to write code that parses a file that is far bigger than
the amount of memory I can count on.  Therefore, I want to stay as
far away as possible from anything that produces a memory-resident
DOM tree.

The top-level structure of this xml is very simple: it's just a
very long list of records.  All the complexity of the data is at
the level of the individual records, but these records are tiny in
size (relative to the size of the entire file).

So the ideal would be a parser-iterator, which parses just enough
of the file to yield (in the generator sense) the next record,
thereby returning control to the caller; the caller can process
the record, delete it from memory, and return control to the
parser-iterator; once parser-iterator regains control, it repeats
this sequence starting where it left off.

The problem, as I see it, is that SAX-type parsers like expat want
to do everything with callbacks, which is not readily compatible
with the generator paradigm I just described.

Is there a way to get an xml.parsers.expat parser (or any other
SAX-type parser) to stop at a particular point to yield a value?

The only approach I can think of is to have the appropriate parser
callbacks throw an exception wherever a yield would have been.
The exception-handling code would have the actual yield statement,
followed by code that restarts the parser where it left off.
Additional logic would be necessary to implement the piecemeal
reading of the input file into memory.

But I'm not very conversant with SAX parsers, and even less with
generators, so all this may be unnecessary, or way off.

If you know any other tricks for turning a SAX parser into a
generator, please let me know.
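No exception trickery may be needed if the parser is driven incrementally: expat's Parse() method can be fed the file one chunk at a time, the callbacks only append finished records to a queue, and the generator yields whatever has accumulated after each chunk. Control returns to the caller between chunks rather than at the exact end of each record, which keeps memory bounded by the chunk size plus a few records. A Python 3 sketch; iter_records is a hypothetical name, and it assumes each record's fields are flat child elements containing only text. (The standard library's xml.etree.ElementTree.iterparse, combined with calling clear() on processed elements, is another common route.)

```python
import xml.parsers.expat
from collections import deque


def iter_records(stream, record_tag, chunk_size=64 * 1024):
    """Yield each <record_tag> element of a binary stream as a dict of field -> text."""
    ready = deque()                       # completed records not yet yielded
    current = {'record': None, 'field': None, 'text': []}

    def start(name, attrs):
        if name == record_tag:
            current['record'] = {}
        elif current['record'] is not None:
            current['field'] = name
            current['text'] = []

    def chars(data):
        if current['field'] is not None:
            current['text'].append(data)  # expat may split text across calls

    def end(name):
        if name == record_tag:
            ready.append(current['record'])
            current['record'] = None
        elif name == current['field']:
            current['record'][name] = ''.join(current['text'])
            current['field'] = None

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end

    while True:
        chunk = stream.read(chunk_size)
        parser.Parse(chunk, not chunk)    # an empty read means EOF: final call
        while ready:
            yield ready.popleft()
        if not chunk:
            break
```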


Re: XML parsing: SAX/expat yield

How about






how to pretty-print Python dict with unicode?

2010-08-04 Thread kj

Is there a simple way to get Python to pretty-print a dict whose
values contain Unicode?  (Of course, the goal here is that these
printed values are human-readable.)

If I run the following simple script:

from pprint import pprint
x = u'\u6c17\u304c\u9055\u3046'
print '{%s: %s}' % (u'x', x)
print {u'x': x}
pprint({u'x': x})

The first print statement produces perfectly readable Japanese,
but the remaining statements both produce the line

{u'x': u'\u6c17\u304c\u9055\u3046'}

I've tried everything I can think of (including a lot of crazy
stuff) short of re-writing pprint from scratch (which I think would
be faster than grokking it and hacking at it).

Isn't there an easier way to do this?
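In Python 3 this particular problem goes away: repr() of a str leaves printable non-ASCII characters unescaped (PEP 3138), so print of a dict and pprint both show readable Japanese, provided the output stream's encoding can represent it. Under Python 2, the usual workaround is to subclass pprint.PrettyPrinter and override its format() method, which the pprint documentation describes as the hook for customizing how objects are rendered. A Python 3 check:

```python
from pprint import pformat

x = '\u6c17\u304c\u9055\u3046'
text = pformat({'x': x})
# text == "{'x': '気が違う'}"; print(text) shows the characters themselves
```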



Re: ctypes' c_longdouble: underflow error (bug?)

On 07/15/2010 06:41 PM, kj wrote:
 In mailman.733.1279124991.1673.python-l...@python.org Thomas Jollans 
 tho...@jollans.com writes:
 c_longdouble maps to float
 Thanks for pointing this out!
 (Does it make *any difference at all* to use c_longdouble instead
 of c_double?  If not, I wonder what's the point of having c_longdouble
 at all.)

they're still different types (in C).

i understand that doubles and long doubles are different in C.

on my machine, a double is 64 bits wide, while a long double is 128 bits
wide. This means that a function that expects a long double argument
will expect 16 bytes, but ctypes will only pass 8 bytes if you tell it
to pass double. The same applies to return values.

This is extremely confusing.  From my naive reading of the
documentation, I would have expected that the following two blocks
would produce identical results (expl is one of the standard C math
library exponential functions, with signature long double expl(long
double x)):

MATH.expl.argtypes = [c_longdouble]
MATH.expl.restype = c_longdouble
print MATH.expl(0)

MATH.expl.argtypes = [c_double]
MATH.expl.restype = c_double
print MATH.expl(0)

...but no, they don't: the first one prints out the correct result,
1.0, while the second one prints out 0.0, of all things.  (In fact,
with the second (mis)configuration, the value returned by MATH.expl
is always equal to its argument, go figure.)

I find these results perplexing because, based on the docs, I
expected that they *both* would be analogous to doing the following
in C:

printf("%f\n", (double) expl((double) 0.0)); /* prints out 1.000000 */

i.e., in *both* cases, expl would get passed a double (which gets
automatically cast into a long double), and in both cases its
returned value would be cast into a double.  Clearly, this is not
what's happening, but I can't figure out the correct interpretation
of what's going on based on the documentation...
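The key point is that argtypes and restype do not request a conversion; they describe the C function's actual prototype, so that ctypes passes and fetches values in the right places. On x86-64 Linux and macOS, for example, the System V calling convention passes and returns double in SSE registers but passes long double on the stack and returns it in the x87 register st(0). With the wrong declaration, expl reads its argument from one place while ctypes reads the result from another, which would be consistent with the result merely echoing the argument (that last part is an inference, not something verified here). A Python 3 sketch that declares the real prototype; ctypes.util.find_library('m') may return None on some platforms, in which case the sketch gives up.

```python
import ctypes
import ctypes.util

# Storage sizes on this platform; a long double is often 16 bytes on x86-64.
DOUBLE_SIZE = ctypes.sizeof(ctypes.c_double)
LONGDOUBLE_SIZE = ctypes.sizeof(ctypes.c_longdouble)


def call_expl(x):
    """Call the C library's expl() with its real prototype: long double expl(long double).

    Returns None when the math library cannot be located on this platform.
    """
    path = ctypes.util.find_library('m')
    if path is None:
        return None
    libm = ctypes.CDLL(path)
    expl = libm.expl
    expl.argtypes = [ctypes.c_longdouble]  # must match the C parameter type
    expl.restype = ctypes.c_longdouble     # ctypes still returns a Python float
    return expl(x)
```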

Re: ctypes' c_longdouble: underflow error (bug?)

c_longdouble maps to float

Thanks for pointing this out!
(Does it make *any difference at all* to use c_longdouble instead
of c_double?  If not, I wonder what's the point of having c_longdouble
at all.)

Q for Emacs users: code-folding (hideshow)

2010-07-15 Thread kj

This is a question _for Emacs users_ (the rest of you, go away :)  ).

How do you do Python code-folding in Emacs?



ctypes' c_longdouble: underflow error (bug?)

2010-07-14 Thread kj

I have a C library function hg that returns a long double, so when
I import it using ctypes I specify this return type like this:

MYLIB.hg.restype = ctypes.c_longdouble

But certain non-zero values returned by hg appear as zero Python-side.
If I modify hg so that it prints out its value right before returning
it, I get stuff like the following:

>>> 0 == MYLIB.hg(100, 200, 100, 6000)
from hg: 2.96517e-161
>>> 0 == MYLIB.hg(200, 200, 200, 6000)
from hg: 5.28791e-380

So, although the value returned by hg in the second invocation
above is 5.28791e-380, Python sees it as 0.

What am I doing wrong?
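Whatever the C side computes, ctypes hands a c_longdouble result back as a Python float, which is an IEEE-754 double. The smallest positive double is about 4.94e-324 (2**-1074), so 2.96517e-161 survives the conversion but 5.28791e-380 rounds to 0.0. A quick Python 3 check (survives_as_double is an illustrative helper). Keeping such values would require returning something a double can represent, for example the logarithm of the result; that is only a suggested workaround.

```python
import math

SMALLEST_DOUBLE = math.ldexp(1.0, -1074)   # smallest positive (subnormal) double


def survives_as_double(literal):
    """True if the decimal literal is still non-zero once stored in a float."""
    return float(literal) != 0.0
```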



Numerics question

2010-07-02 Thread kj

I define

ninv = 1.0/n

...where n is some integer, and I want to write some function f such
that f(m * ninv) returns the smallest integer that is >= m * ninv,
where m is some other integer.  And, in particular, if m is p*n
for some integer p, then f((p*n) * ninv) should return the integer
p.

The first solution that comes to mind is something like

def f(x):
    return int(math.ceil(x))

At first this seems to work:

>>> f((7*2) * (1.0/2))
>>> f((7*3) * (1.0/3))

...but there are values of n for which it fails:

>>> f((7*75) * (1.0/75))

The problem here is that, due to numerical error, the expression
((7*75) * (1.0/75)) evaluates to a number *just* above 7.  The
surrounding math.ceil then turns this into 8.0, etc.

Is there a way to define f so that it behaves as expected?
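Two common fixes, sketched in Python 3 (the function names and the tolerance are assumptions). If m and n are available as integers, floating point can be avoided entirely with exact integer ceiling division, -(-m // n). If only the float x is available, values within a small relative tolerance of an integer can be snapped to it before taking the ceiling; the tolerance trades robustness near integers against wrongly snapping values that really are slightly larger, so it has to suit the magnitudes involved.

```python
import math


def ceil_div(m, n):
    """Exact ceiling of m / n for integers m and n (n != 0); no rounding error."""
    return -(-m // n)


def f(x, rel_tol=1e-9):
    """Ceiling of x, treating values within rel_tol of an integer as that integer."""
    nearest = round(x)
    if math.isclose(x, nearest, rel_tol=rel_tol):
        return int(nearest)
    return math.ceil(x)
```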



Re: Numerics question

Please disregard my ineptly posed question.





tallying occurrences in list

2010-06-04 Thread kj

Task: given a list, produce a tally of all the distinct items in
the list (for some suitable notion of distinct).

Example: if the list is ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b',
'c', 'a'], then the desired tally would look something like this:

[('a', 4), ('b', 3), ('c', 3)]

I find myself needing this simple operation so often that I wonder:

1. is there a standard name for it?
2. is there already a function to do it somewhere in the Python
   standard library?

Granted, as long as the list consists only of items that can be
used as dictionary keys (and Python's equality test for hashkeys
agrees with the desired notion of distinctness for the tallying),
then the following does the job passably well:

def tally(c):
    t = dict()
    for x in c:
        t[x] = t.get(x, 0) + 1
    return sorted(t.items(), key=lambda x: (-x[1], x[0]))

But, of course, if a standard library solution exists it would be
preferable.  Otherwise I either cut-and-paste the above every time
I need it, or I create a module just for it.  (I don't like either
of these, though I suppose that the latter is much better than the
former.)

So anyway, I thought I'd ask. :)
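Since Python 2.7 and 3.1 the standard library has collections.Counter for exactly this kind of counting (the result is often called a frequency table, or a multiset). Its most_common() method orders by count; the sketch below additionally breaks ties by item, to reproduce the ordering of tally() above.

```python
from collections import Counter


def tally(items):
    """Return (item, count) pairs, most frequent first, ties broken by item."""
    counts = Counter(items)
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
```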


Re: tallying occurrences in list

Thank you all!


atexit/signal for non-interactive jobs

2010-05-26 Thread kj

I want to implement clean-up functions for scripts to be run on a
Linux cluster (through LSF).  The goal is to make sure that a
minimal wrap-up sequence (print diagnostic info, flush buffers,
etc.) gets executed if the job is terminated for some reason.  (The
most common reason for premature termination is a SIGUSR2 signal,
sent to the process by LSF when the job has taken longer than the
time limit for the job's LSF queue.)

No matter what I try, I can't get this wrap-up sequence to work
when the script runs non-interactively.

The latest I have looks something like this:

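import signal
import atexit

def set_handlers(handler):
    for sig in dir(signal):
        if sig.startswith('SIG') and '_' not in sig:
            try:
                signal.signal(getattr(signal, sig), handler)
            except (RuntimeError, ValueError, OSError):
                pass  # some signals (e.g. SIGKILL, SIGSTOP) cannot be caught

def wrapup(*args):
    # etc, etc
    pass

def main():
    # ...
    pass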


If I start the script interactively and after a few seconds (i.e.
before it terminates) I hit Ctrl-C (which sends a TERM signal to
the process), the wrapup function gets called as desired (although
this action appears to be triggered by atexit, and not by the
handler for SIGTERM, which apparently never gets called).

But if instead I background the script after starting it, and then
use kill to explicitly send a TERM signal to its process, the script
terminates, but the wrapup code does not execute.

It makes no sense to me.  If someone cares to explain it, I'd be
very grateful.  (Is my code doing something wrong?)

But more to the point, what must I do to get this to work in the
non-interactive case?
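Two documented facts likely explain the difference. Ctrl-C sends SIGINT rather than SIGTERM, and Python's default SIGINT handling raises KeyboardInterrupt, which unwinds the program normally, so atexit handlers run. A SIGTERM with no Python-level handler kills the process outright, and the atexit documentation states that registered functions are not called when the program is killed by a signal not handled by Python (SIGKILL cannot be caught at all). The usual remedy is a handler that turns the signal into an ordinary exit via sys.exit(), after which atexit does its job. A Python 3 sketch, assuming SIGTERM and, where the platform has it, SIGUSR2 are the signals to trap; wrapup, exit_on_signal and install_handlers are placeholder names.

```python
import atexit
import signal
import sys


def wrapup():
    """Placeholder clean-up: print diagnostics, flush buffers, etc."""
    sys.stderr.write('wrapping up\n')
    sys.stderr.flush()


def exit_on_signal(signum, frame):
    # Raising SystemExit unwinds the stack normally, so atexit handlers run.
    # 128 + signum mimics the shell's exit-status convention for signals.
    sys.exit(128 + signum)


def install_handlers():
    atexit.register(wrapup)
    trapped = [signal.SIGTERM]
    if hasattr(signal, 'SIGUSR2'):        # not available on Windows
        trapped.append(signal.SIGUSR2)
    for sig in trapped:
        signal.signal(sig, exit_on_signal)
    return trapped
```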



Re: Limitation of os.walk

 If os.walk were rewritten, it should be as an iterator (generator).
 Directory entry and exit functions could still be added as params.

It *is* an iterator/generator.  However, I suspect you mean that 
it should slurp the dirs/files iteratively instead of using 
listdir() as was discussed on c.l.p a few months back.

Thanks for mentioning this thread.  Very interesting stuff.  Apropos
the implementability of an iterative listdir, I wonder if some
variation of glob.iglob() would fit the bill.  (Maybe it's too
slow, though.)

I suspect if I thought about it much longer, only one would 
really be needed, the other accommodated by the topdown parameter.

Yeah, I think one only needs a post hook.  The fact that it's a
generator obviates need for a pre hook, since the yield returns
control to the calling function right around where the pre-hook
would run anyway.  For the same reason, the post hook is needed
only for the case topdown=True.

Another useful callback would be one that replaced listdir in the
generation of the current directory's contents.
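The post-hook and listdir-replacement ideas above can be sketched as a small generator. This is not os.walk's API: `walk_with_hooks`, `post_hook` and the `listdir` parameter are hypothetical names. It yields top-down like `os.walk(topdown=True)`, so callers prune by editing `dirnames` in place, and it calls `post_hook(dirpath)` once that directory's whole subtree has been walked.

```python
import os

def walk_with_hooks(top, post_hook=None, listdir=os.listdir):
    """Top-down directory walk in the spirit of os.walk (a sketch).

    Yields (dirpath, dirnames, filenames) before descending, so the caller
    can prune by removing names from dirnames in place, as with
    os.walk(topdown=True).  post_hook(dirpath), if given, is called once
    every unpruned subdirectory of dirpath has been walked.  listdir
    replaces os.listdir for reading each directory's entries.
    """
    try:
        names = listdir(top)
    except OSError:
        return  # unreadable directory: skip it, as os.walk does by default
    dirnames, filenames = [], []
    for name in names:
        if os.path.isdir(os.path.join(top, name)):
            dirnames.append(name)
        else:
            filenames.append(name)
    yield top, dirnames, filenames
    # dirnames may have been pruned by the caller during the yield.
    for name in dirnames:
        path = os.path.join(top, name)
        if not os.path.islink(path):  # like os.walk(followlinks=False)
            yield from walk_with_hooks(path, post_hook, listdir)
    if post_hook is not None:
        post_hook(top)
```

Because the hook runs inside the generator, it fires for a directory only if the caller keeps iterating past that subtree; abandoning the loop early skips the remaining post-hooks.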


Re: Limitation of os.walk

On 5/11/2010 3:49 PM, kj wrote:
 PS: I never understood why os.walk does not support hooks for key
 events during such a tree traversal.

Either 1) it is intentionally simple, with the expectation that people 
would write their own code for more complicated uses or 2) no one has 
submitted a 'full-featured' version or 3) both.

I hope it's not (1): I want the language I use to include more
batteries not fewer.

<petpeeve>It seems that a similar simplicity argument was invoked
to strip the cmp option from sort in Python 3.  G.  Simplicity
is great, but when the drive for it starts causing useful functionality
to be thrown out, then it is going too far.  Yes, I know that it
is possible to do everything that sort's cmp option does through
clever tricks with key, but grokking and coding this maneuver
requires a *lot* more Python-fu than cmp did, which makes this
functionality a lot less accessible to beginners than the intrinsic
complexity of the problem warrants.  And for what?  To get rid of
an *option* that could be easily disregarded by anyone who found
it too complex?  It makes no sense to me.</petpeeve>
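For readers who do want a comparison function with Python 3's sort, the standard library's `functools.cmp_to_key` (also in Python 2.7) wraps an old-style cmp function as a key. A small sketch with a made-up ordering: length ascending, ties in reverse alphabetical order, which is awkward to express as a plain key for strings. The function name is hypothetical.

```python
from functools import cmp_to_key

def by_length_then_reverse_alpha(a, b):
    # Old-style comparison: negative if a sorts first, zero if equal,
    # positive if b sorts first.
    if len(a) != len(b):
        return len(a) - len(b)
    return (b > a) - (b < a)  # ties: reverse alphabetical order

words = ["pear", "fig", "apple", "kiwi", "date"]
result = sorted(words, key=cmp_to_key(by_length_then_reverse_alpha))
print(result)  # ['fig', 'pear', 'kiwi', 'date', 'apple']
```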

Limitation of os.walk

2010-05-11 Thread kj

I want to implement a function that walks through a directory tree
and performs an analysis of all the subdirectories found.  The
task has two essential requirements that, AFAICT, make it impossible
to use os.walk for this:

1. I need to be able to prune certain directories from being visited.

2. The analysis on each directory can be performed only after it
   has been performed on all its subdirectories.

Unless I'm missing something, to do (1), os.walk must be run with
topdown=True, whereas to do (2) it must be run with topdown=False.

Is there a work around that I'm missing?



PS: I never understood why os.walk does not support hooks for key
events during such a tree traversal.
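One workaround for the two requirements above is to skip os.walk and write a short recursive post-order traversal that takes a pruning predicate. A sketch, not a standard API: `analyze_tree`, `prune` and `analyze` are hypothetical names. Each directory's analysis receives its children's results, so requirement 2 holds by construction.

```python
import os

def analyze_tree(path, prune, analyze):
    """Post-order traversal with pruning (a sketch, not os.walk).

    prune(dirpath) -> True means skip dirpath and everything below it.
    analyze(dirpath, child_results) runs only after every unpruned
    subdirectory of dirpath has been analyzed; child_results maps each
    subdirectory name to its analyze() return value.
    """
    with os.scandir(path) as it:
        entries = sorted(it, key=lambda e: e.name)
    child_results = {}
    for entry in entries:
        # Symlinked directories are not followed.
        if entry.is_dir(follow_symlinks=False) and not prune(entry.path):
            child_results[entry.name] = analyze_tree(entry.path, prune, analyze)
    return analyze(path, child_results)
```

Recursion depth grows with tree depth, which is rarely a problem for real directory trees but worth keeping in mind.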

Re: Limitation of os.walk

That said, the core source for os.walk() is a whole 23 
lines of code, it's easy enough to just clone it and add what you
need.

Thanks, that was a good idea.


Inheritable computed class attributes?

2010-04-30 Thread kj

I want to define a class attribute that is computed from other
class attributes.  Furthermore, this attribute should be inheritable,
and its value in the subclasses should reflect the subclasses' values
of the attributes used to compute the computed attribute.  I tried
the following:

class Spam(object):
    X = 3
    @property
    def Y(cls):
        return cls.X * 3

...but Spam.Y returns <property object at 0x...>, rather than 9.

How can I define a class property?  Is it possible at all?

Ultimately, I'd like to be able to define multiple subclasses of
Spam, e.g.

class Ham(Spam):
    X = 7
class Eggs(Spam):
    X = '.'

and have Ham.Y and Eggs.Y evaluate to 21 and '...', respectively.
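One way to get this without metaclasses is a small descriptor whose `__get__` receives the class the attribute was looked up on. `classproperty` is a hypothetical helper written for this sketch, not a built-in; it is read-only and is also visible on instances.

```python
class classproperty:
    """Hypothetical helper: a read-only property computed on the class."""

    def __init__(self, fget):
        self.fget = fget

    def __get__(self, instance, owner):
        # owner is the class through which Y was looked up (Spam, Ham,
        # or Eggs), so each subclass sees its own X.
        return self.fget(owner)

class Spam:
    X = 3

    @classproperty
    def Y(cls):
        return cls.X * 3

class Ham(Spam):
    X = 7

class Eggs(Spam):
    X = '.'

print(Spam.Y, Ham.Y, Eggs.Y)  # 9 21 ...
```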



Re: Inheritable computed class attributes?

In <4bdb4e4...@dnews.tpgi.com.au> Lie Ryan <lie.1...@gmail.com> writes:

class MetaSpam(type):
    @property
    def Y(cls):
        return cls.X * 3

class Spam(object):
    __metaclass__ = MetaSpam

and there we go:

>>> class Ham(Spam):
...     X = 7
...
>>> class Eggs(Spam):
...     X = '.'
...
>>> Ham.Y; Eggs.Y
21
'...'


That's very interesting!  I did not know about metaclasses; I need
to learn more about them.  Thanks for the pointer!
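Note that the `__metaclass__` class attribute only works in Python 2; Python 3 ignores it and takes the metaclass as a keyword in the class statement instead. A sketch of the same idea in Python 3 syntax, reusing the names from the thread:

```python
class MetaSpam(type):
    @property
    def Y(cls):
        # Defined on the metaclass, so it is found when Y is looked up on
        # a class (Spam, Ham, ...), with cls bound to that class.
        return cls.X * 3

class Spam(metaclass=MetaSpam):  # Python 3 spelling of __metaclass__
    X = 3

class Ham(Spam):
    X = 7

class Eggs(Spam):
    X = '.'

print(Ham.Y, Eggs.Y)  # 21 ...
```

One design consequence: a property defined on the metaclass is visible on the classes but not on their instances, so `Ham().Y` raises AttributeError.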


Looking for registration package

2010-04-21 Thread kj

I'm looking for a Python-based, small, self-contained package to
hand out API keys, in the same spirit as Google API keys.

The basic specs are simple: 1) enforce the one key per customer
rule; 2) be robot-proof; 3) be reasonably difficult to circumvent
even for humans.

(This is for a web service we would like to implement; the goal is
to be able to control the load on our servers.  Therefore, if the
package includes an automated log-analysis component, all the
better, but this is not necessary.)

Any suggestions would be appreciated.



Re: How to access args as a list?

On Sat, 03 Apr 2010 22:58:43 +, kj wrote:

 Suppose I have a function with the following signature:
 def spam(x, y, z):
     # etc.
 Is there a way to refer, within the function, to all its arguments as a
 single list?  (I.e. I'm looking for Python's equivalent of Perl's @_

Does this help?

>>> def spam(a, b, c=3, d=4):
...     pass
...
('a', 'b', 'c', 'd')

That's very handy.  Thanks!

The hardest part is having the function know its own name.

Indeed.  Why Python does not provide this elementary form of
introspection as a built-in variable is extremely puzzling to me
(even--no, *more so*--after reading PEP 3130).
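For the record, CPython does expose both pieces of information through frame and code objects. A minimal sketch, assuming CPython (`inspect.currentframe()` may return None on implementations without stack-frame support): the function reads its own name from `co_name` and its arguments, in signature order, from the first `co_argcount` entries of `co_varnames`.

```python
import inspect

def spam(x, y, z):
    frame = inspect.currentframe()
    try:
        code = frame.f_code
        name = code.co_name  # the function's own name
        # Positional parameters come first in co_varnames, in signature order.
        argnames = code.co_varnames[:code.co_argcount]
        args = [frame.f_locals[n] for n in argnames]
    finally:
        del frame  # avoid a reference cycle through the frame
    return name, args

print(spam(1, 2, 3))  # ('spam', [1, 2, 3])
```

The values come from the frame's locals, so if the function rebinds a parameter before reading them, the new value is what appears.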

I see that you are already using the inspect module. That almost 
certainly is the correct approach. I'd be surprised if inspect is too 
heavyweight, but if it is, you can pull out the bits you need into your 
own function.

That's a good idea.  Thanks!


How to access args as a list?

2010-04-03 Thread kj

Suppose I have a function with the following signature:

def spam(x, y, z):
    # etc.

Is there a way to refer, within the function, to all its arguments
as a single list?  (I.e. I'm looking for Python's equivalent of
Perl's @_ variable.)

I'm aware of locals(), but I want to preserve the order in which
the arguments appear in the signature.

My immediate aim is to set up a simple class that will allow me to
iterate over the arguments passed to the constructor (plus let me
refer to these individual arguments by their names using an
instance.attribute syntax, as usual).

The best I have managed looks like this:

import inspect
from collections import OrderedDict

class _Spam(object):
    def __init__(self, x, y, z):
        self.__dict__ = OrderedDict()
        # Note: inspect.getargspec was removed in Python 3.11
        for p in inspect.getargspec(_Spam.__init__).args[1:]:
            self.__dict__[p] = locals()[p]

    def __iter__(self):
        return iter(self.__dict__.values())

but rolling out inspect.getargspec for this sort of thing looks to
me like overkill.  Is there a more basic approach?

P.S. this is just an example; the function I want to implement has
more parameters in its signature, with longer, more informative


Re: How to access args as a list?

Suppose I have a function with the following signature:

def spam(x, y, z):
    # etc.

Is there a way to refer, within the function, to all its arguments
as a single list?  (I.e. I'm looking for Python's equivalent of
Perl's @_ variable.)

I'm aware of locals(), but I want to preserve the order in which
the arguments appear in the signature.

My immediate aim is to set up a simple class that will allow me to
iterate over the arguments passed to the constructor (_plus let me
refer to these individual arguments by their names using an
instance.attribute syntax, as usual_).

The underlined portion explains why __init__(self, *args) fails to
fit the bill.

P.S. this is just an example; the function I want to implement has
more parameters in its signature, with longer, more informative
names.

Andreas, perhaps this paragraph explains why I find your solution
unappealing:  it requires typing the same thing over and over,
which increases the chances of bugs.  That's the reason I avoid
such repetitiveness, not laziness, as you were so quick to accuse
me of.


Re: How to access args as a list?

In <hp8h73$k1...@reader1.panix.com> kj <no.em...@please.post> writes:

Suppose I have a function with the following signature:

def spam(x, y, z):
    # etc.

Is there a way to refer, within the function, to all its arguments
as a single list?  (I.e. I'm looking for Python's equivalent of
Perl's @_ variable.)

I'm aware of locals(), but I want to preserve the order in which
the arguments appear in the signature.

My immediate aim is to set up a simple class that will allow me to
iterate over the arguments passed to the constructor (_plus let me
refer to these individual arguments by their names using an
instance.attribute syntax, as usual_).

The underlined portion explains why __init__(self, *args) fails to
fit the bill.

The minute I hit send I realized that this is wrong.  Sorry.


