At 11:52 AM 7/21/2011 +1000, Nick Coghlan wrote:
Trying to change how packages are identified at the Python level makes
PEP 382 sound positively appealing. __path__ needs to stay :)

In which case, it should be a list, not a sentinel.  ;-)


Even better would be for these (and sys.path) to be list subclasses
that did the right thing under the hood as Glenn suggested. Code that
*replaces* rather than modifies these attributes would still
potentially break virtual packages, but code that modifies them in
place would do the right thing automatically. (Note that all code that
manipulates sys.path and __path__ attributes requires explicit calls
to correctly support current namespace package mechanisms, so this
would actually be an improvement on the status quo rather than making
anything worse).
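A minimal sketch of the list-subclass idea. NotifyingPathList and its on_added hook are hypothetical names used only for illustration. It intercepts append(), insert(), extend() and += only. Slice or item assignment, and C code calling the PyList_* API directly, would bypass it. It also illustrates the caveat about replacement: `p + [...]` returns a plain list, so rebinding the attribute to that result drops the hook.

```python
from typing import Callable, Iterable, List


class NotifyingPathList(list):
    """List subclass that reports entries added in place (sketch only).

    `on_added` is a hypothetical hook standing in for whatever would
    update virtual packages. Only append/insert/extend/+= are covered.
    """

    def __init__(
        self,
        iterable: Iterable[str] = (),
        on_added: Callable[[List[str]], None] = lambda entries: None,
    ) -> None:
        super().__init__(iterable)
        self.on_added = on_added

    def append(self, entry: str) -> None:
        super().append(entry)
        self.on_added([entry])

    def insert(self, index: int, entry: str) -> None:
        super().insert(index, entry)
        self.on_added([entry])

    def extend(self, entries: Iterable[str]) -> None:
        entries = list(entries)  # materialize once so the hook sees the same items
        super().extend(entries)
        self.on_added(entries)

    def __iadd__(self, entries: Iterable[str]) -> "NotifyingPathList":
        self.extend(entries)  # route += through the notifying extend()
        return self


added: List[str] = []
p = NotifyingPathList(["/usr/lib/python"], on_added=added.extend)
p.append("/opt/plugins")  # in-place: hook fires
q = p + ["/tmp"]          # replacement: q is a plain list, no hook
```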

I think the simplest thing, if we're keeping __path__ (and on reflection, I think we should), would be to automatically call extend_virtual_paths() whenever an import is performed, on any sys.path entries that are new since the previous value of sys.path.

That is, we save an "old" copy of sys.path somewhere, and whenever __import__() is called (well, once it gets past checking if the target is already in sys.modules, anyway), it checks the current sys.path against it, and calls extend_virtual_paths() on any sys.path entries that weren't in the "old" sys.path.
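A minimal model of that bookkeeping, for illustration only. PathWatcher and before_import() are hypothetical names. extend_virtual_paths is passed in as a callable rather than imported, since it is the function under discussion here rather than something this sketch can rely on. get_path is a callable so that code which rebinds sys.path (rather than mutating it) is still seen.

```python
import sys
from typing import Callable, List


class PathWatcher:
    """Remember the last-seen sys.path and report entries added since then."""

    def __init__(
        self,
        extend_virtual_paths: Callable[[str], None],
        get_path: Callable[[], List[str]] = lambda: sys.path,
    ) -> None:
        self.extend_virtual_paths = extend_virtual_paths
        self.get_path = get_path
        self.old_path = list(get_path())  # the saved "old" copy

    def before_import(self) -> List[str]:
        """Meant to be called once the sys.modules fast path has missed."""
        current = list(self.get_path())
        seen = set(self.old_path)
        # dict.fromkeys keeps order while dropping duplicate new entries
        new_entries = [e for e in dict.fromkeys(current) if e not in seen]
        for entry in new_entries:
            self.extend_virtual_paths(entry)  # may stat() the new directory
        self.old_path = current
        return new_entries
```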

This is not the most efficient thing in the world, as it will cause a bunch of stat calls to happen against the new directories, in the middle of a possibly-entirely-unrelated import operation, but it would certainly address the issue in the Simplest Way That Could Possibly Work.

A stricter (safer) version of the same thing would be one where we only update __path__ values that are unchanged since we created them, and rather than only appending new entries, we replace the __path__ with a newly-computed one.
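A minimal sketch of the strict policy. It assumes a simplified rule where a portion is any parent-path entry containing a directory named after the package. compute_virtual_path() and StrictUpdater are hypothetical names. Because a list can be edited in place without changing its identity, the sketch checks both identity and a snapshot of the contents before replacing anything.

```python
import os
from typing import Dict, Iterable, List, Tuple


def compute_virtual_path(subname: str, parent_path: Iterable[str]) -> List[str]:
    """Hypothetical helper: each parent entry that has a `subname` directory."""
    return [
        os.path.join(entry, subname)
        for entry in parent_path
        if os.path.isdir(os.path.join(entry, subname))
    ]


class StrictUpdater:
    """Only refresh __path__ values that are unchanged since we set them."""

    def __init__(self) -> None:
        # package name -> (object we assigned, snapshot of its contents)
        self.created: Dict[str, Tuple[object, Tuple[str, ...]]] = {}

    def set_path(self, module, subname: str, parent_path: Iterable[str]) -> None:
        new_path = compute_virtual_path(subname, parent_path)
        module.__path__ = new_path
        self.created[module.__name__] = (new_path, tuple(new_path))

    def refresh(self, module, subname: str, parent_path: Iterable[str]) -> bool:
        record = self.created.get(module.__name__)
        current = getattr(module, "__path__", None)
        if record is None or current is not record[0] or tuple(current) != record[1]:
            return False  # replaced or edited by someone else: leave it alone
        # Replace the whole value rather than appending, so the order
        # follows the (possibly reordered) parent path.
        self.set_path(module, subname, parent_path)
        return True
```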

This version is safer because it avoids corner cases like "I imported foo.bar while foo.baz 1.1 was on my path, then I prepended a directory to sys.path that has foo.baz 1.2, but I still get foo.baz 1.1 when I import." But it loses in cases where people manipulate __path__ directly, because those packages then stop receiving automatic updates.

On the other hand, it's a lot easier to say "you break it, you bought it" where __path__ manipulation is concerned, so I'm actually pretty inclined towards using the strict version.

Hey... here's a crazy idea. Suppose that a virtual package __path__ is a *tuple* instead of a list? Now, in order to change it, you *have* to replace it. And we can cache the tuple we initially set it to in sys.virtual_package_paths, so we can do an 'is' check before replacing it.

Voila: __path__ still exists and is still a sequence for a virtual package, but you have to explicitly replace it if you want to do anything funky -- at which point you're responsible for maintaining it.
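A sketch of the tuple variant. Because a tuple can't be changed in place, an 'is' comparison against the cached value is enough to tell whether anyone has replaced it. The module-level virtual_package_paths dict is a hypothetical stand-in for the proposed sys.virtual_package_paths, and install_path() and maybe_replace_path() are illustrative names only.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical stand-in for the proposed sys.virtual_package_paths registry.
virtual_package_paths: Dict[str, Tuple[str, ...]] = {}


def install_path(module, entries: Iterable[str]) -> None:
    """Set a tuple __path__ and remember the exact object we set."""
    path = tuple(entries)
    module.__path__ = path
    virtual_package_paths[module.__name__] = path


def maybe_replace_path(module, entries: Iterable[str]) -> bool:
    """Replace __path__ only if it is still the tuple we installed."""
    if getattr(module, "__path__", None) is not virtual_package_paths.get(module.__name__):
        return False  # someone replaced it: "you break it, you bought it"
    install_path(module, entries)
    return True
```

Trying `module.__path__.append(...)` on such a package raises AttributeError, so the only way to change it is to assign a new tuple, which the identity check then detects.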

I'm tempted to say, "well, why not use a list-subclass proxy, then?", but that means more work for no real difference. I just went through dozens of examples of __path__ usage (found via Google), and I found exactly two examples of code that modifies a __path__ that is not:

1. In the __init__.py whose __path__ it is (i.e., code that'll still have a list), or
2. Modifying the __path__ of an explicitly-named self-contained package that's part of the same distribution.

The two examples are from Twisted, and Google AppEngine. In the Twisted case, it's some sort of namespace package-like plugin chicanery, and in the AppEngine case, well, I'm not sure what the heck it's doing, but it seems to be making sure that you can still import stuff that has the same name as stdlib stuff, or something.

The Twisted case (and an apparent copy of the same code in a project called "flumotion") uses ihooks, though, so I'm not sure it'll even get executed for virtual packages. The Google case loops over everything in sys.modules, in a function by the name of appengine.dist.fix_paths()... but I wasn't able to find out who calls this function, when and why.

So, pretty much, except for these bits of "nosy" code, the vast majority of code out there seems to only mess with its own self-contained paths, making the use of tuples seem like a pretty safe choice.

(Oh, and all the code I found that reads paths without modifying them uses only tuple-safe operations.)

So, if we implement automatic __path__ updates for virtual packages, I'm currently leaning towards the strict approach using tuples, but could possibly be persuaded towards read-only list-proxies instead.

Side note: it looks like a *lot* of code out there abuses __path__[0] to find data files, so I probably need to add a note to the PEP about not doing that when you convert a self-contained package to a virtual one. Of course, I suppose using a sentinel could address *that* problem, or an iteration-only proxy.

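The main concern here is that using __path__[0] will *seem* to work when you first use it with a virtual package, because it'll happen to be the right directory. But it'll be wrong long-term, since a virtual package's contents can be spread across several directories, and which one comes first can change as sys.path changes.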
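For the data-file case, a safer pattern is to search every portion rather than trusting the first one. A minimal sketch: find_data_file() is a hypothetical helper, not an existing API. It only iterates over the path, so it also works with a tuple or an iteration-only __path__.

```python
import os
from typing import Iterable, Optional


def find_data_file(package_path: Iterable[str], relative_name: str) -> Optional[str]:
    """Return the first existing `relative_name` under any portion, else None."""
    for portion in package_path:
        candidate = os.path.join(portion, relative_name)
        if os.path.exists(candidate):
            return candidate
    return None
```

Usage would look like `find_data_file(mypkg.__path__, "templates/base.html")`, where `mypkg` is whatever package ships the file.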

This seems to lean in favor of making a simple reiterable wrapper type for the __path__, that only allows you to take the length and iterate over it. With an appropriate design, it could actually update itself automatically, given a subname and a parent __path__/sys.path. That is, it could keep a tuple copy of the last-seen parent path, and before iteration, compare tuple(self.parent_path) to self.last_seen_path. If they're different, it rebuilds the value to be iterated over.

Voila: transparent updating of all virtual __path__ values from sys.path changes (or modifications to self-contained __path__ parents, btw), and trying to change it (or read an item from it positionally) will not create any silent failures.
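A rough sketch of such a wrapper. It assumes a simplified portion rule (any parent entry containing a directory named `subname` counts) and takes the parent path as a zero-argument callable, so that rebinding sys.path is still noticed. The VirtualPath name and its attributes are hypothetical, loosely following the description above.

```python
import os
from typing import Callable, Iterable, Iterator, Optional, Tuple


class VirtualPath:
    """Iteration-only, self-updating __path__ for a virtual package (sketch)."""

    def __init__(self, subname: str, parent_path: Callable[[], Iterable[str]]) -> None:
        self.subname = subname
        self.parent_path = parent_path  # e.g. lambda: sys.path, or lambda: parent.__path__
        self.last_seen_path: Optional[Tuple[str, ...]] = None
        self._entries: Tuple[str, ...] = ()

    def _recalculate(self) -> Tuple[str, ...]:
        # Rebuild only when the parent path differs from the last one seen.
        parent = tuple(self.parent_path())
        if parent != self.last_seen_path:
            self.last_seen_path = parent
            self._entries = tuple(
                os.path.join(entry, self.subname)
                for entry in parent
                if os.path.isdir(os.path.join(entry, self.subname))
            )
        return self._entries

    def __iter__(self) -> Iterator[str]:
        return iter(self._recalculate())

    def __len__(self) -> int:
        return len(self._recalculate())

    def __repr__(self) -> str:
        return f"VirtualPath({self.subname!r}, {list(self._recalculate())!r})"
```

Because the class defines only __iter__ and __len__, `vp[0]` raises TypeError and `vp.append(...)` raises AttributeError, so the __path__[0] habit fails loudly instead of silently, while membership tests like `entry in vp` still work through iteration. If the parent is itself a VirtualPath, `tuple(self.parent_path())` iterates it, so updates propagate down nested packages.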

Alright... *if* we support automatic updates to virtual __path__ values, this is probably how we should do it. (It will require, though, that imp.find_module be changed to iterate over its path argument generically instead of using PyList_GetItem, as it's quite possible a virtual __path__ will get passed into it.)

Also, we *long* ago passed the point where any of this can be sanely backported to Python 2.x with a simple shim, alas. For my purposes at least, needing a full importlib for the implementation is a no-go. :-( Still, for the future of Python, this all makes good sense. I just wish we'd thought of all this in 2006 when the discussion came up before: we maybe could've had this in Python 2.6. Where's that damn time machine when you *really* need it? ;-)

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev