Re: [Python-Dev] Draft PEP: "Simplified Package Layout and Partitioning"

P.J. Eby Thu, 21 Jul 2011 06:27:25 -0700

At 11:52 AM 7/21/2011 +1000, Nick Coghlan wrote:

Trying to change how packages are identified at the Python level makes
PEP 382 sound positively appealing. __path__ needs to stay :)


In which case, it should be a list, not a sentinel.  ;-)

Even better would be for these (and sys.path) to be list subclasses
that did the right thing under the hood as Glenn suggested. Code that
*replaces* rather than modifies these attributes would still
potentially break virtual packages, but code that modifies them in
place would do the right thing automatically. (Note that all code that
manipulates sys.path and __path__ attributes requires explicit calls
to correctly support current namespace package mechanisms, so this
would actually be an improvement on the status quo rather than making
anything worse).

I think the simplest thing, if we're keeping __path__ (and on reflection, I think we should), would be to simply call extend_virtual_paths() automatically on new path entries found in sys.path when an import is performed, relative to the previous value of sys.path.

That is, we save an "old" copy of sys.path somewhere, and whenever __import__() is called (well, once it gets past checking if the target is already in sys.modules, anyway), it checks the current sys.path against it, and calls extend_virtual_paths() on any sys.path entries that weren't in the "old" sys.path.

This is not the most efficient thing in the world, as it will cause a bunch of stat calls to happen against the new directories, in the middle of a possibly-entirely-unrelated import operation, but it would certainly address the issue in the Simplest Way That Could Possibly Work.
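To make the bookkeeping concrete, here is a rough sketch of the "old copy" check. It is only an illustration under stated assumptions: NewEntryWatcher is a made-up name, and the callback stands in for the draft PEP's extend_virtual_paths(), which is not an existing function.

```python
class NewEntryWatcher:
    """Remember a path list and report entries added since the last check.

    Hypothetical helper: `path` plays the role of sys.path, and `callback`
    plays the role of the draft PEP's extend_virtual_paths().
    """

    def __init__(self, path, callback):
        self.path = path            # the live list, e.g. sys.path
        self.callback = callback
        self.old = list(path)       # the saved "old" copy

    def check(self):
        # Would run early in __import__(), after the sys.modules lookup.
        current = list(self.path)
        known = set(self.old)
        for entry in current:
            if entry not in known:
                self.callback(entry)
                known.add(entry)    # report a duplicated new entry only once
        self.old = current


# Usage with a plain list standing in for sys.path:
fake_sys_path = ["/usr/lib/python", "/site-packages"]
extended = []
watcher = NewEntryWatcher(fake_sys_path, extended.append)
fake_sys_path.insert(0, "/home/me/plugins")
watcher.check()
```

After the check, only the newly inserted directory has been passed to the callback, and a second check() with no further changes does nothing.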

A stricter (safer) version of the same thing would be one where we only update __path__ values that are unchanged since we created them, and rather than only appending new entries, we replace the __path__ with a newly-computed one.

This version is safer because it avoids corner cases like "I imported foo.bar while foo.baz 1.1 was on my path, then I prepended a directory to sys.path that has foo.baz 1.2, but I still get foo.baz 1.1 when I import." But it loses in cases where people do direct __path__ manipulation.
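The difference between the two variants can be sketched as follows. This is a simplified model, not the PEP's algorithm: the function names are invented for illustration, paths are joined with posixpath so the output is the same everywhere, and no check is made that the directories actually exist.

```python
import posixpath


def subpaths(sys_path, name):
    # Candidate package directories, in sys.path order (existence not checked).
    return [posixpath.join(entry, name) for entry in sys_path]


def append_only_update(pkg_path, old_sys_path, new_sys_path, name):
    # Simple version: append subpaths for entries not in the old sys.path.
    added = [e for e in new_sys_path if e not in old_sys_path]
    return pkg_path + subpaths(added, name)


def strict_update(pkg_path, original_pkg_path, new_sys_path, name):
    # Strict version: leave a manipulated __path__ alone, else recompute it.
    if pkg_path != original_pkg_path:
        return pkg_path
    return subpaths(new_sys_path, name)


old_sys_path = ["/site/v1"]                 # has foo.baz 1.1
new_sys_path = ["/site/v2", "/site/v1"]     # /site/v2 (foo.baz 1.2) prepended
original = subpaths(old_sys_path, "foo")

appended = append_only_update(original, old_sys_path, new_sys_path, "foo")
recomputed = strict_update(original, original, new_sys_path, "foo")
```

Here `appended` still lists /site/v1/foo first, so foo.baz 1.1 keeps winning, while `recomputed` follows the new sys.path order.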

On the other hand, it's a lot easier to say "you break it, you bought it" where __path__ manipulation is concerned, so I'm actually pretty inclined towards using the strict version.

Hey... here's a crazy idea. Suppose that a virtual package __path__ is a *tuple* instead of a list? Now, in order to change it, you *have* to replace it. And we can cache the tuple we initially set it to in sys.virtual_package_paths, so we can do an 'is' check before replacing it.

Voila: __path__ still exists and is still a sequence for a virtual path, but you have to explicitly replace it if you want to do anything funky -- at which point you're responsible for maintaining it.
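As a sketch of how the 'is' check might work: the module-level dict below stands in for the proposed sys.virtual_package_paths (which does not exist today), and both function names are made up for the example.

```python
import types

# Stand-in for the proposed sys.virtual_package_paths cache (hypothetical).
virtual_package_paths = {}


def set_virtual_path(module, entries):
    # Install a tuple __path__ and remember the exact object we installed.
    cached = tuple(entries)
    module.__path__ = cached
    virtual_package_paths[module.__name__] = cached


def refresh_virtual_path(module, entries):
    # Replace __path__ only if it is still the very tuple we put there.
    cached = virtual_package_paths.get(module.__name__)
    if cached is None or module.__path__ is not cached:
        return False    # someone replaced it; they maintain it from now on
    set_virtual_path(module, entries)
    return True


foo = types.ModuleType("foo")
set_virtual_path(foo, ["/site/v1/foo"])
updated = refresh_virtual_path(foo, ["/site/v2/foo", "/site/v1/foo"])

foo.__path__ = ("/somewhere/custom",)       # explicit replacement by user code
skipped = refresh_virtual_path(foo, ["/site/v3/foo"])
```

The first refresh succeeds because the tuple is untouched; after the explicit replacement, the identity check fails and the custom value is left alone.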

I'm tempted to say, "well, why not use a list-subclass proxy, then?", but that means more work for no real difference. I just went through dozens of examples of __path__ usage (found via Google), and I found exactly two examples of code that modifies a __path__ that is not:

1. In the __init__.py whose __path__ it is (i.e., code that'll still have a list), or
2. Modifying the __path__ of an explicitly-named self-contained package that's part of the same distribution.

The two examples are from Twisted, and Google AppEngine. In the Twisted case, it's some sort of namespace package-like plugin chicanery, and in the AppEngine case, well, I'm not sure what the heck it's doing, but it seems to be making sure that you can still import stuff that has the same name as stdlib stuff, or something.

The Twisted case (and an apparent copy of the same code in a project called "flumotion") uses ihooks, though, so I'm not sure it'll even get executed for virtual packages. The Google case loops over everything in sys.modules, in a function by the name of appengine.dist.fix_paths()... but I wasn't able to find out who calls this function, when and why.

So, pretty much, except for these bits of "nosy" code, the vast majority of code out there seems to only mess with its own self-contained paths, making the use of tuples seem like a pretty safe choice.

(Oh, and all the code I found that reads paths without modifying them only uses tuple-safe operations.)

So, if we implement automatic __path__ updates for virtual packages, I'm currently leaning towards the strict approach using tuples, but could possibly be persuaded towards read-only list-proxies instead.

Side note: it looks like a *lot* of code out there abuses __path__[0] to find data files, so I probably need to add a note to the PEP about not doing that when you convert a self-contained package to a virtual one. Of course, I suppose using a sentinel could address *that* problem, or an iteration-only proxy.

The main concern here is that using __path__[0] will *seem* to work when you first use it with a virtual package, because it'll be the right directory. But it'll be wrong long-term.

This seems to lean in favor of making a simple reiterable wrapper type for the __path__, that only allows you to take the length and iterate over it. With an appropriate design, it could actually update itself automatically, given a subname and a parent __path__/sys.path. That is, it could keep a tuple copy of the last-seen parent path, and before iteration, compare tuple(self.parent_path) to self.last_seen_path. If they're different, it rebuilds the value to be iterated over.

Voila: transparent updating of all virtual __path__ values from sys.path changes (or modifications to self-contained __path__ parents, btw), and trying to change it (or read an item from it positionally) will not create any silent failures.
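A minimal sketch of such a wrapper might look like this. The class name VirtualPath and the is_dir parameter are assumptions made for the example; parent_path and last_seen_path follow the attribute names used above.

```python
import os


class VirtualPath:
    """Reiterable view of a virtual package's path: len() and iteration only."""

    def __init__(self, subname, parent_path, is_dir=os.path.isdir):
        self.subname = subname
        self.parent_path = parent_path      # live list: sys.path or a parent __path__
        self.last_seen_path = None          # tuple copy of the parent at last rebuild
        self._is_dir = is_dir               # injectable so the sketch can run anywhere
        self._entries = ()

    def _refresh(self):
        current = tuple(self.parent_path)
        if current != self.last_seen_path:
            # The parent changed: rebuild the value to be iterated over.
            candidates = (os.path.join(entry, self.subname) for entry in current)
            self._entries = tuple(c for c in candidates if self._is_dir(c))
            self.last_seen_path = current

    def __iter__(self):
        self._refresh()
        return iter(self._entries)

    def __len__(self):
        self._refresh()
        return len(self._entries)


# Usage, pretending that every candidate directory exists:
parent = ["/a", "/b"]
vpath = VirtualPath("foo", parent, is_dir=lambda p: True)
before = list(vpath)
parent.insert(0, "/c")      # e.g. someone prepends to sys.path
after = list(vpath)
```

Because the class defines neither __getitem__ nor __setitem__, both vpath[0] and vpath[0] = "x" raise TypeError instead of failing silently.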

Alright... *if* we support automatic updates to virtual __paths__, this is probably how we should do it. (It will require, though, that imp.find_module be changed to use a different iteration method than PyList_GetItem, as it's quite possible a virtual __path__ will get passed into it.)

Also, we *long* ago passed the point where any of this can be sanely backported to Python 2.x with a simple shim, alas. For my purposes at least, needing a full importlib for the implementation is a no-go. :-( Still, for the future of Python, this all makes good sense. I just wish we'd thought of all this in 2006 when the discussion came up before: we maybe could've had this in Python 2.6. Where's that damn time machine when you *really* need it? ;-)


