Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
On Sat, Nov 26, 2011 at 11:53 AM, Éric Araujo mer...@netwok.org wrote: Le 11/08/2011 20:30, P.J. Eby a écrit : At 04:39 PM 8/11/2011 +0200, Éric Araujo wrote: I’ll just regret that it's not possible to provide a module docstring to inform that this is a namespace package used for X and Y. It *is* possible - you'd just have to put it in a zc.py file. IOW, this PEP still allows namespace-defining packages to exist, as was requested by early commenters on PEP 382. It just doesn't *require* them to exist in order for the namespace contents to be importable. That’s quite cool. I guess such a namespace-defining module (zc.py here) would be importable, right? Yes. Also, would it cause worse performance for other zc.* packages than if there were no zc.py? No. The first import of a subpackage sets up the __path__, and all subsequent imports use it. A pure virtual package having no source file, I think it should have no __file__ at all. Antoine and someone else thought likewise (I can find the link if you want); do you consider it consensus enough to update the PEP? Sure. At this point, though, before doing any more work on the PEP I'd like to have some idea of whether there's any chance of it being accepted. At this point, there seems to be a lot of passive, Usenet nod syndrome type support for it, but little active support. It doesn't help at all that I'm not really in a position to provide an implementation, and the persons most likely to implement have been leaning somewhat towards 382, or wanting to modify 402 such that it uses .pyp directory extensions so that PEP 395 can be supported... And while 402 is an extension of an idea that Guido proposed a few years ago, he hasn't weighed in lately on whether he still likes that idea, let alone whether he likes where I've taken it. ;-) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
Hi,

Thanks for the replies.

> At this point, though, before doing any more work on the PEP I'd like to have some idea of whether there's any chance of it being accepted. At this point, there seems to be a lot of passive, "Usenet nod syndrome" type support for it, but little active support.

If this helps, I am +1, and I’m sure other devs will chime in. I think the feature is useful, and I prefer 402’s way to 382’s pyp directories. I do acknowledge that 402 poses problems to PEP 395 which 382 does not, and as I’m not in a position to help, my vote may count less.

Cheers
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
> If this helps, I am +1, and I’m sure other devs will chime in. I think the feature is useful, and I prefer 402’s way to 382’s pyp directories.

If that's the obstacle to adopting PEP 382, it would be easy to revert the PEP back to having file markers to indicate package-ness. I insist on having markers of some kind, though (IIUC, this is also what PEP 395 requires).

The main problem with file markers is that a) they must not overlap across portions of a package, and b) the actual file name and content are irrelevant. a) means that package authors have to come up with some name, and b) means that the name actually doesn't matter (but the file name extension would). UUIDs would work, as would the name of the portion/distribution. I think the specific choice of name will confuse people into interpreting things in the file name that aren't really intended.

Regards,
Martin
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
On Thu, Dec 1, 2011 at 1:28 AM, PJ Eby p...@telecommunity.com wrote: It doesn't help at all that I'm not really in a position to provide an implementation, and the persons most likely to implement have been leaning somewhat towards 382, or wanting to modify 402 such that it uses .pyp directory extensions so that PEP 395 can be supported... While I was initially a fan of the possibilities of PEP 402, I eventually decided that we would be trading an easy problem (you need an '__init__.py' marker file or a '.pyp' extension to get Python to recognise your package directory) for a hard one (What's your sys.path look like? What did you mean for it to look like?). Symlinks (and the fact we implicitly call realname() during system initialisation and import) just make things even messier. *Deliberately* allowing package structures on the filesystem to become ambiguous is a recipe for future pain (and could potentially undo a lot of the good work done by PEP 328's elimination of implicit relative imports). I acknowledge there is a lot of confusion amongst novices as to how packages and imports actually work, but my diagnosis of the root cause of that problem is completely different from that supposed by PEP 402 (as documented in the more recent versions of PEP 395, I've come to believe it is due to the way we stuff up the default sys.path[0] initialisation when packages are involved). So, in the end, I've come to strongly prefer the PEP 382 approach. The principle of Explicit is better than implicit applies to package detection on the filesystem just as much as it does to any other kind of API design, and it really isn't that different from the way we treat actual Python files (i.e. you can *execute* arbitrary files, but they need to have an appropriate extension if you want to import them). Cheers, Nick. 
-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
On Nov 30, 2011, at 6:39 PM, Nick Coghlan wrote: On Thu, Dec 1, 2011 at 1:28 AM, PJ Eby p...@telecommunity.com wrote: It doesn't help at all that I'm not really in a position to provide an implementation, and the persons most likely to implement have been leaning somewhat towards 382, or wanting to modify 402 such that it uses .pyp directory extensions so that PEP 395 can be supported... While I was initially a fan of the possibilities of PEP 402, I eventually decided that we would be trading an easy problem (you need an '__init__.py' marker file or a '.pyp' extension to get Python to recognise your package directory) for a hard one (What's your sys.path look like? What did you mean for it to look like?). Symlinks (and the fact we implicitly call realname() during system initialisation and import) just make things even messier. *Deliberately* allowing package structures on the filesystem to become ambiguous is a recipe for future pain (and could potentially undo a lot of the good work done by PEP 328's elimination of implicit relative imports). I acknowledge there is a lot of confusion amongst novices as to how packages and imports actually work, but my diagnosis of the root cause of that problem is completely different from that supposed by PEP 402 (as documented in the more recent versions of PEP 395, I've come to believe it is due to the way we stuff up the default sys.path[0] initialisation when packages are involved). So, in the end, I've come to strongly prefer the PEP 382 approach. The principle of Explicit is better than implicit applies to package detection on the filesystem just as much as it does to any other kind of API design, and it really isn't that different from the way we treat actual Python files (i.e. you can *execute* arbitrary files, but they need to have an appropriate extension if you want to import them). I've helped an almost distressing number of newbies overcome their confusion about sys.path and packages. 
Systems using Twisted are, almost by definition, hairy integration problems, and are frequently being created or maintained by people with little to no previous Python experience. Given that experience, I completely agree with everything you've written above (except for the part where you initially liked it).

I appreciate the insight that PEP 402 offers about Python's package mechanism (and the difficulties introduced by namespace packages). Its statement of the problem is good, but in my opinion its solution points in exactly the wrong direction: packages need to be _more_ explicit about their package-ness and tools need to be stricter about how they're laid out.

It would be great if sys.path[0] were actually correct when running a script inside a package, or at least issued a warning which would explain how to correctly lay out said package. I would love to see a loud alarm every time a module accidentally got imported by the same name twice. I wish I knew, once and for all, whether it was 'import Image' or 'from PIL import Image'.

My hope is that if Python starts to tighten these things up a bit, or at least communicate better about best practices, editors and IDEs will develop better automatic discovery features and frameworks will start to normalize their sys.path setups and stop depending on accidents of current directory and script location. This will in turn vastly decrease confusion among new Python developers taking on large projects with a bunch of libraries, who mostly don't care what the rules for where files are supposed to go are, and just want to put them somewhere that works.

-glyph
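As an aside on the "loud alarm" wish above, here is a minimal sketch of how such a check could be approximated by hand. It is not an existing Python feature: `find_duplicate_imports` is a hypothetical helper name, and the heuristic (one source file backing more than one distinct module object) is an assumption that only covers modules with a usable `__file__`.

```python
import os
import sys
from collections import defaultdict


def find_duplicate_imports(modules=None):
    """Map source path -> names, for files loaded as more than one module object.

    Hypothetical helper: it only inspects modules that expose a string
    __file__, so built-in modules and similar cases are silently skipped.
    """
    if modules is None:
        modules = sys.modules
    by_file = defaultdict(dict)  # real path -> {id(module object): first name seen}
    # Iterate over a copy, since sys.modules may change underneath us.
    for name, module in list(modules.items()):
        filename = getattr(module, "__file__", None)
        if not isinstance(filename, str):
            continue
        # Harmless aliases (os.path and posixpath are one object) share an id,
        # so they collapse into a single entry here.
        by_file[os.path.realpath(filename)].setdefault(id(module), name)
    return {
        path: sorted(objects.values())
        for path, objects in by_file.items()
        if len(objects) > 1
    }
```

The typical way to trigger a hit is the situation described above: a package directory ends up on sys.path next to its parent, so the same file becomes importable both as `pkg.mod` and as `mod`.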
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
Hi,

Going through my email backlog.

On 11/08/2011 20:30, P.J. Eby wrote:
> At 04:39 PM 8/11/2011 +0200, Éric Araujo wrote:

>>> (By the way, both of these additions to the import protocol (i.e. the dynamically-added ``__path__``, and dynamically-created modules) apply recursively to child packages, using the parent package's ``__path__`` in place of ``sys.path`` as a basis for generating a child ``__path__``. This means that self-contained and virtual packages can contain each other without limitation, with the caveat that if you put a virtual package inside a self-contained one, it's gonna have a really short ``__path__``!)

>> I don't understand the caveat or its implications.

> Since each package's __path__ is the same length or shorter than its parent's by default, then if you put a virtual package inside a self-contained one, it will be functionally speaking no different than a self-contained one, in that it will have only one path entry. So, it's not really useful to put a virtual package inside a self-contained one, even though you can do it. (Apart from it letting you avoid a superfluous __init__ module, assuming it's indeed superfluous.)

I still don’t understand why this matters or what negative effects it could have on code, but I’m fine with not understanding. I’ll trust that people writing or maintaining import-related tools will agree or complain about that item.

>> I’ll just regret that it's not possible to provide a module docstring to inform that this is a namespace package used for X and Y.

> It *is* possible - you'd just have to put it in a zc.py file. IOW, this PEP still allows namespace-defining packages to exist, as was requested by early commenters on PEP 382. It just doesn't *require* them to exist in order for the namespace contents to be importable.

That’s quite cool. I guess such a namespace-defining module (zc.py here) would be importable, right? Also, would it cause worse performance for other zc.* packages than if there were no zc.py?

>> This was probably said on import-sig, but here I go: yet another import artifact in the sys module! I hope we get ImportEngine in 3.3 to clean up all this.

> Well, I rather *like* having them there, personally, vs. having to learn yet another API, but oh well, whatever.

Agreed with “whatever” :) I just like to grunt sometimes.

> AFAIK, ImportEngine isn't going to do away with the need for the global ones to live somewhere,

Yep, but as Nick replied, at least we’ll gain one structure to rule them all.

>> Let's imagine my application Spam has a namespace spam.ext for plugins. To use a custom directory where plugins are stored, or a zip file with plugins (I don't use eggs, so let me talk about zip files here), I'd have to call sys.path.append *and* pkgutil.extend_virtual_paths?

> As written in the current proposal, yes. There was some discussion on Python-Dev about having this happen automatically, and I proposed that it could be done by making virtual packages' __path__ attributes an iterable proxy object, rather than a list:

That sounds a bit too complicated. What about just having pkgutil.extend_virtual_paths call sys.path.append? For maximum flexibility, extend_virtual_paths could have an argument to avoid calling sys.path.append.

>> Besides, putting data files in a Python package is held very poorly by some (mostly people following the Filesystem Hierarchy Standard),

> ISTM that anybody who thinks that is being inconsistent in considering the Python code itself to not be a data file by that same criterion... especially since one of the more common uses for such data files are for e.g. HTML templates (which usually contain some sort of code) or GUI resources (which are pretty tightly bound to the code).

A good example is documentation: Having a unique location (/usr/share/doc) for all installed software makes my life easier. Another example is JavaScript files used with HTML documents, such as jQuery: Debian recently split the jQuery file out of their Sphinx package, so that there is only one library installed that all packages can use and that can be updated and fixed once for all. (I’m simplifying; there can be multiple versions of libraries, but not multiple copies.)

I’ll stop here; I’m not one of the authors of the Filesystem Hierarchy Standard, and I’ll rant against package_data in distutils mailing lists :)

>> A pure virtual package having no source file, I think it should have no __file__ at all.

Antoine and someone else thought likewise (I can find the link if you want); do you consider it consensus enough to update the PEP?

Regards
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
Éric Araujo <merwok at netwok.org> writes:

> Besides, putting data files in a Python package is held very poorly by some (mostly people following the Filesystem Hierarchy Standard), and in distutils2/packaging, we (will) have a resources system that’s as convenient for users and more flexible for OS packagers. Using __file__ for more than information on the module is frowned upon for other reasons anyway (I talked with a Debian developer about this one day but forgot), so I think the limitation is okay.

The FHS does not apply in all scenarios - not all Python code is deployed/packaged at system level. For example, plug-ins (such as Django apps) are often not meant to be installed by a system-level packager. This might also be true in scenarios where Python is embedded into some other application.

It's really useful to be able to co-locate packages with their data (e.g. in a zip file) and I don't think all instances of putting data files in a package are to be frowned upon.

Regards,
Vinay Sajip
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
At 02:02 PM 8/11/2011 -0400, Glyph Lefkowitz wrote:

> Rather than a one-by-one ad-hoc consideration of which attribute should be set to None or empty strings or "<string>" or what have you, I'd really like to see a discussion in the PEP saying what a package really is vs. what a module is, and what one can reasonably expect from it from an API and tooling perspective.

The assumption I've been working from is the only guarantee I've ever seen the Python docs give: i.e., that a package is a module object with a __path__ attribute. Modules aren't even required to have a __file__ object -- builtin modules don't, for example. (And the contents of __file__ are not required to have any particular semantics: PEP 302 notes that it can be a dummy value like "<frozen>", for example.)

Technically, btw, PEP 302 requires __file__ to be a string, so making __file__ = None will be a backwards-incompatible change. But any code that walks modules in sys.modules is going to break today if it expects a __file__ attribute to exist, because 'sys' itself doesn't have one!

So, my leaning is towards leaving off __file__, since today's code already has to deal with it being nonexistent, if it's working with arbitrary modules, and that'll produce breakage sooner rather than later -- the twisted.python.modules code, for example, would fail with a loud AttributeError, rather than going on to silently assume that a module with a dummy __file__ isn't a package. (Which is NOT a valid assumption *now*, btw, as I'll explain below.)

Anyway, if you have any suggestions for verbiage that should be added to the PEP to clarify these assumptions, I'd be happy to add them. However, I think that the real problem you're encountering at the moment has more to do with making assumptions about the Python import ecosystem that aren't valid today, and haven't been valid since at least the introduction of PEP 302, if not earlier import hook systems as well.

> But the whole pure virtual mechanism here seems to pile even more inconsistency on top of an already irritatingly inconsistent import mechanism. I was reasonably happy with my attempt to paper over PEP 302's weirdnesses from a user perspective:
>
> http://twistedmatrix.com/documents/11.0.0/api/twisted.python.modules.html
>
> (or https://launchpad.net/modules if you are not a Twisted user)
>
> Users of this API can traverse the module hierarchy with certain expectations; each module or package would have .pathEntry and .filePath attributes, each of which would refer to the appropriate place. Of course __path__ complicates things a bit, but so it goes.

I don't mean to be critical, and no doubt what you've written works fine for your current requirements, but on my quick attempt to skim through the code I found many things which appear to me to be incompatible with PEP 302.

That is, the above code hardcodes a variety of assumptions about the import system that haven't been true since Python 2.3. (For example, it assumes that the contents of sys.path strings have inspectable semantics, that the contents of __file__ can tell you things about the module-ness or package-ness of a module object, etc.)

If you want to fully support PEP 302, you might want to consider making this a wrapper over the corresponding pkgutil APIs (available since Python 2.5) that do roughly the same things, but which delegate all path string inspection to importer objects and allow extensible delegation for importers that don't support the optional methods involved.

(Of course, if the pkgutil APIs are missing something you need, perhaps you could propose additions.)

> Now it seems like pure virtual packages are going to introduce a new type of special case into the hierarchy which have neither .pathEntry nor .filePath objects.

The problem is that your API's notion that these things exist as coherent concepts was never really a valid assumption in the first place. .pth files and namespace packages already meant that the idea of a package coming from a single path entry made no sense. And namespace packages installed by setuptools' system packaging mode *don't have a __file__ attribute* today... heck they don't have __init__ modules, either.

So, adding virtual packages isn't actually going to change anything, except perhaps by making these scenarios more common.
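The two guarantees discussed in this message (a package is a module object with a `__path__` attribute, and `__file__` may be missing altogether) can be sketched as below. This is only an illustration; `describe_module` and `modules_without_file` are hypothetical helper names, not part of any existing API.

```python
import sys


def describe_module(module):
    """Return (is_package, filename) without assuming __file__ exists."""
    # A package is simply a module object that has a __path__ attribute.
    is_package = hasattr(module, "__path__")
    # __file__ is optional: built-in modules such as sys do not have one.
    filename = getattr(module, "__file__", None)
    return is_package, filename


def modules_without_file():
    """Names in sys.modules whose module object has no usable __file__."""
    return sorted(
        name
        for name, module in list(sys.modules.items())
        if getattr(module, "__file__", None) is None
    )
```

Code written this way keeps working whether a module lacks `__file__` entirely or carries `None`, which are the two options being weighed in this thread.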
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
On Aug 12, 2011, at 11:24 AM, P.J. Eby wrote:

> That is, the above code hardcodes a variety of assumptions about the import system that haven't been true since Python 2.3.

Thanks for this feedback. I honestly did not realize how old and creaky this code had gotten. It was originally developed for Python 2.4 and it certainly shows its age. Practically speaking, the code is correct for the bundled importers, and paths and zipfiles are all we've cared about thus far.

> (For example, it assumes that the contents of sys.path strings have inspectable semantics, that the contents of __file__ can tell you things about the module-ness or package-ness of a module object, etc.)

Unfortunately, the primary goal of this code is to do something impossible - walk the module hierarchy without importing any code. So some heuristics are necessary. Upon further reflection, PEP 402 _will_ make dealing with namespace packages from this code considerably easier: we won't need to do AST analysis to look for a __path__ attribute or anything gross like that to improve correctness; we can just look in various directories on sys.path and accurately predict what __path__ will be synthesized to be.

However, the isPackage() method can and should be looking at the module if it's already loaded, and not always guessing based on paths. The whole reason there's an 'importPackages' flag to walk() is that some applications of this code care more about accuracy than others, so it tries to be as correct as it can be.

(Of course this is still wrong for the case where a __path__ is dynamically constructed by user code, but there's only so well one can do at that.)

> If you want to fully support PEP 302, you might want to consider making this a wrapper over the corresponding pkgutil APIs (available since Python 2.5) that do roughly the same things, but which delegate all path string inspection to importer objects and allow extensible delegation for importers that don't support the optional methods involved.

This code still needs to support Python 2.4, but I will make a note of this for future reference.

> (Of course, if the pkgutil APIs are missing something you need, perhaps you could propose additions.)

>> Now it seems like pure virtual packages are going to introduce a new type of special case into the hierarchy which have neither .pathEntry nor .filePath objects.

> The problem is that your API's notion that these things exist as coherent concepts was never really a valid assumption in the first place. .pth files and namespace packages already meant that the idea of a package coming from a single path entry made no sense. And namespace packages installed by setuptools' system packaging mode *don't have a __file__ attribute* today... heck they don't have __init__ modules, either.

The fact that getModule('sys') breaks is reason enough to re-visit some of these design decisions.

> So, adding virtual packages isn't actually going to change anything, except perhaps by making these scenarios more common.

In that case, I guess it's a good thing; these bugs should be dealt with. Thanks for pointing them out. My opinion of PEP 402 has been completely reversed - although I'd still like to see a section about the module system from a library/tools author point of view rather than a time-traveling perl user's narrative :).
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
At 01:09 PM 8/12/2011 -0400, Glyph Lefkowitz wrote:

> Upon further reflection, PEP 402 _will_ make dealing with namespace packages from this code considerably easier: we won't need to do AST analysis to look for a __path__ attribute or anything gross like that to improve correctness; we can just look in various directories on sys.path and accurately predict what __path__ will be synthesized to be.

The flip side of that is that you can't always know whether a directory is a virtual package without deep inspection: one consequence of PEP 402 is that any directory that contains a Python module (of whatever type), however deeply nested, will be a valid package name. So, you can't rule out that a given directory *might* be a package, without walking its entire reachable subtree. (Within the subset of directory names that are valid Python identifiers, of course.)

However, you *can* quickly tell that a directory *might* be a package or is *probably* one: if it contains modules, or is the same name as an already-discovered module, it's a pretty safe bet that you can flag it as such.

In any case, you probably should *not* do the building of a virtual path yourself; the protocols and APIs added by PEP 402 should allow you to simply ask for the path to be constructed on your behalf. Otherwise, you are going to be back in the same business of second-guessing arbitrary importer backends again!

(E.g. note that PEP 402 does not say virtual package subpaths must be filesystem or zipfile subdirectories of their parents - an importer could just as easily allow you to treat subdirectories named 'twisted.python' as part of a virtual package with that name!)

Anyway, pkgutil defines some extra methods that importers can implement to support module-walking, and part of the PEP 402 implementation should be to make this support virtual packages as well.

> This code still needs to support Python 2.4, but I will make a note of this for future reference.

A suggestion: just take the pkgutil code and bundle it for Python 2.4 as something._pkgutil. There's very little about it that's 2.5+ specific, at least when I wrote the bits that do the module walking.

Of course, the main disadvantage of pkgutil for your purposes is that it currently requires packages to be imported in order to walk their child modules. (IIRC, it does *not*, however, require them to be imported in order to discover their existence.)

> In that case, I guess it's a good thing; these bugs should be dealt with. Thanks for pointing them out. My opinion of PEP 402 has been completely reversed - although I'd still like to see a section about the module system from a library/tools author point of view rather than a time-traveling perl user's narrative :).

LOL. If you will propose the wording you'd like to see, I'll be happy to check it for any current-and-or-future incorrect assumptions. ;-)
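For readers who have not used the pkgutil walking APIs mentioned here, the following is a minimal sketch built on `pkgutil.iter_modules`, which reports the children of an already imported package without importing those children. The helper name `list_children` is an assumption made for this example.

```python
import pkgutil


def list_children(package):
    """List (name, is_package) for the direct children of an imported package."""
    prefix = package.__name__ + "."
    # iter_modules asks the importer responsible for each __path__ entry what
    # it contains; the children it reports are discovered, not imported.
    return sorted(
        (name, ispkg)
        for _finder, name, ispkg in pkgutil.iter_modules(package.__path__, prefix)
    )
```

Calling `list_children(json)` after `import json` would include an entry such as `("json.decoder", False)`. Note that the parent still has to be imported first to obtain its `__path__`, which is the disadvantage pointed out above.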
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
On Aug 12, 2011, at 2:33 PM, P.J. Eby wrote: At 01:09 PM 8/12/2011 -0400, Glyph Lefkowitz wrote: Upon further reflection, PEP 402 _will_ make dealing with namespace packages from this code considerably easier: we won't need to do AST analysis to look for a __path__ attribute or anything gross like that improve correctness; we can just look in various directories on sys.path and accurately predict what __path__ will be synthesized to be. The flip side of that is that you can't always know whether a directory is a virtual package without deep inspection: one consequence of PEP 402 is that any directory that contains a Python module (of whatever type), however deeply nested, will be a valid package name. So, you can't rule out that a given directory *might* be a package, without walking its entire reachable subtree. (Within the subset of directory names that are valid Python identifiers, of course.) Are there any rules about passing invalid identifiers to __import__ though, or is that just less likely? :) However, you *can* quickly tell that a directory *might* be a package or is *probably* one: if it contains modules, or is the same name as an already-discovered module, it's a pretty safe bet that you can flag it as such. I still like the idea of a 'marker' file. It would be great if there were a new marker like __package__.py. I say this more for the benefit of users looking at a directory on their filesystem and trying to understand whether this is a package or not than I do for my own programmatic tools though; it's already hard enough to understand the package-ness of a part of your filesystem and its interactions with PYTHONPATH; making directories mysteriously and automatically become packages depending on context will worsen that situation, I think. 
I also have this not-terribly-well-defined idea that it would be handy for different providers of the _contents_ of namespace packages to provide their own instrumentation to be made aware that they've been added to the __path__ of a particular package. This may be a solution in search of a problem, but I imagine that each __package__.py would be executed in the same module namespace. This would allow namespace packages to do things like set up compatibility aliases, lazy imports, plugin registrations, etc, as they currently do with __init__.py. Perhaps it would be better to define its relationship to the package-module namespace in a more sensible way than execute all over each other in no particular order. Also, if I had my druthers, Python would raise an exception if someone added a directory marked as a package to sys.path, to refuse to import things from it, and when a submodule was run as a script, add the nearest directory not marked as a package to sys.path, rather than the script's directory itself. The whole __name__ is wrong because your current directory was wrong when you ran that command thing is so confusing to explain that I hope we can eventually consign it to the dustbin of history. But if you can't even reasonably guess whether a directory is supposed to be an entry on sys.path or a package, that's going to be really hard to do. In any case, you probably should *not* do the building of a virtual path yourself; the protocols and APIs added by PEP 402 should allow you to simply ask for the path to be constructed on your behalf. Otherwise, you are going to be back in the same business of second-guessing arbitrary importer backends again! What do you mean building of a virtual path? (E.g. note that PEP 402 does not say virtual package subpaths must be filesystem or zipfile subdirectories of their parents - an importer could just as easily allow you to treat subdirectories named 'twisted.python' as part of a virtual package with that name!) 
Anyway, pkgutil defines some extra methods that importers can implement to support module-walking, and part of the PEP 402 implementation should be to make this support virtual packages as well. The more that this can focus on module-walking without executing code, the happier I'll be :). This code still needs to support Python 2.4, but I will make a note of this for future reference. A suggestion: just take the pkgutil code and bundle it for Python 2.4 as something._pkgutil. There's very little about it that's 2.5+ specific, at least when I wrote the bits that do the module walking. Of course, the main disadvantage of pkgutil for your purposes is that it currently requires packages to be imported in order to walk their child modules. (IIRC, it does *not*, however, require them to be imported in order to discover their existence.) One of the stipulations of this code is that it might give different results when the modules are loaded and not. So it's fine to inspect that first and then invoke pkgutil only in the 'loaded' case, with the knowledge that the not-loaded case may be
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
At 05:03 PM 8/12/2011 -0400, Glyph Lefkowitz wrote: Are there any rules about passing invalid identifiers to __import__ though, or is that just less likely? :) I suppose you have a point there. ;-) I still like the idea of a 'marker' file. It would be great if there were a new marker like __package__.py. Having any required marker file makes separately-installable portions of a package impossible, since it would then be in conflict at installation time. The (semi-)competing proposal, PEP 382, is based on allowing each portion to have a differently-named marker; we came up with PEP 402 as a way to get rid of the need for any marker files (not to mention the bikeshedding involved.) What do you mean building of a virtual path? Constructing the __path__-to-be of a not-yet-imported virtual package. The PEP defines a protocol for constructing this, by asking the importer objects to provide __path__ entries, and it does not require anything to be imported. So there's no reason to re-implement the algorithm yourself. The more that this can focus on module-walking without executing code, the happier I'll be :). Virtual packages actually improve on this situation, in that a virtual path can be computed without the need to import the package. (Assuming a submodule or subpackage doesn't munge the __path__, of course.)
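To make the "computed without importing" point concrete, here is a minimal sketch, not the PEP's protocol. It assumes plain filesystem directories only, and the names sketch_virtual_path and sketch_child_modules are hypothetical. A real implementation would ask each importer for its __path__ entries, as described above, and would base a child package's path on its parent's __path__ rather than on the top-level search path. The only standard-library call relied on for walking is pkgutil.iter_modules, which lists the contents of the given path entries without importing what it finds.

```python
import os
import pkgutil
import sys


def sketch_virtual_path(package_name, search_path=None):
    """Approximate the __path__-to-be of a not-yet-imported virtual package.

    Hypothetical helper: it only understands plain directories and never
    imports or executes anything.
    """
    if search_path is None:
        search_path = sys.path
    # 'zc.buildout' becomes 'zc/buildout' under each search path entry.
    relative = os.path.join(*package_name.split("."))
    result = []
    for entry in search_path:
        # An empty sys.path entry conventionally means the current directory.
        candidate = os.path.join(entry or os.curdir, relative)
        if os.path.isdir(candidate):
            result.append(candidate)
    return result


def sketch_child_modules(package_name, search_path=None):
    """List (qualified name, is_package) pairs found on the sketched path."""
    path = sketch_virtual_path(package_name, search_path)
    return sorted(
        (package_name + "." + name, ispkg)
        for _finder, name, ispkg in pkgutil.iter_modules(path)
    )
```

A tool could call sketch_child_modules("zc") to see which submodules are visible across all portions of the namespace before deciding whether to import any of them.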
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
On Aug 11, 2011, at 04:39 PM, Éric Araujo wrote: * XXX what is the __file__ of a pure virtual package? ``None``? Some arbitrary string? The path of the first directory with a trailing separator? No matter what we put, *some* code is going to break, but the last choice might allow some code to accidentally work. Is that good or bad? A pure virtual package having no source file, I think it should have no __file__ at all. I don’t know if that would break more code than using an empty string for example, but it feels righter. I agree that the empty string is the worst of the choices. no __file__ or __file__=None is better. -Barry
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
On Aug 11, 2011, at 11:39 AM, Barry Warsaw wrote: On Aug 11, 2011, at 04:39 PM, Éric Araujo wrote: * XXX what is the __file__ of a pure virtual package? ``None``? Some arbitrary string? The path of the first directory with a trailing separator? No matter what we put, *some* code is going to break, but the last choice might allow some code to accidentally work. Is that good or bad? A pure virtual package having no source file, I think it should have no __file__ at all. I don’t know if that would break more code than using an empty string for example, but it feels righter. I agree that the empty string is the worst of the choices. no __file__ or __file__=None is better. In some sense, I agree: hacks like empty strings are likely to lead to path-manipulation bugs where the wrong file gets opened (or worse, deleted, with predictable deleterious effects). But the whole pure virtual mechanism here seems to pile even more inconsistency on top of an already irritatingly inconsistent import mechanism. I was reasonably happy with my attempt to paper over PEP 302's weirdnesses from a user perspective: http://twistedmatrix.com/documents/11.0.0/api/twisted.python.modules.html (or https://launchpad.net/modules if you are not a Twisted user) Users of this API can traverse the module hierarchy with certain expectations; each module or package would have .pathEntry and .filePath attributes, each of which would refer to the appropriate place. Of course __path__ complicates things a bit, but so it goes. Now it seems like pure virtual packages are going to introduce a new type of special case into the hierarchy which have neither .pathEntry nor .filePath objects. Rather than a one-by-one ad-hoc consideration of which attribute should be set to None or empty strings or string or what have you, I'd really like to see a discussion in the PEP saying what a package really is vs. what a module is, and what one can reasonably expect from it from an API and tooling perspective. 
Right now I have to puzzle out the intent of the final API from the problem/solution description and thought experiment. Despite authoring several namespace packages myself, I don't have any of the problems described in the PEP. I just want to know how to write correct tools given this new specification. I suspect that this PEP will be the only reference for how packages work for a long time coming (just as PEP 302 was before it) so it should really get this right.
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
On Thu, 11 Aug 2011 11:39:52 -0400 Barry Warsaw ba...@python.org wrote: On Aug 11, 2011, at 04:39 PM, Éric Araujo wrote: * XXX what is the __file__ of a pure virtual package? ``None``? Some arbitrary string? The path of the first directory with a trailing separator? No matter what we put, *some* code is going to break, but the last choice might allow some code to accidentally work. Is that good or bad? A pure virtual package having no source file, I think it should have no __file__ at all. I don’t know if that would break more code than using an empty string for example, but it feels righter. I agree that the empty string is the worst of the choices. no __file__ or __file__=None is better. None should be the answer. It simplifies inspection of module data (repr(__file__) gives you something recognizable instead of raising) and makes semantically sense (!) since there is, indeed, no actual file backing the module. Regards Antoine.
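For tool authors following this sub-thread, the practical difference between "no __file__" and "__file__ = None" is whether plain attribute access raises. The sketch below is only an illustration: describe_origin is a hypothetical helper name, and the bare module object stands in for a pure virtual package. It shows a lookup that tolerates either outcome.

```python
import types


def describe_origin(module):
    """Return a printable origin for a module that may have no source file."""
    # getattr with a default covers both proposals from the thread:
    # the attribute being absent, and the attribute being set to None.
    filename = getattr(module, "__file__", None)
    if filename is None:
        return "<no file: %s>" % module.__name__
    return filename


# A bare module object has no __file__ attribute at all.
virtual = types.ModuleType("zc")
print(describe_origin(virtual))

# The same call keeps working if __file__ is later defined as None.
virtual.__file__ = None
print(describe_origin(virtual))
```

Code that reads module.__file__ directly keeps working under the None choice and fails with AttributeError under the "no attribute" choice, which is the inspection argument made above.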
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
At 04:39 PM 8/11/2011 +0200, Éric Araujo wrote: Hi, I've read PEP 402 and would like to offer comments. Thanks. Minor: I would reserve packaging for packaging/distribution/installation/deployment matters, not Python modules. I suggest Python package semantics. Changing to Python package import semantics to hopefully be even clearer. ;-) (Nitpick: I was somewhat intentionally ambiguous because we are talking here about how a package is physically implemented in the filesystem, and that actually *is* kind of a packaging issue. But it's not necessarily a *useful* intentional ambiguity, so I've no problem with removing it.) Minor: In the UNIX world, or with version control tools, moving and renaming are the same one thing (hg mv spam.py spam/__init__.py for example). Also, if you turn a module into a package, you may want to move code around, change imports, etc., so I'm not sure the renaming part is such a big step. Anyway, if the import-sig people say that users think it's a complex or costly operation, I can believe it. It's not that it's complex or costly in anything other than *mental* overhead -- you have to remember to do it and it's not particularly obvious. (But people on import-sig did mention this and other things covered by the PEP as being a frequent root cause of beginner inquiries on #python, Stackoverflow, et al.) (By the way, both of these additions to the import protocol (i.e. the dynamically-added ``__path__``, and dynamically-created modules) apply recursively to child packages, using the parent package's ``__path__`` in place of ``sys.path`` as a basis for generating a child ``__path__``. This means that self-contained and virtual packages can contain each other without limitation, with the caveat that if you put a virtual package inside a self-contained one, it's gonna have a really short ``__path__``!) I don't understand the caveat or its implications. 
Since each package's __path__ is the same length or shorter than its parent's by default, then if you put a virtual package inside a self-contained one, it will be functionally speaking no different than a self-contained one, in that it will have only one path entry. So, it's not really useful to put a virtual package inside a self-contained one, even though you can do it. (Apart from it letting you avoid a superfluous __init__ module, assuming it's indeed superfluous.) In other words, we don't allow pure virtual packages to be imported directly, only modules and self-contained packages. (This is an acceptable limitation, because there is no *functional* value to importing such a package by itself. After all, the module object will have no *contents* until you import at least one of its subpackages or submodules!) Once ``zc.buildout`` has been successfully imported, though, there *will* be a ``zc`` module in ``sys.modules``, and trying to import it will of course succeed. We are only preventing an *initial* import from succeeding, in order to prevent false-positive import successes when clashing subdirectories are present on ``sys.path``. I find that limitation acceptable. After all, there is no zc project, and no zc module, just a zc namespace. I'll just regret that it's not possible to provide a module docstring to inform that this is a namespace package used for X and Y. It *is* possible - you'd just have to put it in a zc.py file. IOW, this PEP still allows namespace-defining packages to exist, as was requested by early commenters on PEP 382. It just doesn't *require* them to exist in order for the namespace contents to be importable. The resulting list (whether empty or not) is then stored in a ``sys.virtual_package_paths`` dictionary, keyed by module name. This was probably said on import-sig, but here I go: yet another import artifact in the sys module! I hope we get ImportEngine in 3.3 to clean up all this. 
Well, I rather *like* having them there, personally, vs. having to learn yet another API, but oh well, whatever. AFAIK, ImportEngine isn't going to do away with the need for the global ones to live somewhere, at least not in 3.3. * A new ``extend_virtual_paths(path_entry)`` function, to extend existing, already-imported virtual packages' ``__path__`` attributes to include any portions found in a new ``sys.path`` entry. This function should be called by applications extending ``sys.path`` at runtime, e.g. when adding a plugin directory or an egg to the path. Let's imagine my application Spam has a namespace spam.ext for plugins. To use a custom directory where plugins are stored, or a zip file with plugins (I don't use eggs, so let me talk about zip files here), I'd have to call sys.path.append *and* pkgutil.extend_virtual_paths? As written in the current proposal, yes. There was some discussion on Python-Dev about having this happen automatically, and I proposed that it could be done by making virtual
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
On Fri, Aug 12, 2011 at 4:30 AM, P.J. Eby p...@telecommunity.com wrote: At 04:39 PM 8/11/2011 +0200, Éric Araujo wrote: The resulting list (whether empty or not) is then stored in a ``sys.virtual_package_paths`` dictionary, keyed by module name. This was probably said on import-sig, but here I go: yet another import artifact in the sys module! I hope we get ImportEngine in 3.3 to clean up all this. Well, I rather *like* having them there, personally, vs. having to learn yet another API, but oh well, whatever. AFAIK, ImportEngine isn't going to do away with the need for the global ones to live somewhere, at least not in 3.3. And likely not for the entire 3.x series - I shudder at the thought of the backwards incompatibility hell associated with trying to remove them... The point of the ImportEngine API is that the caching elements of the import state introduce cross dependencies between various global data structures. Code that manipulates those data structures needs to correctly invalidate or otherwise update the state as things change. I seem to recall a certain programming construct that is designed to make it easier to manage interdependent data structures... Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
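The cross-dependency concern can be illustrated with a small sketch of the two-step update discussed earlier in the thread (append to sys.path, then extend the paths of already-known virtual packages). Everything here is an assumption made for illustration: the module-level virtual_package_paths dictionary is a stand-in for the ``sys.virtual_package_paths`` proposed in the PEP, add_path_entry is a hypothetical name, and only plain directories under top-level path entries are handled. The one real API it leans on is importlib.invalidate_caches(), which asks finders to drop cached directory listings.

```python
import importlib
import os
import sys

# Hypothetical stand-in for the proposed ``sys.virtual_package_paths``:
# maps a package name to the list object used as that package's __path__.
virtual_package_paths = {}


def add_path_entry(path_entry):
    """Add a sys.path entry and keep tracked virtual package paths in step."""
    if path_entry not in sys.path:
        sys.path.append(path_entry)
    for name, package_path in virtual_package_paths.items():
        # Look for a portion of this package under the new entry.
        candidate = os.path.join(path_entry, *name.split("."))
        if os.path.isdir(candidate) and candidate not in package_path:
            package_path.append(candidate)
    # Finders may hold stale directory listings for the affected paths.
    importlib.invalidate_caches()
```

Funnelling every change through one function is the simplest way to keep the interdependent structures consistent; wrapping the same state in a class would serve the same purpose with less reliance on globals.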