Re: [Catalog-sig] How to determine if archive is an sdist or bdist
On Sun, Mar 31, 2013 at 6:13 PM, James Carpenter nawk...@gmail.com wrote:
> Do you have a module/function/line number in easy_install I should use? I'm sure I can dig it out myself but it sounds like you might just be able to put your finger on it in only a minute or two.

It's the install_eggs() method of setuptools.command.easy_install.easy_install. You won't really be able to use it; it just looks for a setup.py after *unpacking* the archive. It also doesn't look for a PKG-INFO; PyPI does that. (And I only know that because it was relevant to the uploadability of eggs at one time.)

___
Catalog-SIG mailing list
Catalog-SIG@python.org
http://mail.python.org/mailman/listinfo/catalog-sig
Re: [Catalog-sig] How to determine if archive is an sdist or bdist
On Fri, Mar 29, 2013 at 11:00 AM, James Carpenter nawk...@gmail.com wrote:
> Looks like the idea of using a custom command is a better approach then.

I'm not sure why you think that. The only kinds of archives whose file types are ambiguous from the name are sdist, bdist_dumb, and random raw source dumps. Everything else has a unique extension like .egg, .exe, .msi, .rpm, etc.

If you have a .zip, .tar.gz, .tgz, or some other archive name, you can find out if it's an sdist by inspecting its contents as I described. And if it's not an sdist, you can usually tell if it's a raw source dump by checking for a setup.py in the archive root or a depth-1 subdirectory off the root. (That's what easy_install does, anyway, when it's given an archive it doesn't know what to do with.)

> Is a custom command my only choice or can I register pre/post hooks to any given command?
>
> On Thu, Mar 28, 2013 at 3:36 PM, PJ Eby p...@telecommunity.com wrote:
>> On Thu, Mar 28, 2013 at 3:57 PM, James Carpenter nawk...@gmail.com wrote:
>>> Is there an easy way to programmatically tell if an archive (tar.gz, zip, etc.) in the dist directory is a binary or sdist? I would like to post-process the contents of a dist directory and classify each build artifact there (egg, sdist, bdist, etc.).
>>
>> An sdist always has a single subdirectory in the archive's root directory, named for the package+version, and containing a PKG-INFO and setup.py (plus a bunch of other stuff). A bdist_dumb will not have such a subdirectory in the archive root; instead it will have one or more directories like /usr, /opt, /Program Files. Other bdist formats? Hard to say.
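The content checks described in this message can be sketched roughly as follows. This is only an illustration under stated assumptions, not easy_install's actual code: the function names `archive_members` and `classify` and the returned labels are hypothetical, and the logic just encodes the two rules given above (a single root directory holding PKG-INFO and setup.py means sdist; a setup.py in the root or a depth-1 subdirectory suggests a raw source dump).

```python
import tarfile
import zipfile


def archive_members(path):
    """Return the member names of a .zip or tar-based archive on disk."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            return zf.namelist()
    with tarfile.open(path) as tf:
        return tf.getnames()


def classify(names):
    """Guess an archive's kind from its member names (hypothetical labels)."""
    paths = []
    for name in names:
        # Split into components, ignoring "./" prefixes and trailing slashes.
        parts = [p for p in name.split("/") if p not in ("", ".")]
        if parts:
            paths.append(parts)
    roots = {p[0] for p in paths}
    files = {"/".join(p) for p in paths}

    # Rule 1: exactly one root directory containing PKG-INFO and setup.py.
    if len(roots) == 1:
        root = next(iter(roots))
        if root + "/PKG-INFO" in files and root + "/setup.py" in files:
            return "sdist"

    # Rule 2: setup.py in the archive root or one directory below it.
    if any(p[-1] == "setup.py" and len(p) <= 2 for p in paths):
        return "raw source"

    # Anything else (e.g. /usr, /opt trees) cannot be identified this way.
    return "unknown (possibly bdist_dumb)"
```

For a file in a dist directory, something like `classify(archive_members("dist/foo-1.0.tar.gz"))` would apply both rules; archives with unique extensions (.egg, .exe, .msi, .rpm) would be classified by name before reaching this step.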
Re: [Catalog-sig] Merge catalog-sig and distutils-sig
On Thu, Mar 28, 2013 at 3:14 PM, Fred Drake f...@fdrake.net wrote:
> On Thu, Mar 28, 2013 at 2:22 PM, Donald Stufft don...@stufft.io wrote:
>> Is there much point in keeping catalog-sig and distutils-sig separate?
>
> No. The last time this was brought up, there were objections, but I don't remember what they were. I'll let people who think there's a point worry about that.
>
>> Not sure if there's some official process for requesting it or not, but I think we should merge the two lists and just make packaging-sig to umbrella the entire packaging topics.
>
> There is the meta-sig, but the description is out-dated: http://mail.python.org/mailman/listinfo/meta-sig and the last message in the archives is dated 2011, and sparked no discussion: http://mail.python.org/pipermail/meta-sig/2011-June.txt
>
> +1 on merging the lists.

Can we do it by just dropping catalog-sig and keeping distutils-sig? I'm afraid we might lose some important distutils-sig population if the process involves renaming the list, resubscribing, etc. I also *really* don't want to invalidate archive links to the distutils-sig archive.

All in all, +1 on not having two lists, but I'm really worried about breaking distutils-sig. We're still going to be talking about distribution utilities, after all.
Re: [Catalog-sig] Merge catalog-sig and distutils-sig
On Thu, Mar 28, 2013 at 3:43 PM, Donald Stufft don...@stufft.io wrote:
> On Mar 28, 2013, at 3:39 PM, PJ Eby p...@telecommunity.com wrote:
>> Can we do it by just dropping catalog-sig and keeping distutils-sig? I'm afraid we might lose some important distutils-sig population if the process involves renaming the list, resubscribing, etc. I also *really* don't want to invalidate archive links to the distutils-sig archive. All in all, +1 on not having two lists, but I'm really worried about breaking distutils-sig. We're still going to be talking about distribution utilities, after all.
>
> Worst case I'm sure subscribers can be transferred and the existing archive kept intact.

That's a great way to have a bunch of people complaining that they never subscribed to packaging-sig, not to mention the part where it breaks everyone's mail filters.

I really don't see any gains for renaming the list. It's not like we can go and scrub the entire internet of references to distutils-sig.
Re: [Catalog-sig] How to determine if archive is an sdist or bdist
On Thu, Mar 28, 2013 at 3:57 PM, James Carpenter nawk...@gmail.com wrote:
> Is there an easy way to programmatically tell if an archive (tar.gz, zip, etc.) in the dist directory is a binary or sdist? I would like to post-process the contents of a dist directory and classify each build artifact there (egg, sdist, bdist, etc.).

An sdist always has a single subdirectory in the archive's root directory, named for the package+version, and containing a PKG-INFO and setup.py (plus a bunch of other stuff). A bdist_dumb will not have such a subdirectory in the archive root; instead it will have one or more directories like /usr, /opt, /Program Files. Other bdist formats? Hard to say.
Re: [Catalog-sig] Merge catalog-sig and distutils-sig
On Thu, Mar 28, 2013 at 5:15 PM, Jacob Kaplan-Moss ja...@jacobian.org wrote:
> C'mon, folks, we're arguing about a name. That's about as close to literal bikeshedding as we could get.

I'm not arguing about the *name*. I just don't see the point in making everybody subscribe to a new list and change their mail filters (and update every book and webpage out there that mentions the distutils-sig), because a few people want to *change* the name -- a change that AFAICT doesn't actually provide any tangible benefit to anybody whatsoever.

> How about we just let whoever has the keys make the change in whatever way's easiest and most logical for them?

Because it's not up to just the person with the keys. Neither SIG is a mere mailing list; each is a Python special interest group, and SIGs have their own formation and termination processes.

In particular, if you're going to start a new SIG, one of the requirements to be met is that "in particular, no other SIG nor the general Python newsgroup is already more suitable" (per the Python SIG Creation Guidelines). It's hard to argue that distutils-sig isn't already more suitable than whatever is being proposed to take its place.
Re: [Catalog-sig] Access to Windows' cert store
On Thu, Mar 21, 2013 at 8:06 AM, Christian Heimes christ...@python.org wrote:
> Hi, the message is slightly off-topic but it might be interesting for pip, setuptools and other developers that are working on HTTPS for PyPI.
>
> A while ago I found C++ example code that shows how to dump CA and CRL certs from Windows's system cert store. The system cert store contains the certificates used by Windows, IE etc. Yesterday I reimplemented the C++ code with Python and ctypes. I have tested it with Python 2.6 to 3.3 (x86 and x86_64) on Windows 7. It should work with Windows XP / Windows Server 2003 and all newer versions of Windows.
>
> The output is usable by Python's SSL module but you have to dump the certs to a file first. I'm planning to add the feature to Python 3.4, too. http://bugs.python.org/issue17134
>
> You can download the code from https://bitbucket.org/tiran/wincertstore

Very nice! I definitely would like to use this for setuptools, but I actually want it for versions 2.3-2.5, which can't use requests or urllib3 or anything like that.

So I hacked on the code a bit and got it to work (or at least got the __main__ stub to spit out a bunch of data) with Python 2.3 and ctypes 1.0.2 (the last standalone release for which Windows binaries are available). Would you like a patch?

(Note: absolute_import, decorators, and the actual use of with: and generator expressions had to go, but this doesn't change any API or semantics as far as I can tell, just a bit of appearance here and there, and the code still runs with 2.4, 2.5, 2.7, 3.1, and 3.2 that I tried.)
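As a rough illustration of the "dump the certs to a file first" step, here is a sketch that does not use the wincertstore package itself (its API is not described in this thread). It assumes a much newer Python than the versions discussed above and relies on the standard library's `ssl.enum_certificates()`, which is available on Windows from Python 3.4 onward. The helper names `der_certs_to_pem` and `windows_ca_bundle` are hypothetical.

```python
import ssl
import sys


def der_certs_to_pem(der_certs):
    """Concatenate DER-encoded certificates into a single PEM bundle string."""
    return "".join(ssl.DER_cert_to_PEM_cert(der) for der in der_certs)


def windows_ca_bundle(store_name="ROOT"):
    """Return a PEM bundle built from a Windows system store (Windows only)."""
    if sys.platform != "win32":
        raise RuntimeError("the Windows certificate store is only available on Windows")
    # enum_certificates yields (cert_bytes, encoding_type, trust) tuples;
    # keep only plain X.509 certificates.
    der_certs = [
        cert
        for cert, encoding, trust in ssl.enum_certificates(store_name)
        if encoding == "x509_asn"
    ]
    return der_certs_to_pem(der_certs)
```

The resulting string could be written to a file and passed as the `cafile` argument of `ssl.create_default_context()`, which matches the "dump to a file, then hand it to the SSL module" workflow mentioned in the quoted message.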
Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
On Sat, Mar 16, 2013 at 3:15 AM, Nick Coghlan ncogh...@gmail.com wrote:
> On 15 Mar 2013 16:16, Carl Meyer c...@oddbird.net wrote:
>> tl;dr: I see your points, we'll change the PEP to allow clients to use hostnames instead of the rel attributes if they prefer.
>
> I will veto any such change. Clients MUST NOT assume that the architecture of the index service will be limited to a single host name, they must process the explicit metadata provided by the index that indicates which hosts the index controls. Adding a --trust-indices flag to make this optional in setuptools would be fine, but it seems perverse to trust every aspect of an index *except* its claims to control additional hosts.

Actually, setuptools trusts redirects, so that mechanism is available for splitting the hosted files to another domain. As it stands, though, I don't see a way to support this without introducing confusion.

The advantage of using allow-hosts based on the index host is that it *also* specifies what to do with dependency links provided by individual packages; the PEP does not provide any real guidance on this point.

So, I have to withdraw my support for the PEP with these recent changes, as it no longer reflects the approach I previously agreed to, and as yet there have been no alternatives proposed to address the user confusion issues (which IMO at least are a big part of the point of having the PEP).

Of course, if redirection is required for non-extrapolatable hostnames, or if somebody comes up with a new and brilliant scheme to manage the menage of permissions needed across dependency_links, the index, and general host trusting issues (while remaining comprehensible and predictable to end users), I'll certainly have a look again. But I took the weekend off from this discussion to try to come up with one myself, and so far I've got nothing.
Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
On Mon, Mar 18, 2013 at 1:22 PM, PJ Eby p...@telecommunity.com wrote:
> Actually, setuptools trusts redirects, so that mechanism is available for splitting the hosted files to another domain. As it stands, though, I don't see a way to support this without introducing confusion.

Oops - that wasn't clear. By "this" I meant the current version of the PEP.
Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
Do we even need the internal/external rel info? I was planning to just use the URL hostname. i.e., are there any use cases for designating an externally-hosted file internal, or an internally-hosted file external? If not, it seems the rel= is redundant.

It's also more work to implement, vs. just defaulting --allow-hosts to be the --index-url host; a strategy ISTM pip could also use, since it has the same two options available.

Also, if we're not doing homepage/download crawling any more, I was hoping we could just drop the code that 'parses' rel= links in the first place, as it's an awkward ugly hack. ;-)
Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
On Fri, Mar 15, 2013 at 12:07 PM, Carl Meyer c...@oddbird.net wrote:
> On 03/15/2013 09:15 AM, PJ Eby wrote:
>> Do we even need the internal/external rel info? I was planning to just use the URL hostname. i.e., are there any use cases for designating an externally-hosted file internal, or an internally-hosted file external? If not, it seems the rel= is redundant.
>
> Right; Donald and Holger already gave the rationale for this: there are good reasons for an index to not have internal links actually on the exact same hostname. Even just using a different subdomain would break simple host comparison.
>
>> It's also more work to implement, vs. just defaulting --allow-hosts to be the --index-url host; a strategy ISTM pip could also use, since it has the same two options available.
>
> Pip actually doesn't currently have --allow-hosts, although there's no good reason for that; it ought to.
>
>> Also, if we're not doing homepage/download crawling any more, I was hoping we could just drop the code that 'parses' rel= links in the first place, as it's an awkward ugly hack. ;-)
>
> Well, parsing HTML links as an API is an ugly hack, but within that existing framework rel seems like the appropriate semantic attribute for this type of information, not really upping the hackiness quotient :-)

Well, to be clear, I liked previous versions of the proposal better than this one. But while I *really* don't want to do any new rel parsing, that's not the only or even the most important reason.

The main reason is that I think internal vs. external is a bogus distinction: what's important (IMO) is what hosts you do and don't trust. Giving a blanket pass to all external links doesn't seem like such a good idea to me, nor does allowing the index to define what hosts the client should trust.

As for the internal ones, I'm not sure why we can't at least make a subdomain requirement, or have users explicitly add a PyPI CDN to their configured --allow-hosts.

To try to put it another way: there should be one, and preferably only one, obvious way to specify where you get downloads from. That way in easy_install is currently --allow-hosts. Adding new options that interact and overlap with that looks like bad UI design to me, increasing the possibility of user confusion.
Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
On Fri, Mar 15, 2013 at 1:39 PM, Carl Meyer c...@oddbird.net wrote:
> up to you whether you also want to use rel=internal as a hint for implicitly (perhaps with warning) adding to --allow-hosts,

That's the bit I don't like. The security model is that if it's not allowed by allow-hosts, it's *not allowed*. Introducing a way to sneak something past allow-hosts is a bad idea, because it means that people either have to explicitly widen their allow-hosts to arbitrary hosts, or that you can't actually enforce an allow-hosts policy, or that you need to learn a whole bunch of options to implement it.

ISTM that this is a bad design choice for users, and I'm not comfortable with this without some way to define the allowed internal hosts based in some way on the base index URL. Not just for ease of automated translation, but so that *users* can know who they're dealing with, and easily predict the effects of their chosen options.

A frequent refrain has been, "users don't know they're downloading stuff from places other than PyPI", so if this new approach allows downloads from somewhere other than *.pypi.python.org when you've chosen pypi.python.org as your index, ISTM the proposal is failing to meet its original goals. As the PEP is written, PyPI could change out to a different CDN each week or use different ones for different files, and users would be back in the position of not being sure where stuff is coming from.

I'm fine with extending the default host matching to indexhost,*.indexhost if we want to leave more of an option for PyPI and other indexes to use a CDN. But I'm not sure how much point to it there is, since a /simple index is static, and small in size compared to the downloads, so you might as well host a copy of the /simple index alongside the downloads, and make the index pypicdn.com/simple or whatever in the first place. (In other words, not a lot of benefit to splitting a static index from its associated files, so why support it?)

> PyPI wouldn't be enforcing a UI on you here, just providing metadata that you can use as you wish.

That's not what the PEP says. It does in fact *mandate* the use of the rel attributes. So if somebody adds an external link that actually points back to PyPI, technically I'm not supposed to use it unless it's been explicitly authorized. ;-)

I'd really prefer to see explicit language that says the rel information is advisory only and that installers aren't required to parse it, let alone use it. At the moment, the PEP is a substantial departure from the version I agreed with.

(If there were to be any meaningful distinction in the links themselves, I would think it'd more be whether, e.g. hash information is available for the download. That's a potentially relevant distinction right now, in that PyPI automatically provides #md5 info. Even so, I'm not sure that's enough of a distinction for anyone to care about.)
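The "indexhost,*.indexhost" default discussed in this message can be sketched as below. This is a minimal illustration rather than setuptools' implementation: `default_allow_hosts` and `url_allowed` are hypothetical names, the code assumes a modern Python 3 standard library, and it matches on the hostname only (ignoring ports) using shell-style wildcards.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def default_allow_hosts(index_url):
    """Derive the default patterns from the index URL: the host and its subdomains."""
    host = urlparse(index_url).hostname
    return [host, "*." + host]


def url_allowed(url, patterns):
    """Return True if the URL's host matches one of the allowed host patterns."""
    host = urlparse(url).hostname or ""
    return any(fnmatchcase(host, pattern) for pattern in patterns)
```

With the index set to https://pypi.python.org/simple/, a download from a subdomain of pypi.python.org would pass, while a link to any other host would be rejected unless the user widened the patterns explicitly. That is the predictability argument made above: the allowed set follows from the index URL alone.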
Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
On Fri, Mar 15, 2013 at 7:16 PM, Carl Meyer c...@oddbird.net wrote:
> Ok, pending agreement from Holger I'll make a change in the PEP to explicitly allow clients to make decisions based on either the rel attributes or based on hostnames. Would that be sufficient to address your concerns?

Yes. I just don't want to be in a situation down the road where there's another argument about this on Catalog-SIG when PyPI starts using a CDN, where someone says, "but it says this in the rel and you're supposed to use that," and I say, "but Carl and Holger said...", and they go, "doesn't matter, PEP says" ;-)

This way, the PEP will be clear that supporting a split of PyPI's hostnames isn't in current scope.

I am also okay with the PEP allowing *.indexhost instead of just indexhost as the filtering mechanism, as long as it specifies one *now*. (Again, so this doesn't have to be revisited later.) If somebody who knows something about CDNs, TUF, etc., needs to weigh in on it first, that's fine. I just want to know where things stand.

> Putting the /simple/ API on a CDN isn't quite that easy because it currently involves some server-side redirects to effectively make project names case-insensitive.

FWIW, easy_install works fine without this. If a matching index page isn't found, it checks the full package list. PyPI's redirection just reduces bandwidth usage and request overhead in the case where the case of the user's request doesn't match the actual package listing. But it could be completely static without affecting easy_install and tools that use its package-finding code.
Re: [Catalog-sig] setuptools/distribute/easy_install/pkg_resource sorting algorithm
On Thu, Mar 14, 2013 at 6:07 AM, M.-A. Lemburg m...@egenix.com wrote:
> On 12.03.2013 22:26, PJ Eby wrote:
>> On Tue, Mar 12, 2013 at 3:59 PM, M.-A. Lemburg m...@egenix.com wrote:
>>> On 12.03.2013 19:15, M.-A. Lemburg wrote:
>>>> I've run into a weird issue with easy_install, that I'm trying to solve:
>>>>
>>>> If I place two files named
>>>>
>>>> egenix_mxodbc_connect_client-2.0.2-py2.6.egg
>>>> egenix-mxodbc-connect-client-2.0.2.win32-py2.6.prebuilt.zip
>>>>
>>>> into the same directory and let easy_install running on Linux scan this, it considers the second file for Windows as best match.
>>>>
>>>> Is the algorithm used for determining the best match documented somewhere? I've had a look at the implementation, but this left me rather clueless. I thought that setuptools would prefer the .egg file over the prebuilt .zip file - binary files being easier to install than source files.
>>>
>>> After some experiments, I found that the following change in filename (swapping platform and Python version, in addition to using '-' instead of '.') works:
>>>
>>> egenix-mxodbc-connect-client-2.0.2-py2.6-win32.prebuilt.zip
>>>
>>> OTOH, this one doesn't (notice the difference?):
>>>
>>> egenix-mxodbc-connect-client-2.0.2.py2.6-win32.prebuilt.zip
>>>
>>> The logic behind all this looks rather fragile to me.
>>
>> easy_install only guarantees sane version parsing for distribution files built using setuptools' naming algorithms. If you use distutils, it can only make guesses, because the distutils does not have a completely unambiguous file naming scheme. And if you are naming the files by hand, God help you. ;-)
>
> The problem appears to be a bug in setuptools' package_index.py. The function interpret_distro_name() creates a set of possible separations of the found name into project name and version. It does find the right separation, but for some reason, the code using that function does not check the found project names against the project name the user is trying to install, but simply takes the last entry of the list returned by the above function.
>
> As a result, easy_install downloads and tries to install project files that don't match the project name in some cases. Here's another example where it fails (say you're on a x64 Linux box):
>
> # easy_install egenix-pyopenssl
>
> As an example, say it finds these distribution files:
>
> 'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs2-linux-x86_64-prebuilt.zip',
> 'egenix_pyopenssl-0.13.1.1.0.1.5-py2.7-linux-x86_64.egg',
> 'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs2-macosx-10.5-x86_64-prebuilt.zip',
> 'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs4-macosx-10.5-x86_64-prebuilt.zip',
>
> It then creates different interpretations of those names, puts them in a list and sorts them. Here's the end of that list:
>
> egenix-pyopenssl; 0.13.1.1.0.1.5 -- this would be the correct .egg file
> egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs2-linux-x86-64-prebuilt
> egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs2-macosx-10.5-x86-64-prebuilt
> egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt
> egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs2-macosx; 10.5-x86-64-prebuilt
> egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs4-macosx; 10.5-x86-64-prebuilt
>
> It picks the last entry, which would be for a project called egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs4-macosx - not the one the user searched.

Actually, that's not quite true. It's picking:

egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt

Because it thinks that '0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt' is a higher version than 0.13.1.1.0.1.5. It does also record the possibility you mentioned, but it doesn't pick that one. The project names actually *do* have to match.

If you open a ticket on the setuptools tracker, I'll try to see if I can get it to recognize that strings like py2.7, macosx, ucs, and the like are terminators for a version number. I don't know how successful I'll be, though.

Basically, those zip files are (I assume) bdist_dumb distributions being taken for source distributions, and easy_install doesn't actually support bdist_dumb files at the moment.
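The ambiguity described in this exchange can be illustrated with a simplified sketch of splitting a file's base name at every hyphen into (project, version) candidates. This is not the actual interpret_distro_name() code; `candidate_splits` and `versions_for` are hypothetical helpers, and the name normalization is reduced to lowercasing and treating '_' as '-'.

```python
def candidate_splits(basename):
    """Yield every (project, version) reading obtained by splitting at a hyphen."""
    parts = basename.split("-")
    for i in range(1, len(parts) + 1):
        yield "-".join(parts[:i]), "-".join(parts[i:])


def versions_for(project, basename):
    """Keep only the candidate versions whose project name matches the request."""
    wanted = project.lower().replace("_", "-")
    return [
        version
        for name, version in candidate_splits(basename)
        if name.lower().replace("_", "-") == wanted
    ]
```

For the base name egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs4-macosx-10.5-x86_64-prebuilt, filtering on the requested project leaves a single candidate whose "version" still carries the Python, platform, and prebuilt tags. That matches the point made above: the project names do match, but the tagged string then compares as higher than plain 0.13.1.1.0.1.5 unless such tags are treated as version terminators.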
Re: [Catalog-sig] setuptools/distribute/easy_install/pkg_resource sorting algorithm
On Thu, Mar 14, 2013 at 2:11 PM, M.-A. Lemburg m...@egenix.com wrote:
> Is there any way to have 0.13.1.1.0.1.5-something sort before 0.13.1.1.0.1.5? (e.g. like is done for release candidates)

Make it 0.13.1.1.0.1.5-devsomething, and it'll have lower precedence than both 0.13.1.1.0.1.5 and 0.13.1.1.0.1.5-something.

> If you could point me to that tracker, I'll open a ticket :-)

http://bugs.python.org/setuptools/
Re: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of release files
On Wed, Mar 13, 2013 at 7:21 AM, holger krekel hol...@merlinux.eu wrote:
> Hi all,
>
> after some more discussions and hours spent by Carl Meyer (who is now co-authoring the PEP) and me, here is a new V3 pre-submit draft. It is now more ambitious than the previous draft as should be obvious from the modified abstract (and Carl Meyer's and Philip's earlier interactions on this list). There also are more details of how the current link-scraping works among other improvements and incorporations of feedback from discussions here.
>
> We intend to submit this draft tonight to the PEP editors. Feedback now and later remains welcome. I am sure there are issues to be sorted and clarified, among them the versioning-API suggestion by Marc-Andre.
>
> Thanks for everybody's support and feedback so far,
> holger

Looks good to me! Setuptools' two releases will probably look like this:

1. Default to externals index, warn when fetching URLs that are not the same host as the index
2. Default to externals index, reject URLs that are not the same host as the index unless --allow-hosts is configured (IOW, default allow-hosts to equal index-url host)

That way, external URLs can still be discovered by the user, but the default configuration is still secure.
Re: [Catalog-sig] A 90% Solution
On Tue, Mar 12, 2013 at 5:50 AM, M.-A. Lemburg m...@egenix.com wrote:
> Not hard to do: we'd just need to keep the old index in place using a different URL, e.g. /simple-v1/.

That's not necessary: the XML-RPC API lets you query those URLs directly. They're part of the metadata standard, after all... which means you can *also* access them by downloading the DOAP records, browsing the PyPI pages directly, etc. There are plenty of ways to get that data, no point adding another one.
Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
On Tue, Mar 12, 2013 at 1:25 AM, Lennart Regebro rege...@gmail.com wrote:
> Externally hosted files are a real world actual problem.

You're leaving out some important words from that sentence. Words like "for some people" and "who choose to depend on projects using them". PyPI isn't your private personal playground. Other people have rights, too.

> This discussion has since a long time gone past reason into pure stop energy.

I agree - hardly anyone is giving any reasoning that justifies why one group of people should have their projects censored to benefit a few blowhards on Catalog-SIG. Carl's the only person who's even *tried* giving a justification. Everyone else just shuts up or changes the subject when I ask that question.

I'll ask it again: why should *thousands* of projects be censored or made to change their release processes, because *you* can't be bothered to cache the distributions of the projects you depend on? Not "why would it be a good idea for them to change anyway?" Why should they be *forced* to do it?

Bonus points: answer why, *every time* somebody proposes a way of improving things that doesn't *ban* external hosting, you guys go all stop energy on that and derail the discussion with why it has to be total. AFAICT, you're the ones stopping things moving forward here, filibustering against every possible compromise.
Re: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI
On Tue, Mar 12, 2013 at 7:38 AM, holger krekel hol...@merlinux.eu wrote:
> In addition, maintainers of installation tools are asked to release two updates. The first one shall provide clear warnings if external crawling needs to happen,

A clarification here: "needs to happen" is not well-specified. An installer tasked with finding the latest or best-matching version of a package must currently *always* crawl. So the warning would always appear.

The strategy I originally chose for making this change in easy_install is to warn once at the beginning that --allow-hosts has not been set, and thus packages might be downloaded from anywhere on the internet.

I've since become uncertain that this change is actually workable in the short term, since until most of the packages are actually moved onto PyPI, a lot of installs will fail if somebody changes their configuration to be more secure. So I'm thinking the warning needs to be deferred until at least the more popular packages have moved to PyPI.

> Now, if there is some agreement, I can submit this PEP officially tomorrow, and given agreement/refinements from the Pycon folks and the likes of Richard, we may be able to get going very shortly after Pycon.

I'd like to suggest that the PEP should be explicit that no other changes to the /simple generation algorithm are being made, just the removal or alteration of rel= attributes. i.e., it will still be possible -- at least in the near term -- for projects to include explicit download links to files made available elsewhere. Changing that situation is more controversial and will require wider community participation than has occurred to date.

It might also be good to suggest that authors of PyPI clones plan their own phase-out of rel= attributes.
Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
On Tue, Mar 12, 2013 at 12:29 PM, Jacob Kaplan-Moss ja...@jacobian.org wrote:
> On Tue, Mar 12, 2013 at 11:19 AM, M.-A. Lemburg m...@egenix.com wrote:
>> So let's do this carefully and find a good solution before jumping to conclusions.
>
> Completely agreed; rushing is a bad idea. But so is not starting. What I'm seeing -- as a total outsider, a user of these tools, not someone who creates them -- is that a bunch of people (Holger, Donald, Richard, the pip maintainers, etc.) have the beginnings of a solution ready to go *right now*, and I want to capture that energy and enthusiasm before it evaporates.
>
> This isn't an academic situation; I've seen companies decline to adopt Python over this exact security issue.

Nobody told them about how to configure a restricted, site-wide default --allow-hosts setting? ( http://peak.telecommunity.com/DevCenter/EasyInstall#restricting-downloads-with-allow-hosts and http://docs.python.org/2/install/index.html#location-and-names-of-config-files )

(FWIW, --allow-hosts was added in setuptools 0.6a6 -- *years* before the distribute fork or the existence of pip, and pip offers the same option.)

I've already agreed to change setuptools to default this option to only allow downloads from the same host as its index URL, in a future release (i.e. to default --allow-hosts to the host of the --index-url option), and I support the removing of rel= spidering from PyPI (which will significantly mitigate the immediate speed and security issues). Heck, I've been the one who's repeatedly proposed various ways of cutting back or removing rel= attributes from the /simple index.

The result of these two changes will actually have the same net effect that people are asking for here: you'll only be able to download stuff hosted on PyPI, unless you explicitly override the --allow-hosts to get a wider range of packages.

Already today, when a URL is blocked by --allow-hosts, it's announced as part of easy_install's output, so you can see exactly how much wider you need to extend your trust for the download to succeed.

The *only* thing I object to is removing the ability for people to *choose* their own levels of trust. And I have not yet seen an argument that justifies removing people's ability to *choose* to be more inclusive in their downloads.

And I've put multiple compromise proposals out there to begin mitigating the problem *now* (i.e. for non-updated versions of setuptools), and every time, the objection is, no, we need to ban it all now, no discussion, no re-evaluation, no personal choice, everyone must do as we say, no argument.

And I don't understand that, at all.
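As a rough illustration of the site-wide default mentioned in this message, the sketch below generates the kind of config-file section a distutils configuration file could carry. It assumes the usual distutils convention that the --allow-hosts option is spelled `allow_hosts` under an `[easy_install]` section; the helper name `easy_install_config` is hypothetical, and where the file lives is covered by the documentation links quoted above.

```python
import configparser
import io


def easy_install_config(allowed_hosts):
    """Render an [easy_install] section restricting downloads to the given hosts."""
    parser = configparser.ConfigParser()
    # Multiple host patterns are given as one comma-separated value.
    parser["easy_install"] = {"allow_hosts": ",".join(allowed_hosts)}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()
```

Calling `easy_install_config(["pypi.python.org", "*.pypi.python.org"])` would produce a section limiting downloads to the index host and its subdomains; a user who wants to trust more hosts would extend the list explicitly, which keeps the choice with the user as argued above.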
Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
On Tue, Mar 12, 2013 at 1:33 PM, Jesse Noller jnol...@gmail.com wrote:
> There's not much to understand: external hosting of packages is *actively harmful*, period. End users of easy_install and pip *don't even realize* 99% of the time that these tools are following links off of PyPI and installing packages from random, probably insecure/non-HTTPS locations all over the internet. Once they realize it they recoil in terror if they have any understanding of the implications.

This is a rationale for secure defaults for various options, like the ones I outlined in the portions of my post that you *didn't* quote. It's not a rationale for removing the options themselves.
Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
On Tue, Mar 12, 2013 at 2:18 PM, Carl Meyer c...@oddbird.net wrote: It seems to me that there's a remarkable level of consensus developing here (though it may not look like it), and a small set of remaining open questions. The consensus (as I see it): - Migrate away from scraping external HTML pages, with package owners in control of the migration but a deadline for a forced switch, as outlined in Holger's PEP (with all appropriate caution and testing). - In some way, migrate to a situation where the popular installer tools install only release files from PyPI by default, but are capable of installing from other locations if the user provides an option. Perhaps I'm confused, but ISTM that every time I've said this, Donald and Lennart argue that it should not be possible to provide such an option -- or to be more specific, that PyPI should not publish the information that makes that option possible. If that's *not* the position they're taking, it'd be good to know, because we could totally stop arguing about it in that case. A) Leave external links in the PyPI simple index, but migrate the major tools to not use external links by default (i.e. Philip's plan to make allow-hosts=pypi the default in a future setuptools), with an option to turn them back on. I don't know who has proposed this option, but it's not me. You seem to be confusing external links and HTML-scraped links (rel= attributed links in /simple). I was the first person to propose disabling HTML-scraped links from PyPI *ASAP*. I still want them gone. That won't require tool changes, it just requires a rollout plan. Holger has one, let's work on that. The second thing I proposed is that new tools be developed to *assist* package authors in moving their files onto PyPI, so that future tool changes wouldn't result in widespread instances of people needing to set their tools to insecure settings just to get anything done. 
We need to get people's files moving onto PyPI *first*, in order to make changing the tool defaults practical. The *only* thing I object to is the part where some people want to ban external links from /simple, always and forever, regardless of the package authors' choice in the matter. B) Do a second PyPI migration, again with a per-package toggle and package owners in control, to a no external links in simple index setting. Consider for a moment how similar the end state here is with either A or B. In either case, by default users install only from PyPI, but by providing a special option they can install from some external source. (In B, that special option would be something like --find-links with a URL). In either case, we can continue to allow packages to register themselves on PyPI, be found in searches, etc, without uploading release files to PyPI if they prefer not to; they'll just have to provide special installation instructions to their users in that case. Not true: approach B means that you won't know what values to pass to the option. It's also confused about an important point. All the links that appear in /simple are *already* completely under the package author's control. No new switches are required to remove external links - you can simply remove them from your releases' descriptions. This process could be made more transparent or easy, sure -- but it's a mistake to say that this is granting the package owners control that they don't already have. What they lack control over is the rel= attributes, short of removing those links entirely. That's why I've proposed having a switch for that , as reflected in Holger's pre-PEP. 1) With B, we can provide a gentler migration for package owners, where they are in control of when the switch happens. 2) With B, all end users benefit from the new defaults, not only end users who update to the latest and greatest tools. 
3) With B (and probably some forms of A as well), end users clearly state which external sources they would like to trust and install from, rather than having a global trust everything! flag, which is less secure and less sensible. These 3 statements all mischaracterize things substantially, because none of those benefits are exclusive to A, and nobody has proposed a trust everything flag. Removing rel= attributes also benefits everyone right away, *without* new tools. ___ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
On Tue, Mar 12, 2013 at 2:43 PM, Robert Collins robe...@robertcollins.net wrote:
> This takes an age when each new web host to talk to is a new DNS lookup (say 0.3 seconds) + HTTP request (0.6 seconds) with possible HTTPS setup in there too (up to 1.2 seconds). A project with dozens of dependencies in its transitive dependency graph may take minutes *just spidering*.

Which is why we should act on Holger's pre-PEP to drop the rel= attributes from projects that don't actually use them -- builds involving those projects will immediately drop to one HTTP request to PyPI, plus one to whatever host has the actually needed file. And that's without any tooling changes whatsoever: builds all over the planet will just get faster and more secure, right away.
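As a back-of-the-envelope sketch of the arithmetic in the quoted message: the per-host costs below are the figures Robert gives, while the function name and the host counts (30 dependencies, four crawled external hosts each) are made-up assumptions for illustration only.

```python
# Per-host costs quoted in the message above, in seconds.
DNS_LOOKUP = 0.3
HTTP_REQUEST = 0.6
HTTPS_SETUP = 1.2  # worst case ("up to")


def spider_seconds(hosts, https=False):
    """Estimate the serial time spent contacting `hosts` distinct web hosts."""
    per_host = DNS_LOOKUP + HTTP_REQUEST + (HTTPS_SETUP if https else 0.0)
    return hosts * per_host


# Hypothetical build: 30 dependencies, each pulling in 4 external hosts to crawl.
with_crawling = spider_seconds(30 * 4, https=True)
# Same build with rel= links dropped: at most one file host per dependency.
without_crawling = spider_seconds(30, https=True)

print(round(with_crawling), round(without_crawling))
```

Under these assumed numbers the crawl costs roughly four minutes and the direct downloads roughly one; the real figures depend entirely on how many distinct hosts a given dependency graph touches.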
Re: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI
On Tue, Mar 12, 2013 at 2:07 PM, M.-A. Lemburg m...@egenix.com wrote:
> Just a quick note (more later, if time permits)...
>
> On 12.03.2013 18:05, holger krekel wrote:
> Hi Marc-Andre, all,
>
> - Prepare PYPI implementation to allow a per-project hosting mode, effectively enabling or disabling external crawling. When enabled nothing changes from the current situation of producing ``rel=download`` and ``rel=homepage`` attributed links on ``simple/`` pages, causing installers to crawl those sites. When disabled, the attributions of links will change to ``rel=newdownload`` and ``rel=newhomepage`` causing installers to avoid crawling 3rd party sites. Retaining the meta-information allows tools to still make use of the semantic information.
>
> Please start using versioned APIs for these things. The old style index should still be available under some URL, e.g. /simple-v1/ or /v1/simple/ or /1/simple/
>
> Not sure it is necessary in this case. I would think it makes the implementation harder and it would probably break PEP381 (mirroring infrastructure) as well.
>
> Here's what I meant: We publish the current implementation of the /simple/ index API under a new URL /simple-v1/, so that people that want to use the old API can continue to do so.

Do you know of anyone who's *actually* going to need/use this alternate API? Why can't they just use the XML-RPC API, the DOAP API, or any other means of obtaining this information? Heck, the proposal to just change the value of the rel attribute isn't going to stop anybody from using that data. Please let's not complicate this by adding more API formats for PyPI to support.
Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
On Tue, Mar 12, 2013 at 3:36 PM, Jacob Kaplan-Moss ja...@jacobian.org wrote: On Tue, Mar 12, 2013 at 2:21 PM, PJ Eby p...@telecommunity.com wrote: The *only* thing I object to is the part where some people want to ban external links from /simple, always and forever, regardless of the package authors' choice in the matter. Here's the thing though, there are already a bunch of other ways users can install packages from external repositories. I can think of at least two: * I can pip/easy_install a given URL (e.g. easy_install https://www.djangoproject.com/download/1.5/tarball/) * I can use a custom index server (pip install -i http://localserver/ django) The important part is that in each of those cases I can see clearly where I'm getting things from. From where I stand the absolutely non-negotiable part is that `pip/easy_install/whatever package` should NEVER access an external host (after some suitable transition period). This needs to include older installer software, and it needs to make it hard for new tools to do the wrong thing. How this is achieved really doesn't matter to me -- if there's a pip install --insecure Django that's fine too -- but to me it's non-negotiable that the out-of-the-box configuration not allow external hosts. I'm confused by this statement. never access an external host is not consistent with have the option to specify what hosts you trust, while still keeping PyPI as a universal index of Python software. Yes, this means taking some options away from the package creator. It means that when I'm wearing my author-of-Django hat I can't choose to list Django on PyPI but provide the download elsewhere. That's not perfect, but given a creator choice vs out of the box security choice the latter has to win. [And as a package creator I still have options: I can run my own package server, fairly easy to do these days.] 
Again, the *how* isn't a big deal to me, but the result is really important: the tooling has to be secure-by-default, and that means (among other things) `pip install package` can never hit something that's not PyPI without me explicitly asking for it. That part's fine. As I've said repeatedly, though, it's the removing other links from the /simple index entirely that's the problem. Under what I've proposed, as soon as the tools are updated to secure-default (and the situation *now* if you set your --allow-hosts to PyPI-only), is that easy_install will announce what URLs it is skipping because they're not on PyPI. (pip too, IIUC.) I can't tell you how to configure pip for this, but if you want to configure easy_install to be secure right *now*, add: [easy_install] allow_hosts=pypi.python.org to your user-level or site-wide distutils .cfg file. Better yet, encourage other people to add it now, find out what they can no longer install, and talk to their upstream providers about moving to PyPI. This is all good. I'm just saying, we don't need to change PyPI to do anything but drop the rel= links, and change the tools to default allow-hosts to equal index-url. (pip has the same parameters, not sure what config files it uses, though. I don't think it inherits [easy_install] settings, though.) ___ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
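To make the effect of an allow_hosts setting concrete, here is a minimal sketch of the kind of check involved. It is not setuptools' actual implementation: the function names and example URLs are hypothetical, and it assumes (as the EasyInstall documentation linked earlier in the thread describes) that allowed hosts are given as wildcard patterns. It is written for current Python 3 rather than the Python 2 of the time.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def host_allowed(url, allow_hosts):
    """Return True if the URL's host matches any allowed wildcard pattern."""
    host = urlparse(url).hostname or ""  # hostname is already lowercased
    return any(fnmatchcase(host, pattern.lower()) for pattern in allow_hosts)


def filter_links(urls, allow_hosts):
    """Split candidate download URLs into (allowed, blocked) lists."""
    allowed, blocked = [], []
    for url in urls:
        (allowed if host_allowed(url, allow_hosts) else blocked).append(url)
    return allowed, blocked


# Made-up candidate links for a hypothetical project "Foo".
candidates = [
    "https://pypi.python.org/packages/source/F/Foo/Foo-1.2.tar.gz",
    "http://foo.example.com/downloads/foo-1.2.tgz",
]
allowed, blocked = filter_links(candidates, ["pypi.python.org"])
for url in blocked:
    print("BLOCKED by allow-hosts:", url)
```

The point of reporting the blocked list rather than silently dropping it is the one made above: the user still learns which extra host would have to be trusted for the install to succeed.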
Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
On Tue, Mar 12, 2013 at 4:14 PM, Carl Meyer c...@oddbird.net wrote: You say below that nobody has proposed a 'trust everything' flag. If there is no trust everything flag, then it seems to me that with either option A or option B the user needs to specify what they intend to trust. I.e. if you make the default value of allow-hosts the index url host, as you said you plan to do at some point, users would need to override it with the hosts they want to allow. It seems like maybe what you are wanting is automatically-discoverable installation from externally-hosted files? I.e. that I could say easy_install Foo --allow-external, without needing to know any specific external url for Foo? This is what I was characterizing as a trust everything flag, but on reflection I don't think I have any problem with that. Here's a story to illustrate what I mean: Joe wants to install foo. He runs easy_install Foo. Foo is hosted externally to PyPI, so easy_install says: URL foo.com/downloads/foo-1.2.tgz BLOCKED by allow-hosts option -- install failed. (Or words to that effect; I'd have to check the source to get you the exact phrasing). The point is, Joe now *knows where to get foo from*, because PyPI still had the information. Joe can now decide whether he wants to download it manually and inspect it first, expand his allow-hosts option, or give Foo a pass. The proposals that call for banning all links from the /simple index, prevent Joe from being able to do this at all. This is partly true. An explicit flag grants package owners more control in that right now they don't have a choice about whether external links to tarballs in their long_description automatically get sucked into the simple index. This is not hypothetical; even if there were no rel-link scraping, I've had cases where package owners have complained to me about pip installing an RC tarball they had linked directly from their long-description, not intending it to be auto-installable. Fair enough. 
Thank you for actually providing an illustration of a problem. There's been far too much handwaving of problems without any explicit description of what the problem *is*. I would support making references to external links explicit rather than implicit. I think it would be preferable if in the future package owners wouldn't need to be careful what release-file links they might place in their long_description, and release files would be only explicitly nominated. Ok. I think the current automatically suck in links to simple/ behavior is only useful as a backwards-compatibility hack, which is why I think an explicit switch to disable it (on by default for newly-registered projects, slowly, gently, carefully migrated to on for existing projects) is better than keeping this link-scraping behavior indefinitely for all projects and asking package owners to clean up their long-descriptions. I would agree with dropping link parsing from the description field, provided that an alternative way is provided for projects to explicitly add external links to /simple, concurrent with the other changes. Thank you for taking the time to engage and re-engage on this issue, and to Explain It Like I'm Five for me, with an illustration of an actual problematic use case. ;-) ___ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
Re: [Catalog-sig] setuptools/distribute/easy_install/pkg_resource sorting algorithm
On Tue, Mar 12, 2013 at 3:59 PM, M.-A. Lemburg m...@egenix.com wrote:
> On 12.03.2013 19:15, M.-A. Lemburg wrote:
>> I've run into a weird issue with easy_install, that I'm trying to solve: If I place two files named
>>
>>   egenix_mxodbc_connect_client-2.0.2-py2.6.egg
>>   egenix-mxodbc-connect-client-2.0.2.win32-py2.6.prebuilt.zip
>>
>> into the same directory and let easy_install running on Linux scan this, it considers the second file for Windows as best match. Is the algorithm used for determining the best match documented somewhere? I've had a look at the implementation, but this left me rather clueless. I thought that setuptools would prefer the .egg file over the prebuilt .zip file - binary files being easier to install than source files.
>
> After some experiments, I found that the following change in filename (swapping platform and python version, in addition to using '-' instead of '.') works:
>
>   egenix-mxodbc-connect-client-2.0.2-py2.6-win32.prebuilt.zip
>
> OTOH, this one doesn't (notice the difference?):
>
>   egenix-mxodbc-connect-client-2.0.2.py2.6-win32.prebuilt.zip
>
> The logic behind all this looks rather fragile to me.

easy_install only guarantees sane version parsing for distribution files built using setuptools' naming algorithms. If you use distutils, it can only make guesses, because the distutils does not have a completely unambiguous file naming scheme. And if you are naming the files by hand, God help you. ;-)
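One way to see why a tool can only guess at such names: a dash-separated basename admits several name/version splits, and nothing in the string says which is intended. The helper below is a hypothetical illustration of that ambiguity, not a reproduction of easy_install's matching logic.

```python
def candidate_splits(basename):
    """Yield every (name, remainder) split of a dash-separated basename."""
    parts = basename.split("-")
    for i in range(1, len(parts)):
        yield "-".join(parts[:i]), "-".join(parts[i:])


# The problematic filename from the message above, minus its .zip extension.
for name, rest in candidate_splits(
    "egenix-mxodbc-connect-client-2.0.2.win32-py2.6.prebuilt"
):
    print(name, "|", rest)
```

Even the split a human would pick leaves "2.0.2.win32-py2.6.prebuilt" as the remainder, with the platform tag fused onto the version by a dot, so any parser has to guess where the version ends.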
Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
On Sun, Mar 10, 2013 at 8:25 PM, Donald Stufft don...@stufft.io wrote: I don't think anyone is bad here, nor am I arguing against any particular person or group of people. I'm arguing against a practice and a system. You're going out of your way to find excuses to throw all sorts of stop energy here. Calling a legitimate disagreement with your point of view stop energy seems inappropriate to me, since my issue is with you derailing the topic of how to get people to *voluntarily* migrate to a better situation than the present one, and to develop tools for that process. The only thing I wish you to stop is the repeated assertion without proof that 1) external links must go *and* 2) this must be an enforced directive rather than a (highly-encouraged) option. I have even gone so far as to suggest, earlier in this thread, what evidence I would find at least suggestive of your POV. But your response to that and prior challenges to those assertions, has been simply to move your goalpost. E.g. from current uptime is bad to any uptime lower than PyPI's is totally unacceptable. I, on the other hand, have moved in the direction of *your* proposals repeatedly, making adjustments as I find actually-convincing evidence and/or reasoning, or find ways to deal with the issues. I have compromised quite a bit. (And have already spent a fair amount of time writing setuptools code to lay a foundation for these changes.) You, as far as I can tell, have not moved your position in the slightest. Which of these is stop energy? It is not the case that external links must be removed from PyPI in order to ensure security, or uptime. And it is *especially* not the case that you are the BDFL of uptime. You're definitely not the BDFL of uptime for any given project hosted on PyPI, that you *voluntarily choose* to make a part of your build process. 
If your primary argument is that project X must host its files on PyPI because of your build process, then I think you misunderstand open source, and also the part where you *chose* to make it part of your build process. It certainly doesn't give you the right to force projects Y, Z, and Q -- that you don't even use! -- to also host their projects on PyPI, because project X -- the one you do use -- has a slow or unreliable file host!

It seems disingenuous to then shift the argument back to security when challenged on uptime, and back to uptime when challenged on security. We've looped back and forth over those for some time: when I point out that wheels have signatures which will make off-site hosting relatively unimportant to the security picture, you jump back to talking about uptime. When I point out that uptime is a consensual factor that in no way justifies legislating what other people can do with their projects, you go back to talking about security. Make up your mind. What problem are you actually trying to solve?

(I expect your response on wheels to be that wheels aren't there yet, etc., but that isn't actually a response to the objection unless you're going to change your position to, "okay, external links to file formats that can be signed can stay," or something of that sort. Otherwise, you're not actually compromising, just using the fact that wheels aren't in common use yet as an argument to keep the position you started with.)

> My analogy served only to put into light that the system that I'm trying to change is insecure, just like allowing anyone to walk into a bank vault and pick up money would be insecure. I fully believe that the people using such a system are completely trustworthy people. But just because *they* are trustworthy doesn't mean that a system which allows *anyone* to attack other Python developers is *ok*.
And my analogy served only to put into light the part where you're insisting that one group of people change for the benefit of a group which is already benefiting from their pre-existing generosity. That being said, I do see that I could have misinterpreted the intent of your analogy -- it sounded like you were saying that the developers who host off-PyPI were thieves walking into your bank and taking your money (i.e., analogizing theft with inconveniencing you by making your builds fail or run slowly). Though to be honest, I still don't comprehend how else to make any kind of sense to that analogy in its original context. Who is the bank? Whose money is being taken? The whole thing is utterly confusing to me if I try to take it any other way than the way I did, because it doesn't seem to have any other simple 1:1 mapping to the situation, as far as I can see. Your explanation seems terribly abstract and tortured to me, as far as analogies go. When discussing security of a system it's necessary to divorce yourself from the implementations of things. When you get wrapped up in the implementation you turn things into a Us vs Them game (as evidenced by several of your messages) instead of discussing the
Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
On Mon, Mar 11, 2013 at 7:14 AM, Donald Stufft don...@stufft.io wrote:
> 1) Proof of what? That it's insecure? That it harms uptime? That it violates people's privacy?

That any of those things apply to anybody who *isn't using those packages*. Without this, you are only providing a reason to encourage people to change, not to force them to do so.

> 2) Even a single project remaining causes the entire thing to cascade

Cascade *how*? Please explain.
Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
On Mon, Mar 11, 2013 at 12:45 PM, Lennart Regebro rege...@gmail.com wrote:
> On Mon, Mar 11, 2013 at 5:12 PM, PJ Eby p...@telecommunity.com wrote:
>> On Mon, Mar 11, 2013 at 7:14 AM, Donald Stufft don...@stufft.io wrote:
>>> 1) Proof of what? That it's insecure? That it harms uptime? That it violates people's privacy?
>>
>> That any of those things apply to anybody who *isn't using those packages*.
>
> If nobody is using the packages, it does indeed harm no-one.

Then there is no reason to ban them.
Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
On Mon, Mar 11, 2013 at 1:45 PM, Lennart Regebro rege...@gmail.com wrote:
> So, we should not remove the links for external packages until somebody traverses those links? But as soon as somebody asks for those links, we should remove them? In fact before we give them the link?

I'm saying that if someone objects to the presence of links they don't actually use, they are speaking nonsense. Might as well ask to ban all packages from PyPI that they don't personally like -- it's the same request. Nobody is forcing you to depend on packages that don't host on PyPI, so there is no point to the censorship. If you don't use the links, you can't argue that their presence is causing you harm.
Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
On Mon, Mar 11, 2013 at 4:07 PM, Carl Meyer c...@oddbird.net wrote:
> On 03/11/2013 01:57 PM, PJ Eby wrote:
>> I'm saying that if someone objects to the presence of links they don't actually use, they are speaking nonsense. Might as well ask to ban all packages from PyPI that they don't personally like -- it's the same request. Nobody is forcing you to depend on packages that don't host on PyPI, so there is no point to the censorship. If you don't use the links, you can't argue that their presence is causing you harm.
>
> You can, of course, argue that the mere presence of those links (combined with the current behavior of easy_install/pip) is an attractive nuisance that indirectly causes harm to unsuspecting new users of Python who never even consider the possibility that tools like easy_install and pip might spider off PyPI to arbitrary websites

Which is why I think removing rel= spidering is a good idea. In fact, I'm the one who suggested that. I also suggested moving to turning it off by default in future versions of easy_install, adding warnings, etc. But that's not the same thing as agreeing that it should be *banned* for people to publish machine-readable download information on PyPI for a file that's hosted off-PyPI.

ISTM that Python's "consenting adults" standard sets a higher bar for banning a feature than it does for marking it "here there be dragons" and offering a better alternative. Heck, even in Python the language, the mere removal of a feature in a new version of Python doesn't stop people from continuing to use the old one. Here we're talking about infrastructure that everybody uses; it's not like there's a PyPI X.1 that people can keep using if X.2 comes out.
[Catalog-sig] A 90% Solution
Just a thought, but... If 90% of PyPI projects do not have any external files to download, then, wouldn't it make sense to: 1. Add a project-level option to enable or disable the adding of the rel= attribute to /simple links (but not affecting the links in any other way) 2. Default it to disabled for new projects, and 3. Set it to disabled *now* for the 90% of projects that *don't have external files*? If the arguments about banning external links are as valid and important as some people claim, wouldn't it make sense to do this part *now*, without first requiring a commitment to force the switch to a disabled state in the future? Immediately, 90% of the problem goes away - no random spidering of stuff that doesn't contain a link now, but which could be taken over by a malicious party in the future, and 90% fewer sites having to be up in order for you to build something from PyPI. Seems like a serious win to me -- and one that might not even need a PEP. Next steps after this would be providing tools to help people move their files and links, promoting that people switch it off if they no longer support the offsite links, educating about security concerns, etc. I really don't understand why the 90% solution isn't *already* the consensus position, since it doesn't preclude follow-on efforts towards reducing the 10% towards 0%. And if the problem is so important, why must we keep 90% of the problems in place, just so we can keep arguing about censoring the 10%? That doesn't make sense to me. To me, if somebody's injured, the first thing you do is clean and close the wound, not argue about whether it's a complete solution and what might happen days or weeks later. Just a thought. ___ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
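The three steps above hinge on one small behavior: a per-project switch that controls only whether the rel= attribute is emitted, leaving the links themselves alone. A minimal sketch of that rendering rule follows; the function name, the tuple layout, and the `crawl_enabled` flag are hypothetical, and PyPI's real index-generation code is not shown anywhere in this thread.

```python
from html import escape


def render_simple_links(links, crawl_enabled):
    """Render /simple-style anchors; emit rel= only when crawling is enabled.

    `links` is a list of (url, text, rel) tuples, where rel is "homepage",
    "download", or None for a plain link.
    """
    lines = []
    for url, text, rel in links:
        # The link is always kept; only the rel= attribute is conditional.
        rel_attr = ' rel="%s"' % rel if (rel and crawl_enabled) else ""
        lines.append(
            '<a href="%s"%s>%s</a><br/>' % (escape(url), rel_attr, escape(text))
        )
    return "\n".join(lines)


# Made-up project links for illustration.
project_links = [
    ("http://example.com/foo/", "Home page", "homepage"),
    ("http://example.com/foo-1.2.tgz", "foo-1.2.tgz", None),
]
print(render_simple_links(project_links, crawl_enabled=False))
```

With the flag off, an installer sees the same anchors but has no rel="homepage" or rel="download" pages to spider, while a direct link to a release file keeps working.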
Re: [Catalog-sig] A 90% Solution
On Mon, Mar 11, 2013 at 7:39 PM, Donald Stufft don...@stufft.io wrote:
> On Mar 11, 2013, at 7:04 PM, PJ Eby p...@telecommunity.com wrote:
>> Just a thought, but... If 90% of PyPI projects do not have any external files to download, then, wouldn't it make sense to:
>
> To be accurate it's 90% don't have any files/release available *only* externally. Most have external files to download because it's very rare that a project doesn't include a home_page or a download_url, especially since distutils complains if you don't.

So what is the % of projects for whom the option can be disabled automatically, *without* disabling automated downloadability of a project's externally hosted files?

Your statement is confusing to me, because the having of a home page or download URL doesn't have anything to do with whether that page has any files to download from it. I am saying that if a project has no *downloadable* files (not web pages) whose links can only be found by spidering, then we can turn off the rel attribute. How many projects do not have any download links listed on their rel=-linked pages?

>> 1. Add a project-level option to enable or disable the adding of the rel= attribute to /simple links (but not affecting the links in any other way)
>> 2. Default it to disabled for new projects, and
>> 3. Set it to disabled *now* for the 90% of projects that *don't have external files*?
>
> +1 except 1. should be to remove the links entirely from the /simple/ index, not to just remove the rel attribute.

-1, since sometimes download links are in fact *download links*. So this design choice would unnecessarily limit the number of projects for whom the option could be applied automatically and immediately. That is, a project with a download link of foobar.com/foobar-1.2.tgz would no longer be usable if you removed the download link from the /simple index, but would remain usable if the rel attribute were removed.
Re: [Catalog-sig] A 90% Solution
On Mon, Mar 11, 2013 at 8:28 PM, M.-A. Lemburg m...@egenix.com wrote: On 12.03.2013 00:39, Donald Stufft wrote: On Mar 11, 2013, at 7:04 PM, PJ Eby p...@telecommunity.com wrote: Just a thought, but... If 90% of PyPI projects do not have any external files to download, then, wouldn't it make sense to: To be accurate it's 90% don't have any files/release available *only* externally. Most have external files to download because it's very rare that a project doesn't include an home_page or a download_url, especially since distutils complains if you don't. How are you going to verify that disabling the links on those projects won't make certain release versions of those packages unavailable for pip/easy_install ? I'm not sure if you're asking Donald or me here. My proposal was to only automatically disable the rel attributes for links to pages that do *not* contain any easy_install or pip-able download links. So, by definition, this would not make any releases unavailable. As for what Donald is proposing, I honestly have no idea what he's talking about, or whether the 90% statistic actually applies for what I'm proposing. So it's possible that it might be a lot less than 90% that my proposal would be able to affect *instantly*, without contacting any authors. How are you planing to inform the package authors of that change, so that they can take corrective action ? Which options would be available for authors ? Do see my proposal again, which was simply that there be a switch to enable or disable the rel attributes, that it default off for new packages, and be switched to off for exactly that set of packages which would not result in the loss of access to any download files. There is, at this point, the question of how to handle projects that have some of their releases hosted externally, or with some of the files external and some not. 
I would prefer that any automated changeover apply only to packages where the set of discoverable links is exactly equal to the links found on the project's /simple page. Regarding the links, it's probably better to not remove the rel= attributes but instead change them from rel=download to e.g. rel=external-download; or to keep the old index semantics around as /simple-v1/. This keeps the valuable semantic relation available for tools that want to use it. For what? If you must keep them, rel=disabled-homepage etc. would get the message across. But I really don't see the point, and I *invented* the bloody things. Frankly, I'm more than prepared to toss the rel attributes altogether, after adequate notice is given for people to move their files or links to the files. I just don't want any changes in the *rest* of the /simple generation algorithm. ___ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
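The eligibility rule stated above (automatic changeover only where the discoverable links exactly equal the links on the project's /simple page) amounts to a set comparison. A minimal sketch, with a hypothetical function name and made-up example URLs:

```python
def safe_to_disable(simple_page_links, crawled_links):
    """True if turning off rel= crawling would lose no discoverable links.

    "Discoverable" means everything on the /simple page plus everything found
    by crawling the rel=-attributed pages; the project is eligible only if
    that combined set equals the /simple page's own links.
    """
    simple = set(simple_page_links)
    discoverable = simple | set(crawled_links)
    return discoverable == simple


# Made-up example: crawling finds nothing that /simple doesn't already list.
on_simple = ["https://pypi.python.org/packages/source/F/Foo/Foo-1.2.tar.gz"]
print(safe_to_disable(on_simple, crawled_links=[]))
```

A project whose crawled pages turn up even one extra release file fails the check and would be left for its author to switch manually.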
Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
On Sun, Mar 10, 2013 at 11:07 AM, holger krekel hol...@merlinux.eu wrote: Philip, Marc-Andre, Richard (Jones), Nick and catalog-sig/distutils-sig: scrutiny and feedback welcome. Hi Holger. I'm having some difficulty interpreting your proposal because it is leaving out some things, and in other places contradicting what I know of how the tools work. It is also a bit at odds with itself in some places. For instance, at the beginning, the PEP states its proposed solution is to host all release files on PyPI, but then the problem section describes the problems that arise from crawling external pages: problems that can be solved without actually hosting the files on PyPI. To me, it needs a clearer explanation of why the actual hosting part also needs to be on PyPI, not just the links. In the threads to date, people have argued about uptime, security, etc., and these points are not covered by the PEP or even really touched on for the most part. (Actually, thinking about that makes me wonder Donald: did your analysis collect any stats on *where* those externally hosted files were hosted? My intuition says that the bulk of the files (by *file count*) will come from a handful of highly-available domains, i.e. sourceforge, github, that sort of thing, with actual self-hosting being relatively rare *for the files themselves*, vs. a much wider range of domains for the homepage/download URLs (especially because those change from one release to the next.) If that's true, then most complaints about availability are being caused by crawling multiple not-highly-available HTML pages, *not* by the downloading of the actual files. If my intuition about the distribution is wrong, OTOH, it would provide a stronger argument for moving the files themselves to PyPI as well.) Digression aside, this is one of things that needs to be clearer so that there's a better explanation for package authors as to why they're being asked to change. 
And although the base argument is good (specifying the homepage will slow down the installation process), it could be amplified further with an example of some project that has had multiple homepages over its lifetime, listing all the URLs that currently must be crawled before an installer can be sure it has found all available versions, platforms, and formats of that project.

Okay, on to the Solution section. Again, your stated problem is to fix crawling, but the solution is all about file hosting. Regardless of which of these three hosting modes is selected, it remains an option for the developer to host files elsewhere, and provide the links in their description... unless of course you intended to rule that out and forgot to mention it. (Or, I suppose, if you did *not* intend to rule it out and intentionally omitted mention of that so the rabid anti-externalists would think you were on their side and not create further controversy... in which case I've now spoiled things. Darn. ;-) )

Some technical details are also either incorrect or confusing. For example, you state that "The original homepage/download links are added as links without a ``rel`` attribute if they have the ``#egg`` format." But if they are added without a rel attribute, it doesn't *matter* whether they have an #egg marker or not. It is quite possible for a PyPI package to have a download_url of, say, "http://sourceforge.net/download/someproject-1.2.tgz". Thus, I would suggest simply stating that changing hosting mode does not actually remove any links from the /simple index, it just removes the rel= attributes from the Home page and Download links, thus preventing them from being crawled in search of additional file links.

With that out of the way, that brings me to the larger scope issue with the modes as presented. Notice now that with this clarification, there is no real difference in *state* between the pypi-cache and pypi-only modes. There is only a *functional* difference...
and that function is underspecified in the PEP. What I mean is, in both pypi-cache and pypi-only, the *state* of things is that rel= attributes are gone, and there are links to files on PyPI. The only difference is in *how* the files get there. And for the pypi-cache mode, this function is *really* under-specified. Arguably, this is the meat of the proposal, but it is entirely missing. There is nothing here about the frequency of crawling, the methods used to select or validate files, whether there is any expiration... it is all just magically assumed to happen somehow. My suggestion would be to do two things: First, make the state a boolean: crawl external links, with the current state yes and the future state no, with no simply meaning that the rel= attribute is removed from the links that currently have it. Second, propose to offer tools in the PyPI interface (and command line) to assist authors in making the transition, rather than proposing a completely unspecified caching mechanism. Better to have some
Re: [Catalog-sig] Search engine relevance
On Sun, Mar 10, 2013 at 4:23 AM, Richard Jones r1chardj0...@gmail.com wrote:
> This might solve the AGI problem and could probably produce good results using the current ranking algorithm. Not sure. Google's search algorithms are far advanced ;-)

Heh. This just gave me a bit of a chuckle, taken out of context. AGI, you see, is also an acronym for artificial general intelligence, so for a moment there I thought you were suggesting that using Postgres rankings properly could bring about the Singularity. ;-)
Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
On Sun, Mar 10, 2013 at 5:16 PM, Donald Stufft don...@stufft.io wrote:
> If someone's release process forces PyPI to have security, uptime, and privacy issues then I'm very sorry but their release process is going to need to change. It's not fun, it's a shitty situation, but trying to bend over backwards to enable their current release processes is like trying to bend over backwards to enable people to still walk into their bank's vault and grab a stack of currency.

When people in group 1 express disapproval of people in group 2, this creates a rallying effect among members of group 1, and a *negative* counter-reaction in members of group 2. This is effective if, and *only* if, the people in group 2 have less power in the situation than the people in group 1. For example, if co-operation from the people in group 2 is not needed in order to carry out the wishes of group 1.

However, in the situation under discussion, such co-operation is required, which means an alternative motivational strategy is indicated. That strategy involves giving persons in group 2 a better reason to care than "because we in group 1 think you group 2 people are thieves". And by "better", I mean a reason that *benefits group 2*, and more specifically, each individual in group 2 who chooses to co-operate. And ideally, you work also to lower the cost of that co-operation.

That's what *this* thread was originally about (lowering the cost of co-operation), before these "burn the witch" sentiments started up again. So, why not just step aside and let the adults go back to working on the actual problem?

Just kidding, of course. ;-) That's an example of me using the same type of communication style, in the opposite direction: spewing disapproval at something I don't like, instead of giving you a reason that benefits *you*, to do what I want. See how it feels, going the other direction? Did it motivate you to be helpful? I'm guessing not.
;-) Anyway, my point is this: people don't like it one bit when you tell them what to do. If you tell them, you must do X, you get resistance. But if you offer them a choice, Are you going to do X or Y?, there's much less resistance. And if one choice is less convenient than the other, most will pick the easier choice. So, would you rather fight with developers to make them do it your way, or have most of them do exactly what you want and most of the rest get pretty close, but not have to fight with them about it? Right now, the impression you and certain other people are giving me is that it is more important that whatever action we take be seen as censuring the practice of off-PyPI hosting, than that we actually fix the problems! And it's difficult to take such a position seriously, because the post-hoc rationalization of harms is, well, unconvincing at best to a neutral party. When PyPI was first built, it didn't *have* hosting, so there was nothing morally wrong about off-site hosting then. And when hosting was first added, automated downloading didn't exist yet, either. So it still wasn't wrong. And when I added automated downloading, I made the choice to encourage people to collaborate by making it as easy as possible. So offsite hosting still wasn't wrong, in fact it was a documented alternative. And that's been the case for, oh, 8 years now? So what you're actually doing isn't crusading against evil-doers, it's more like saying that every restaurant that isn't McDonalds should be immediately remodeled, because you have just noticed the shocking trend that hardly any of those restaurants will serve you food as quickly! And that of course, the restaurant owners should undertake the remodeling and procedure changes, retraining, retooling, etc. at *their* expense, on *your* timeline. Just so that *you*, who *chose to visit those restaurants in the first place*, can get your food a bit more quickly. Sure, I know that's not how *you* see it. 
But surely you can see that's how the *restaurant owners* are going to see it. And if you want them to co-operate, it's probably going to be in your interest to focus your attention on their side of the equation, rather than on yours. You already agree with your point of view. They don't. I realize that can be difficult to do when you have strong feelings about a subject. For example, as I write this I keep backing up and deleting all sorts of unhelpful things I find myself wanting to say. ;-) And I'm doing that because I'm consciously reminding myself that *getting to a solution* is more important to me than *making you feel bad* for being wrong on the internet. What's more important to you? The *actual* state of PyPI, or the state of who is to be considered right or wrong? If it's the former, you would probably find it useful to your goals, to please refrain from calling me and that other 10% of PyPI thieves. Or really any other names whatsoever, explicitly OR implicitly. Thanks.
Re: [Catalog-sig] hash tags
On Fri, Mar 8, 2013 at 7:50 AM, M.-A. Lemburg m...@egenix.com wrote: After the feedback I got from Holger and Phillip, I'm currently writing a new version, which drops some of the unneeded requirements and spells out a few more things. Here's a very short version... Installers are modified: * to only follow rel=download links from the /simple/ index page, which have a hash tag (e.g. #md5=...) * will only use the fetched download page if its contents match the hash tag * scan that page for rel=download links, which again have to have a hash tag to be taken into account * only install files for which the hash tag matches the downloaded content This should provide a good way to make sure that the downloaded files are indeed under control of the package maintainer. There is, as I said before, a MUCH simpler way to do this, that works right now: put direct #md5 download links in your description, and phase out the rel= attributes altogether. The key to making this transition isn't creating elaborate new standards for the tools, it's *creating new tools for the standards*. Specifically, *migration tools*. A migration tool could be made that scans existing external links and converts found links to #md5 links or alternately uploads the files themselves to PyPI. You can do that without changing pip or distribute or anything else but PyPI, so there's no need to wait out update cycles to take advantage. Once a project/version has switched to either #md5 links or PyPI copies, you can just drop the rel= attributes and you're done. Alternately, if using the description for download links is considered a bad idea, add a new field to PyPI for them. Point is, this entire thing can be done correctly at the PyPI end and work with the existing API of the download tools. So far the only practical problem I've found with the approach is that the download page may not contain dynamic data, e.g. a date or timestamp, since that causes the hash tag not to verify. 
Which is completely unnecessary if one simply exposes the *actual* download links directly on PyPI. The download page is redundant, in a couple different ways. First, since it can't change, there's no point in re-fetching it all the time. Second, since it's only going to be read by tools anyway, there's no point to it containing anything besides the link. So, since the page only contains links, might as well put the links straight on PyPI, or at most have an option/tool to load the links from an external source. Again, the key to making this work is going to be somebody putting buttons in the PyPI interface (and making setuptools/distutils commands or similar CLI tools) to migrate their files (or links to the files) to PyPI hosting. A new API for such tools is entirely unnecessary -- at most there might need to be a new field made available/accessible. (Personally I don't care if your download links have to be in the description field if you're hosting off-site, but that's just me.) ___ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
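The "#md5 link" idea that runs through this message can be sketched with the standard library alone. The two helper names and the example URL below are assumptions for illustration; only the `#md5=<hexdigest>` fragment convention comes from the thread itself.

```python
import hashlib
from urllib.parse import urldefrag


def md5_link(url, data):
    """Return url with an #md5=<hexdigest> fragment computed from data."""
    return "{}#md5={}".format(url, hashlib.md5(data).hexdigest())


def verify_download(link, data):
    """Check downloaded bytes against the #md5 fragment carried by link."""
    _url, fragment = urldefrag(link)
    if not fragment.startswith("md5="):
        return False  # no checksum on the link, so nothing to verify against
    expected = fragment[len("md5="):]
    return hashlib.md5(data).hexdigest() == expected


# Hypothetical usage: an author-side migration tool would compute the link
# once at upload or registration time; an installer would verify after download.
link = md5_link("http://example.com/someproject-1.2.tgz", b"archive bytes")
assert verify_download(link, b"archive bytes")
```

The point of the sketch is that both halves already fit the existing link format, so nothing about it requires a new installer API.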
Re: [Catalog-sig] Deprecation of External Urls, Statistics
On Fri, Mar 8, 2013 at 8:13 AM, Donald Stufft don...@stufft.io wrote: It does solve the backwards compatibility issue of killing external urls immediately so I'm not flat out against it, but there may be legal issues involved too? I've mentioned this in the other thread as well, but the best way to actually ensure this stuff gets moved over to PyPI is to make it *easy*. Give developers a button to click on PyPI that fetches all their external links (requiring first that you give matching MD5 or other checksums) and uploads them to PyPI, and a whole bunch of those projects are likely to be okay with clicking it a few times. A command-line tool to do it (especially as a distutils/setuptools command) would be a good idea, too. Of the tiny minority of remaining people who object to PyPI hosting for some reason other than convenience/familiarity (e.g. MAL's licensing objection), it will likely be sufficient to provide an option to add #md5 links to their description, in lieu of actual rehosting. FWIW, it's hard to get people to change behavior when one condemns that behavior as unlikeable or socially undesirable, because it means one is less likely to consider the other person's motivations, needs, etc., and on top of that, the other person's resistance and rebellion are stirred up by being the subject of one's disapproval. So please, let's all stop talking about ways to work around the package authors and project maintainers, or how to force them into doing our bidding, and start talking instead about how to make it *easy* and *obvious* for them to do what we want. (And people who think it's already easy and obvious enough, so those 10% of projects must be stupid, will obviously not have anything positive to contribute.) 
So let me kick off that discussion with a list of known-so-far use cases for external hosting, in descending order of my extremely rough guesstimate of frequency: * Always did it that way, never saw a reason to change, or didn't know you could upload to PyPI * Lots of files that are currently generated on the system where they're hosted, or in an automated system that would need significant rework to support PyPI * Development snapshots (which may in fact be depended upon by other in-development projects, so manual URL specification doesn't help here) * Had an issue w/PyPI availability in the past * Objectors to PyPI's licensing requirements Automation is aimed at the first two: make it easy enough, w/a carrot and a stick (external link spidering is going away, you have to put either the links or the files on PyPI directly if you want them found), and a lot of people will move (assuming they're actually still maintaining their project). Development snapshots are an interesting case, because one of the reasons they're valuable is that PyPI's existing multi-release behavior is a major PITA. You can't upload a new version of something without PyPI creating a new release for it... and automatically hiding all your previous releases, including your stable release. There's a lot that would have to be done to PyPI's release management before it would actually be sane to track such releases there. So the obvious fix is to do nothing; such links being external doesn't hurt availability for people that don't depend on them (unlike rel=homepage/download links). The last two issues are education/persuasion problems that won't be affected by technology changes. Does anybody know of any other use cases for the thousands of projects and releases relying on external link discovery spidering? (Disparaging remarks about why a particular use case is bad, no good, makes you go blind, etc. 
need not apply: they serve only to show that the person providing the opinion lacks sufficient empathy with the target audience to be *useful* in a discussion of how to persuade that target audience to behave differently.) ___ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
Re: [Catalog-sig] hash tags
On Fri, Mar 8, 2013 at 2:52 PM, Noah Kantrowitz n...@coderanger.net wrote:
> MD5 is _not_ acceptable for anything security related and we shouldn't be adding anything that increases our dependence on it. MD5's only use in the packaging world is to make people who forget that TCP has its own checksums feel all warm and fuzzy that there hasn't been _accidental_ download corruption.

So, you're saying that someone has found a second-preimage attack against MD5 that's more efficient than the current 2**127 threshold established in 2009?

"Anything security related" is pretty broad. Out of the many classes of attacks on hashes, AFAIK the only class that's relevant to PyPI is second preimage attacks, i.e. one where the attacker has the original file and the hash, and must construct a new file that produces the same hash value.

Did you have some other type of hash attack in mind? And in either case, do you have a referent for the attack complexity?
Re: [Catalog-sig] hash tags
On Fri, Mar 8, 2013 at 4:17 PM, M.-A. Lemburg m...@egenix.com wrote:
> On 08.03.2013 20:16, PJ Eby wrote:
>> There is, as I said before, a MUCH simpler way to do this, that works right now: put direct #md5 download links in your description, and phase out the rel= attributes altogether.
> No, that would be a pretty poor design :-) The rel= attributes are good design, since they were meant for exactly this purpose (machine reading and understanding relations between origin and target).

That depends on the goal of your design. If the goal is to phase out offsite spidering by downloader tools in a reasonably easy and low-cost way, introducing a new API is not a good way to do it.

The simple way to do it is to replace download-time end-user unsupervised spidering with upload-time or registration-time author-supervised spidering, which requires only that the tools exist and people be informed of them (and encouraged to use them).
Re: [Catalog-sig] hash tags
On Fri, Mar 8, 2013 at 4:26 PM, Donald Stufft don...@stufft.io wrote:
> On Mar 8, 2013, at 4:12 PM, PJ Eby p...@telecommunity.com wrote:
>> On Fri, Mar 8, 2013 at 2:52 PM, Noah Kantrowitz n...@coderanger.net wrote:
>>> MD5 is _not_ acceptable for anything security related and we shouldn't be adding anything that increases our dependence on it. MD5's only use in the packaging world is to make people who forget that TCP has its own checksums feel all warm and fuzzy that there hasn't been _accidental_ download corruption.
>> So, you're saying that someone has found a second-preimage attack against MD5 that's more efficient than the current 2**127 threshold established in 2009? Anything security related is pretty broad. Out of the many classes of attacks on hashes, AFAIK the only class that's relevant to PyPI is second preimage attacks, i.e. one where the attacker has the original file and the hash, and must construct a new file that produces the same hash value.
> "Relevant to PyPI" is pretty broad, and when you're developing a secure system you need to look past what is ok *today* and design for the next 5, 10, or 20 years. So even if there's no attack that can directly allow replacing the target file with a new one, continuing to utilize it is bad. It has a number of weaknesses which do not instill confidence in its future security; meanwhile there are a number of other hashes which _do_. Unless you'd rather be trying to replace hashes everywhere once it's already completely broken.

We can replace it completely in a lot less than that many years, if the new PEP-based tools can be brought to pass. Using new protocols (e.g. the embedded signatures in wheel files) will make most of this moot.

What I'm against is trying to patch over the existing protocol when what we really want is to replace it altogether. Adding hashes and filesizes and whatnot is just gilding the existing lily, or more like gilding the pond scum, actually.
;-)
Re: [Catalog-sig] hash tags
On Fri, Mar 8, 2013 at 4:28 PM, M.-A. Lemburg m...@egenix.com wrote:
> On 08.03.2013 20:16, PJ Eby wrote:
>> So, since the page only contains links, might as well put the links straight on PyPI, or at most have an option/tool to load the links from an external source.
> I don't follow you. We only have a single download_url field available to store a download link. We'd need to modify the meta data format to allow for more than one such field, which doesn't work if you want to stay backwards compatible.

Links included in the long description field are placed on the /simple index of links. So you can just edit your standard metadata right this minute if you want to offer more download links. And you can put #md5 tags on them if you want the tools to check that.
Re: [Catalog-sig] hash tags
On Fri, Mar 8, 2013 at 4:32 PM, Donald Stufft don...@stufft.io wrote:
> Here's some more information pulled straight from Wikipedia:

Trust me, I've read a LOT of Wikipedia (and even more from other sites, including at least the conclusions of a number of cryptography papers) about hashing attacks recently, because I was seeing inconsistencies in what people are saying about hashes and their weaknesses and so forth.

99.9% of the discussion about attacks on hashes has to do with collision attacks, prefix attacks, and length extension attacks, all of which are extremely relevant for *cryptographic* purposes. Specifically, the use of hashes to verify identity, authority, repudiability, etc... which emphatically do *not* apply to the use of an MD5 as a checksum to verify a correct download.

All of these attacks depend on *something else* being at stake besides the integrity of the original message. For example, length-extension attacks bypass the need to know a secret used in a naive hash-based signature scheme (which is why you're supposed to use HMAC for such things), while collision attacks let you trick a signer into signing something that you can later replace with something altered.

The current use of #md5 tags isn't subject to either of these kinds of attack, because:

1. There is no secret to be revealed, and
2. The author and signer are the same person

So the only type of attack I've found out about thus far, in my (admittedly few) hours of study on the subject, that is relevant to the way we use MD5 on PyPI at present is the so-called second pre-image attack, which is when you're given an existing message and a hash, and have to create a new message with the same hash... while also incorporating something useful in the new message.
The most recent report I saw on second pre-image attacks against full MD5 estimated a 2**127 strength, meaning that even if you could process a great many billion tries per second, it would take you thousands of years to come up with a file that could masquerade as an existing download. (And most people's computers and/or internet connections would choke on the massive file sizes needed for the still-theoretical Kelsey-Schneier generalized preimage attack, which in any case would apply equally to just about any other hash we could currently put out in the field. i.e., it's not specific to a particular hash algorithm, it just relies on certain properties of the algorithm.)

So, yeah, MD5 is *cryptographically* broken, sure. But it's not broken for *data integrity*. And in the PyPI use case, the cryptographic part is all in the SSL being used to fetch the MD5 link in the first place.

> Here's the important highlights:
> - specifically, a group of researchers described how to create a pair of files that share the same MD5 checksum

Right, that's what's called a collision attack. It means that you can go out *ahead of time*, and make two files with the same checksum, one good, one evil. It does *not* mean you get to take an existing file, and then make a second file with the same checksum. (The latter is a second preimage attack, which is *not* broken.)

Hash collision attacks in PyPI would basically require an author to upload a special version of their package that looked innocent, and then they could later switch that version out with one that's harmful. And the *way* that this works is that you specially generate *both* files, in advance. Which means that the author themselves is compromised, so the threat is moot. The author can already upload compromised code (either through being evil or having their PC hijacked), and what #md5 it has is 100% irrelevant.
That is, there's nothing stopping an evil author or an author with a compromised PC from simply uploading a new file with a new MD5, because PyPI will pass it along in exactly the same way. Changing hash algorithms will not affect this threat vector in the slightest. Given these facts, it makes no sense to fuss over the hash algorithm in current use, since a concurrent goal here is to switch to file formats that can be directly signed using, you know, *actual* cryptography. ;-) The new .wheel format makes provisions for modern signature techniques. It'd be good if sdists also did. Then the #md5 tag can die a natural death, hopefully within the year replaced by a hashtag that say, fingerprints the author's public key as registered with PyPI, or something of that sort. In the meantime, there's no actual threat here, so bikeshedding what to replace it with *while keeping the current system* is like rearranging office furniture in a building that's about to have demolition charges set underneath it. ;-) ___ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
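For a sense of scale, the 2**127 work factor quoted in this message can be turned into a rough time estimate. The attacker speed used here (a trillion tries per second) is purely an assumed figure for illustration, and the calculation ignores everything except raw brute-force effort.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_exhaust(work_bits, tries_per_second):
    """Years needed to perform 2**work_bits tries at the given rate."""
    return 2 ** work_bits / tries_per_second / SECONDS_PER_YEAR


# Assumption: an attacker sustaining 1e12 tries per second.
years = years_to_exhaust(127, 1e12)
print("about {:.1e} years".format(years))
```

Under that assumption the result is on the order of 10**18 years, so "thousands of years" in the message above is, if anything, a considerable understatement.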
Re: [Catalog-sig] homepage/download metadata cleaning
On Fri, Mar 1, 2013 at 6:17 AM, holger krekel hol...@merlinux.eu wrote:
> On Fri, Mar 01, 2013 at 06:09 -0500, Donald Stufft wrote:
>> On Friday, March 1, 2013 at 6:04 AM, M.-A. Lemburg wrote:
>>> On 01.03.2013 11:19, holger krekel wrote:
>>>> Hi Richard, all, somewhere deep in the threads I mentioned I wrote a little cleanpypi.py script which takes a project name as an argument and then goes to pypi.python.org and removes all homepage/download metadata entries for this project. This sanitizes/speeds up installation because pip/easy_install don't need to crawl them anymore. I just did this for three of my projects (pytest, tox and py) and it seems to work fine.
>>> Does it also clean up the links that PyPI adds to the /simple/ index by parsing the project description for links? I think those are far nastier than the homepage and download links, which can be put to some good use to limit the external lookups (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal). See e.g. https://pypi.python.org/simple/zc.buildout/ for a good example of the mess this generates... even mailto links get listed and file:/// links open up the installers for all kinds of nasty things (unless they explicitly protect against following these).
>> pip at least, and I assume the other tools, don't spider those links, but they do consider them for download (e.g. if the link looks installable it will be a candidate for installing, but it won't fetch it and look for more links like it will for download_url/home_page). I believe that's the way it's structured atm.
> That's right. Even though the long-description extracted links look ugly on a simple/PKGNAME page, neither pip nor easy_install do anything with them except if the href ends in #egg=PKGNAME- in which case they are taken as pointing to a development tarball (e.g. at github or bitbucket). AFAIK a link like PKGNAME-VER.tar.gz will not be treated as an installation candidate, just the #egg=PKGNAME one.
Both are considered primary links. A primary link is a link whose filename portion matches one of the supported distutils or setuptools file formats, or is marked with an #egg tag. Primary links are indexed as to project name and version, so that if that version/format is chosen as the best candidate, it will be downloaded and installed. Links marked with rel=homepage or rel=download are secondary links. Secondary links are actively retrieved and scanned to look for more primary links. No further secondary links are scanned or followed. (Details of all of this can be found at: http://peak.telecommunity.com/DevCenter/setuptools#making-your-package-available-for-easyinstall ) This basically means that MAL's proposal for a download.html file is actually a bit moot: you can just stick direct primary download URLs in your PyPI description field, and the tools will pick them up. They can even include #md5 info. (See http://peak.telecommunity.com/DevCenter/EasyInstall#package-index-api - item 4 mentions the description part.) This means, by the way, that you could make an external link cleaner which spiders the external pages and pulls the candidates onto the description for that release, thereby keeping useful primary links and getting rid of the secondary links used to fetch them. ___ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
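The primary/secondary distinction described in this message can be sketched as a small link classifier. This is a simplification for illustration, not setuptools' actual code: the extension list is only a sample, and the class and function names are made up here.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Illustrative subset of distribution filename extensions (an assumption,
# not the full list the real tools recognise).
DIST_EXTENSIONS = (".egg", ".tar.gz", ".tgz", ".tar.bz2", ".zip", ".exe")


class LinkCollector(HTMLParser):
    """Collect (href, rel) pairs from the <a> tags of an index page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        if attrs.get("href"):
            self.links.append((attrs["href"], attrs.get("rel")))


def classify(href, rel=None):
    """Return 'primary', 'secondary' or 'ignored' for one link."""
    parts = urlsplit(href)
    # Primary: filename looks like a distribution, or link carries an #egg tag.
    if parts.fragment.startswith("egg=") or parts.path.lower().endswith(DIST_EXTENSIONS):
        return "primary"
    # Secondary: pages that get fetched once and scanned for primary links.
    if rel in ("homepage", "download"):
        return "secondary"
    return "ignored"
```

Feeding a /simple page through `LinkCollector` and then `classify` shows why description links with a distribution filename are picked up directly, while mailto links and plain prose links are simply skipped.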
Re: [Catalog-sig] Deprecate External Links
On Fri, Mar 1, 2013 at 4:24 AM, M.-A. Lemburg m...@egenix.com wrote: On 01.03.2013 10:02, Reinout van Rees wrote: On 28-02-13 21:08, holger krekel wrote: I have seen that position in this discussion (I have to upload 120 files per release, so I won't do that, for instance). haven't seen that. Marc-Andre Lemburg said this, which I took to mean 120 uploads per release: However, taking our egenix-mx-base package as example, we have 120 distribution files for every single release. Uploading those to PyPI would not only take long, but also ... Correct, with a total of over 100MB per release. However, the above quote is slightly incorrect: I did not say I won't do that, just that there are issues with doing this: * It currently takes too long uploading that many files to PyPI. This causes a problem, since in order to start the upload, we have to register the release on PyPI, which tools will then immediately find. However, during the upload time, they won't necessarily find the right files to download and then fail. Actually, easy_install doesn't pay any attention to what releases are registered. It just looks for primary and secondary links. If there are links for a version that it can use, it uses it. If it does not find links for a version, then that version does not exist, as far as it is concerned. So registering without files is not a problem. The proposed pull mechanism (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal) would work around this problem: tools would simply go to our servers in case they can't find the files on PyPI. That proposal is unnecessary, actually. You could *right now* simply place binary download links (with optional #md5= verification) in your package's description field, and the moment you registered the package, existing tools would find those links and download them from your site. You could then remove your home page and download URLs from the relevant fields, and place them also in the description. 
(easy_install does not follow non-download links within the description -- i.e., links that don't end in .egg, .tgz, etc. and don't have an #egg tag.) * PyPI doesn't allow us to upload two egg files with the same name: we have to provide egg files for UCS2 Python builds and UCS4 Python builds, since easy_install/setuptools/pip don't differentiate between the two variants. They can if it's part of the platform string; the catch is that right now it's not. We'd have to go through an upgrade cycle of the tools to support that. I need to take a look at what PEP 427 is doing (and you should too, if you haven't already) to get this part sorted out. ___ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
Re: [Catalog-sig] homepage/download metadata cleaning
On Fri, Mar 1, 2013 at 2:31 PM, M.-A. Lemburg m...@egenix.com wrote: Hmm, then why not remove links that don't match the above from the /simple/ index pages ? PyPI provides the links uninterpreted since the tools' interpretations have evolved over time. Note that it's easily possible to make e.g. file:/// links have a fragment that matches what you described, so I guess the filters would have to be more careful about what to allow (e.g. only http/ftp schemes, perhaps even only https schemes) and what not. file:// URLs are an intentionally supported feature of easy_install; many users have local NFS-based or other shared repositories. But yes, it certainly would be reasonable to not include links to them on PyPI. ;-) BTW: Are those links also shown as-is on the description page ? People could do nasty stuff by adding javascript: links which look like normal links to the descriptions. That's true, but is unrelated to the tools, since the tools can't process javascript links. It would probably be best, though, if PyPI filtered such URLs to prevent script injection/CSRF attacks on logged-in PyPI users browsing project descriptions. I don't know if it already does this or not, since I've never tried to inject a CSRF attack on PyPI. ;-) (I guess technically it would be a same-site request forgery rather than a cross-site one, but you know what I mean.) ___ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
Re: [Catalog-sig] Next generation package infrastructure (was: Deprecate External Links)
On Thu, Feb 28, 2013 at 4:31 AM, M.-A. Lemburg m...@egenix.com wrote: In order for this to work out, you will need to get the support of people hosting packages externally and address their concerns. The current discussion has been too dogmatic for my taste. A more pragmatic approach would likely be a more reasonable and successful way to achieve a transition. I think maybe if we have an uploader tool like the one I mentioned in one of the other spinoff threads, we could address at least the current upload situation by making it super easy to upload your external files. Better still, have a button you can press in the PyPI UI that says, fetch all my external distributions, and it gives you a preview of the download files it's going to fetch (so you can filter out mis-detected ones), and then it does the pulling. Such a tool could survive migration to the new infrastructure as well. ___ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
Re: [Catalog-sig] Migrating away from scanning home pages (was: Deprecate External Links)
On Thu, Feb 28, 2013 at 5:55 AM, M.-A. Lemburg m...@egenix.com wrote: I think we all agree that scanning arbitrary HTML pages for download links is not a good idea and we need to transition away from this towards a more reliable system. Here's an approach that would work to start the transition while not breaking old tools (sketching here to describe the basic idea): Limiting scans to download_url -- Installers and similar tools preferably no longer scan the all links on the /simple/ index, but instead only look at the download links (which can be defined in the package meta data) for packages that don't host files on PyPI. Going only one level deep - If the download links point to a meta-file named packagename-version-downloads.html#sha256-hashvalue, the installers download that file, check whether the hash value matches and if it does, scan the file in the same way they would parse the /simple/ index page of the package - think of the downloads.html file as a symlink to extend the search to an external location, but in a predefined and safe way. Clever. This is actually backward compatible with existing tools, in that they will read this file right now. The hashing and verification isn't supported, but we could add warnings to do it. Actually, the essence of your idea can be done even more simply: just require that the link include a hash that the fetched page will be verified against. It essentially ensures that stale external links can't break anything. Further, since the existence of the hash means that the page can't be changed without changing the URL, it means that PyPI *itself* can simply fetch it once, parse the links from it, and serve them directly on the /simple index page. If you change the download URL, PyPI discards the previous links and redoes the scan. All in all, though, I'm not sure it's as viable as a simple upload my external release button (in the UI) and matching setup.py command (for automation) as a way of getting people's releases done. 
It seems like building a downloads.html for your files from SourceForge, say, would be just an annoying intermediate step. (This is assuming, of course, that the licensing issues can be worked out.)

* In a later phase of the transition we could have PyPI cache the referenced distribution files locally to improve reliability. This would turn the push strategy for uploading files to PyPI into a pull strategy for those packages and make things a lot easier to handle for package maintainers.

I like this part. I think we should just go straight there, and skip the intermediate link formatting stuff. ;-)
Re: [Catalog-sig] Deprecate External Links
On Thu, Feb 28, 2013 at 5:00 PM, Donald Stufft donald.stu...@gmail.com wrote: SSL checking on upload should be possible, do you want a patch? If it uses the 'requests' library, yes, I'll accept one. But I don't want to do any direct implementation of SSL cert checking in setuptools, at least in the short run (next few weeks), because: 1. I don't consider myself qualified as yet to write a correct patch or even verify that a contributed patch is correct/safe, and 2. There is a licensing issue with including the Mozilla root certificate set in setuptools under its current license, and I'm not 100% certain I can *change* the license. (I *could* potentially use a platform-provided cert set, but that's not really an option on Windows unless you have Windows expertise above my paygrade for pulling that stuff out of the registry.) So, by delegating to the requests library, I can bypass both of those issues in the short term. In the longer term (1 month from now), more integrated solutions may be more feasible. Using requests is the best I think I can reasonably achieve by PyCon, but I *will* be publicizing a set of instructions for how to safely download setuptools and requests (via https in a browser to prevent MITM attacks), as well as how to configure easy_install for more secure default settings. (And easy_install will always use requests if present, unless specifically asked not to with a --no-ssl-verify option.) ___ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
Re: [Catalog-sig] Deprecate External Links
On Wed, Feb 27, 2013 at 1:34 PM, Lennart Regebro rege...@gmail.com wrote: On Wed, Feb 27, 2013 at 5:34 PM, M.-A. Lemburg m...@egenix.com wrote: I'm not saying that it's not a good idea to host packages on PyPI, but forcing the community into doing this is not a good idea. I still don't understand why not. The only reasons I've seen are Because they don't want to or because they don't trust PyPI. And in the latter case I'm assuming they wouldn't use PyPI at all. I haven't seen anybody mention it yet, but checkouts of development versions are a use case that can't currently be addressed without support for multiple external links. For example, setuptools itself offers SVN checkout URLs for two different branches. I've also seen in-development packages offered via github or bitbucket checkouts as well. ___ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
Re: [Catalog-sig] Deprecate External Links
On Wed, Feb 27, 2013 at 4:04 PM, Lennart Regebro rege...@gmail.com wrote: On Wed, Feb 27, 2013 at 8:49 PM, Monty Taylor mord...@inaugust.com wrote: But wouldn't this only be a change in pip/easy_install, not PyPI itself? I suppose you could explicitly break the external links by having them point to nothing if you are worried about the security or if it's some performance issue (that would indeed be a bad compatibility break, in case people are using those for other purposes). Otherwise, if it's a problem, then just use the old version of pip. If we don't remove the feature from pypi itself It isn't a feature of PyPI. PyPI doesn't require you to upload the files to PyPI. For that reason, easy_install and PIP will scrape external sites to be able to download the files. What we should do is agree that this should stop, So far, I don't think anybody's talking to the right we for stopping it. It's the tools that control this, not PyPI. (PyPI can't actually stop the tools from using this information without also making itself a lot less useful to *humans* at the same time.) As far as my personal position on the matter, I think that it's reasonable to deprecate the scraping of home page and download links. As somebody pointed out, expired domains are a potentially nasty problem there. OTOH, I currently make development snapshots of setuptools and other projects available by dumping them in a directory that's used as an external download URL. Replacing that would be a PITA because PyPI only lets you upload and register new releases from distutils' command line. Basically, I'd need to use a download link that pointed to a latest URL that redirected to the final download. Anyway, I'm not seeing much discussion here about how to help authors make changes to their release processes. Note that many popular and long-lived projects (pywin32, PIL, etc.) have similar issues. (Not to mention the newer projects that host directly from revision control.) 
Given that easy_install was deliberately designed so that those guys would *not* need to change their hosting strategies to get automated downloads, I'd like to see more talk about how we're going to help people change their releasing and hosting strategies. ___ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
Re: [Catalog-sig] Deprecate External Links
On Wed, Feb 27, 2013 at 4:50 PM, Donald Stufft donald.stu...@gmail.com wrote: Development snapshots are a use case that i'm not sure makes sense for PyPI, but if they do should require specific opt-in to install them. Does easy_install have a command line flag that adds extra links? *chuckle*. Yes, it's the original source of the --find-links option, emulated in pip to ease migration. can your instructions simply state to do the equivalent of `pip install --find-links=http://setuptools.com/dev-snapshopts/`? The problem with find-links is that if you push them off of PyPI, they have to go somewhere else, which is setuptools' dependency-links feature. Now you have an even *harder* problem to update or remove those links, because they're not under the control of the author nor visible on PyPI. Alternatively I would like to get the tooling smarter about not installing pre-release versions unless asked as well. Yes, and that discussion doesn't have much to do with PyPI per se, because again, it's up to the tools. ___ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
Re: [Catalog-sig] HTTPS now promoted on PyPI
On Tue, Feb 19, 2013 at 12:13 AM, Richard Jones r1chardj0...@gmail.com wrote: 2. incorporate some monkey-patching into distribute and setuptools and promote those, This is actually on my radar to do for setuptools, as soon as the dust has settled enough on what it is the monkey-patching needs to *do*. ;-) So far I know I'll be changing the default URLs and adding cert verification, but I haven't looked at the register or upload stuff yet. The part where people are saying https isn't working right now is a big red flag for me, however; I don't want to push out an update that'll just make the load situation worse. In the meantime I'll be investigating and testing, of course. (One annoying issue presently under investigation: determining whether including a cacert bundle means setuptools' license terms will have to change. Pip used LGPL, which appears to be compatible with the MPL. I personally don't think certs should be copyrightable in the first place, but some jurisdictions have compilation copyright of otherwise non-copyrightable individual elements. Presumably, Mozilla's not going to be a jerk about things, but... bleah. Licensing issues *suck*.) ___ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
Re: [Catalog-sig] HTTPS now promoted on PyPI
On Tue, Feb 19, 2013 at 8:35 AM, Giovanni Bajo ra...@develer.com wrote: I would be OK with redirecting for browsers (matching the user agent for instance), but I would try to disable for tools as much as possible. Matching paths is an option, too: the /simple index is intended for tools, and the main /pypi index for humans.
Re: [Catalog-sig] Remove pypi redirects
On Tue, Feb 19, 2013 at 1:31 PM, Marcus Smith qwc...@gmail.com wrote: looking on the bright side, it made us aware that we had a leak to pypi in our build. we were trying to be local. so thanks. Had to go update our .pydistutils.cfg file Marcus FYI, easy_install's --allow-hosts option can prevent such leaks. (But maybe that's why you're editing pydistutils.cfg ;-) ) ___ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
Re: [Catalog-sig] New PyPI stats available
On Mon, Feb 18, 2013 at 9:55 AM, Alex Clark acl...@aclark.net wrote:

    aclark@Alexs-MacBook-Pro:~/Developer/aclark/resume/ vanity pydstat
    pydstat-1.0.0.tar.gz    2012-08-15    2,216
    pydstat-1.0.1.tar.gz    2012-08-23    4,367
    pydstat has been downloaded 6,583 times!

Nice -- any chance you could add version filtering? vanity setuptools reports ~8.4 million downloads for setuptools, but the current release actually stands at only around 4.8 million. ;-) (Also, the formatting is off for the most popular downloads, because the count column isn't wide enough to show 7 significant figures.)
Re: [Catalog-sig] Allowing the upload of .py files at PyPI
On Thu, Feb 14, 2013 at 6:31 PM, Richard Jones rich...@python.org wrote: The bootstrap.py file would most likely have to be omitted from the usual files listing mechanisms as they are used to determine installable release packages. I would feel more comfortable with the proposed mechanism if it allowed the .py files to retain their original names. There is a ton of collateral out there referring people to ez_setup.py, and while I can (and will) redirect the original URL to wherever it ends up, it'd be less confusing to keep the name. Among other things, it would help prevent the sort of phishing attack where somebody represents *their* ez_setup.py script as the real deal, while saying that setuptools/bootstrap.py is an obvious forgery, since it's not named ez_setup.py. ;-) ___ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
Re: [Catalog-sig] Proposal for the bootstrap API
On Fri, Feb 15, 2013 at 8:10 AM, Nick Coghlan ncogh...@gmail.com wrote: On Fri, Feb 15, 2013 at 10:25 PM, Tarek Ziadé ta...@ziade.org wrote: Anyways: I am withdrawing my proposal - if we're special-casing a few projects, why bother creating a new API in the first place ? That's why I asked how frequently the bootstrap files needed updates earlier - if they're fairly static, then simply asking for a copy to be hosted on PyPI and documenting that as the canonical location is by far the most straightforward solution. The only reason for an API would be if the projects wanted to be able to update them directly without asking the PyPI admins to upload a new version (and, as you note, that could potentially be handled via ssh/scp config rather than via the PyPI web app). Also, it may make sense to get rid of the bootstrap files in the long run anyway. ez_setup started the whole business with only one real function: to solve the chicken-and-egg problem of allowing developers to make use of dependencies without first needing their users to install setuptools. Is that a problem that actually needs solving any more, almost a decade later? (Apart from that use, the only thing it's good for is helping 64-bit Windows users install the right version of setuptools in the right place, and there will probably be a better fix for that eventually as well.) Buildout actually has a better reason than any of the other projects to keep a bootstrap file around, and that's that it's targeted at a general sysadmin audience not steeped in Python packaging lore. So having a bootstrap makes a lot of sense... except that there's no reason it needs to live on PyPI, per se. Zope corp. undoubtedly has secure hosting and certs of their own, and the very thing that makes them need a bootstrap script means that the people who need it don't really care *what* secure source they pull it from. 
It's possible I'm misunderstanding some things there, and I hope Jim will chime in with corrections if applicable. But I'm thinking maybe instead of working out PyPI hosting for these things, we should just get rid of them or host them elsewhere. (I have at least one domain w/a trusted cert that could be used, for example.) (One additional point, though: for ez_setup.py's main use case, it's currently distributed by way of anonymous SVN, and zillions of source packages already hosted on PyPI. Most of the time, the copy somebody uses *already* came from somewhere other than the primary source. Factor *that* into the phishing scenarios for a bit...) ___ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
Re: [Catalog-sig] Allowing the upload of .py files at PyPI
On Thu, Feb 14, 2013 at 5:10 PM, Nick Coghlan ncogh...@gmail.com wrote: I'm more concerned about phishing style attacks. I don't want the PyPI admins to have to start scanning for hostile names like distirbute. I'm not sure what you mean. These things exist only for the corresponding package (buildout, setuptools, or distribute), and aren't downloaded from any other project. Generally, they are downloaded either by 1) a human, or 2) another tool that wants to support installation in the absence of a pre-existing setuptools or distribute installation (mainly zc.buildout AFAIK). (Or are you saying that somebody might upload a project called, say, distribute_, and try to trick people into downloading it? I'm not sure how that's a threat that can be defended against in any event.) So how often do the bootstrap files change? Setuptools releases an updated version with each new release, as it contains an MD5 signature for downloading the new release. I *think* distribute does the same. Not so sure about buildout. ___ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
Re: [Catalog-sig] PyPI and setuptools
On Sat, Feb 9, 2013 at 6:43 PM, M.-A. Lemburg m...@egenix.com wrote:
* distutils config files: http://docs.python.org/2/install/index.html#inst-config-files
* setuptools: http://peak.telecommunity.com/DevCenter/EasyInstall#configuration-files http://peak.telecommunity.com/DevCenter/EasyInstall#command-line-options (the option is called --index-url)
* distribute: http://pythonhosted.org/distribute/easy_install.html#configuration-files http://pythonhosted.org/distribute/easy_install.html#reference-manual (the option is called --index-url)

Also, you can run this to easily change the setting site-wide (with either setuptools or distribute):

    sudo python setup.py saveopts -g easy_install --index-url https://pypi.python.org/simple

It'll give you an error message about no URLs being provided, but first it'll update the global distutils.cfg for that version of Python or that virtualenv, e.g.:

    $ sudo python setup.py saveopts -g easy_install --index-url https://pypi.python.org/simple
    running saveopts
    Writing /usr/lib/python2.6/distutils/distutils.cfg
    running easy_install
    error: No urls, filenames, or requirements specified (see --help)

(If you want to restrict easy_install to only download from pypi by default, you can also add an --allow-hosts setting to the easy_install part of the command line.)
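For anyone who would rather generate the config section than run saveopts, here is a minimal sketch of what the resulting [easy_install] section might look like. The helper name render_easy_install_config is hypothetical, and the underscore spelling of the option names (index_url, allow_hosts) is an assumption about how the command-line options map into distutils config files; check the file saveopts actually writes before relying on it.

```python
import configparser
import io


def render_easy_install_config(index_url, allow_hosts=None):
    """Return the text of an [easy_install] config section (illustrative only)."""
    config = configparser.ConfigParser()
    # Assumption: config files spell --index-url as index_url.
    config["easy_install"] = {"index_url": index_url}
    if allow_hosts:
        # Assumption: --allow-hosts is spelled allow_hosts in the file.
        config["easy_install"]["allow_hosts"] = allow_hosts
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_easy_install_config("https://pypi.python.org/simple",
                                     allow_hosts="pypi.python.org"))
```

The text it returns could be pasted into a per-user pydistutils.cfg or the global distutils.cfg mentioned above.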
Re: [Catalog-sig] PyPI and setuptools
On Mon, Feb 11, 2013 at 2:55 AM, Marcus Smith qwc...@gmail.com wrote: As for then making Distribute the default in virtualenv's (or the only option), there is a virtualenv issue for that. https://github.com/pypa/virtualenv/issues/217 apparently there's an issue with UAC elevation on windows. that issue could use some help getting going... There's a fix for the UAC issue in the current release of setuptools, if that helps. (Actually, I think it was put in a couple of releases ago. Either way, it should be in the setuptools commit logs from a few years ago. There are a number of bugs like this that were fixed in setuptools many years ago, but never merged by distribute; I don't think anybody from distribute has been monitoring the setuptools tracker or repository much since the original divergence.) ___ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
Re: [Catalog-sig] PyPI and setuptools
On Tue, Feb 12, 2013 at 2:11 PM, Giovanni Bajo ra...@develer.com wrote: Il giorno 12/feb/2013, alle ore 19:36, PJ Eby p...@telecommunity.com ha scritto: On Sat, Feb 9, 2013 at 7:54 PM, Giovanni Bajo ra...@develer.com wrote: The problem with this approach is that Python standard library does not validate SSL certificates. So even if you force a urllib-based tool to access PyPI through https, it doesn't help at all in case of a MITM attack. FWIW, if someone provides a suitable *cross-platform* urllib monkeypatch that does certificate validation, even if it only validates PyPI's certificate, I'll add it to setuptools and issue a patch release that uses it, and has its default index URL updated to the https version. This is an option: https://gist.github.com/zed/1347055 it's not a monkeypatch, but it's a handler. You probably want to include a CA bundle (eg: the Mozilla one like pip is doing), and use that by default. Thanks! TBH, cert stuff makes my head hurt, which is why there's not more of it in setuptools already: I hesitate to sprinkle a dash of stuff I don't understand on top of other things and call the problem solved. That seems like something of an antipattern to me. But I suppose I'll need to learn some of it at least, in order to be able to build a CA bundle, unless I steal whatever pip does. I can start on integrating this in the meantime at least, and hopefully can get it out around the same time that PyPI's cert is updated. I'm nonetheless hesitant to conclude that the problem of security on *non* PyPI sites or handling redirects or all the rest of it will all be resolved in a single patch release, though. ___ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
Re: [Catalog-sig] [Distutils] imp.find_modules and namespaces
On Mon, Feb 11, 2013 at 11:40 AM, Alessandro Dentella san...@e-den.it wrote: I believe that this issue belongs to this list, please let me know if I'm wrong. Suppose I have 2 packages: jmb.foo jmb.bar distributed separately. Each has in jmb's __init__ a standard:

    __import__('pkg_resources').declare_namespace(__name__)

or

    from pkgutil import extend_path
    __path__ = extend_path(__path__, __name__)

I just realized that imp.find_module() will return fake values: imp.find_module('jmb', None) may return (a tuple with) the path from the first package or from the second. Many frameworks will fail to discover commands in the inner module: one is detailed here [1], another is Django's way of getting an application's commands. I find it misleading to return a value that is not thoroughly correct. Is there a workaround? Is the current behaviour considered correct for reasons I don't yet understand?

Since Python 2.5, the right way to do this is with pkgutil.iter_modules() (for a flat list) or pkgutil.walk_packages() (for a subpackage tree). For your example, if I wanted to find just the subpackages of 'jmb', I would do:

    import jmb, pkgutil
    for (module_loader, name, ispkg) in pkgutil.iter_modules(jmb.__path__, 'jmb.'):
        # 'name' will be 'jmb.foo', 'jmb.bar', etc.
        # 'ispkg' will be true if 'jmb.foo' is a package, false if it's a module
        print(name, ispkg)
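The subpackage-tree variant mentioned above can be sketched the same way with pkgutil.walk_packages(). This is only an illustration: list_submodules is a hypothetical helper name, and the stdlib json package stands in for 'jmb' so the snippet runs anywhere. Note that walk_packages() imports each subpackage it descends into, so it is not free of side effects.

```python
import json
import pkgutil


def list_submodules(package):
    """Return sorted (name, ispkg) pairs for everything below an imported package."""
    prefix = package.__name__ + "."
    return sorted(
        (name, ispkg)
        for _finder, name, ispkg in pkgutil.walk_packages(package.__path__, prefix)
    )


if __name__ == "__main__":
    # 'json' is only a stand-in; with the namespace package imported,
    # list_submodules(jmb) would cover every directory on jmb.__path__.
    for name, ispkg in list_submodules(json):
        print(name, ispkg)
```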
Re: [Catalog-sig] [Distutils] imp.find_modules and namespaces
On Mon, Feb 11, 2013 at 4:56 PM, Alessandro Dentella san...@e-den.it wrote: thanks for the answer but this way I need to really import jmb while imp.find_module doesn't really import it. If you want to know whether the module 'jmb' exists, you can certainly do that by using pkgutil.iter_modules(). What you *can't* do -- in *any* version of Python as far as I know -- is tell for certain whether 'jmb.foo' exists, without first importing jmb. (Since until jmb is imported, there's no way to know what __path__ value it will end up with.) This is true for namespace packages in all versions of Python; the best that you can do is try to write code that does the same thing as the import system... but even then your code will be just guessing (and failing to guess correctly) in the case where a package's initialization involves altering its __path__ or if .pth files with dynamic code are involved. Similarly, for any module foo.bar.baz, foo.bar must be imported in order to know what path to use for checking for the existence of foo.bar.baz. ___ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
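As a rough sketch of the first point (checking for a top-level name without importing it), pkgutil.iter_modules() can be called with no path, in which case it scans the finders for sys.path. The helper name top_level_module_exists is hypothetical, and the check only sees modules that those path-based finders can enumerate. It says nothing reliable about 'jmb.foo', for the reasons given above.

```python
import pkgutil


def top_level_module_exists(name):
    """Report whether a top-level module or package is visible on sys.path.

    Nothing is imported; iter_modules() only asks each path finder to
    enumerate what it can see.
    """
    # Each item is (module_finder, name, ispkg); index 1 is the module name.
    return any(info[1] == name for info in pkgutil.iter_modules())


if __name__ == "__main__":
    print(top_level_module_exists("json"))
    print(top_level_module_exists("no_such_module_xyz"))
```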
Re: [Catalog-sig] disabling the serving of links from description_html?
On Tue, Dec 18, 2012 at 11:46 AM, M.-A. Lemburg m...@egenix.com wrote: AFAIK, setuptools/distribute only looks at links with rel=homepage or rel=download attributes, not all links on the PyPI project page. The links from the description don't receive such attributes. Those are the only links that are unconditionally followed, yes. But all links it sees are parsed to see if they appear to be a direct download link (e.g. .tgz, .zip, .egg, #egg= link, etc.). They're just not *followed* unless they appear to be a direct link to a desired version of something, or if it's marked as a homepage or download link. All other on-page links are ignored, whether they're part of the description or otherwise. (Any given link is also retrieved at most once per run of easy_install.) ___ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
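To make the "appears to be a direct download link" test concrete, here is a simplified sketch. It is not setuptools' actual code: the function name looks_like_download and the suffix list are assumptions chosen for illustration, and the real tool goes on to check the project name and version in the filename, which this sketch does not attempt.

```python
from urllib.parse import urlparse

# Illustrative subset of archive suffixes; the real list is longer.
ARCHIVE_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".zip", ".egg")


def looks_like_download(url):
    """Guess whether a URL points straight at a distribution file."""
    parts = urlparse(url)
    # An #egg=name-version tag marks a link (e.g. an SVN checkout) as a download.
    if parts.fragment.startswith("egg="):
        return True
    # Otherwise go by the file extension of the path, ignoring any fragment.
    return parts.path.lower().endswith(ARCHIVE_SUFFIXES)


if __name__ == "__main__":
    for link in (
        "http://example.com/dist/foo-1.0.tgz",
        "http://example.com/svn/foo/trunk#egg=foo-dev",
        "http://example.com/docs/index.html",
    ):
        print(link, looks_like_download(link))
```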
Re: [Catalog-sig] Flag to tell pip to only install uploaded files
On Fri, Jun 22, 2012 at 8:21 PM, Aaron Meurer asmeu...@gmail.com wrote: Hi. I'm following up on a discussion on the pip mailing list ( https://groups.google.com/forum/#!topic/python-virtualenv/PZNj9pC6aKA/discussion ), where I was directed here. Would it be possible to add some kind of a flag to PyPI that would let package maintainers tell pip to install only the uploaded file (or possibly also the file given by a direct link), and no others? Currently, pip aggressively tries to find the latest version of a package by crawling all links on the PyPI page, even those from older versions. This is a headache to me as a package maintainer because it means that pip is quite often installing the wrong thing. Recently, pip was trying to install our html docs because we had a file uploaded at Google Code named sympy-0.7.1-html-docs, The simple way to correct this problem is to rename the file 'sympy-html-docs-0.7.1' - this will fix things for all installers that follow easy_install's discovery protocol, including pip and zc.buildout. which it deemed to be a newer version than sympy-0.7.1. There's also the issue that every time we put out a release candidate for a new version, pip starts installing that, when I would prefer it to only install stable final releases. It's also, as I noted on the other discussion list, a bit of a security risk. zc.buildout includes a flag to prefer stable releases, and I believe some other installation tools do as well. You might suggest they add such a flag to pip and move towards using it by default. ___ Catalog-SIG mailing list Catalog-SIG@python.org http://mail.python.org/mailman/listinfo/catalog-sig
Re: [Catalog-sig] What is the point of pythonpackages.com?
On Mon, Feb 6, 2012 at 3:17 PM, Andreas Jung li...@zopyx.com wrote: My point about this: if a person does not want to host its package on PyPI then it should stay away from PyPI. Package hygiene and a certain level of professional package repository is more important than personal reasons for not hosting packages on PyPI.

Note that PyPI is also used to publish metadata about packages which are in development and only available in snapshot releases or revision control systems. So the "it shouldn't be hosted elsewhere" argument doesn't really wash.
Re: [Catalog-sig] What is the point of pythonpackages.com?
On Tue, Feb 7, 2012 at 11:18 AM, Martijn Faassen faas...@startifact.com wrote: On 02/07/2012 07:18 AM, Kai Diefenbach wrote: If a listed package is not available (because an external server is down) the index is broken. That's an interesting observation. I would think 'broken' is strong language, but the index can at least be considered incorrect in that particular instance. If people have tools that rely on the index being correct, then its being incorrect can be a problem. You can either say those tools shouldn't be used for real development work (you're doing it wrong), or encourage people to provide the package on PyPI as well (encouragement as a social solution), or consider facilities to provide redundancy (caching, mirroring) to help with the experience (a technical solution).

Note, too, that prior to setuptools' development, there wasn't even any expectation that projects listed on PyPI even have a current *release*, or even have any *source code written*, let alone packages available for download from PyPI itself. (PyPI uploading was developed around the same time as the first versions of setuptools and EasyInstall.) Just because the common use-case for PyPI nowadays is to pull down installation files, doesn't mean the previous use cases which PyPI catered to are gone or not worth supporting any more.
Re: [Catalog-sig] What is the point of pythonpackages.com?
On Tue, Feb 7, 2012 at 12:06 PM, Donald Stufft donald.stu...@gmail.com wrote: On Tuesday, February 7, 2012 at 12:02 PM, PJ Eby wrote: On Mon, Feb 6, 2012 at 3:17 PM, Andreas Jung li...@zopyx.com wrote: My point about this: if a person does not want to host its package on PyPI then it should stay away from PyPI. Package hygiene and a certain level of professional package repository is more important than personal reasons for not hosting packages on PyPI. Note that PyPI is also used to publish metadata about packages which are in development and only available in snapshot releases or revision control systems. So the "it shouldn't be hosted elsewhere" argument doesn't really wash. This is a matter of opinion really, Personally I think if your package is in development you should publish snapshot releases to PyPI.

Yes, but now we get into the wonderful world of how many releases do you actually want active vs. hidden vs. deleted, and now there are that many more files to be possibly frozen and mirrored and archived and whatnot, which isn't really suitable for such dev releases. (Also, in the specific case of my snapshot-only packages, I have automated builds that keep a rotating set of snapshots in a server-local download directory for public access; I wouldn't want that build process automatically uploading that stuff to PyPI, as it adds more moving parts for things to break on my end.)
Re: [Catalog-sig] Distutils sdist formats best practice
On Mon, Feb 6, 2012 at 12:19 PM, Alex Clark acl...@aclark.net wrote: What do pip/easy_install/etc do when they encounter both a .zip and a .tar.gz, for example? IIRC, easy_install will take the longer filename in preference to the shorter one, all else being equal; that's its final tiebreaker after what kind of thing it expects to find at a given URL.
Re: [Catalog-sig] Proposal: close the PyPI file-replacement loophole
On Wed, Feb 1, 2012 at 6:06 AM, Yuval Greenfield ubershme...@gmail.com wrote: Does the setup.py/cfg allow me to require a specific hash on SQLAlchemy when automatically resolving dependencies in pip/easy_install?

Yes, at least for easy_install. You tack on #md5= to your find_links URLs, and specify an exact version. easy_install will refuse to install them if the MD5 doesn't match. (This will work better for source packages than binaries, of course, since you'd only need to include one link and MD5 signature in that case.)
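The check described above amounts to comparing the digest of the downloaded bytes with the value carried in the link's fragment. A minimal sketch follows, with hypothetical helper names (expected_md5, matches_link_hash) and no claim to mirror easy_install's internals; MD5 is used only because that is what the #md5= convention specifies.

```python
import hashlib
from urllib.parse import urlparse


def expected_md5(url):
    """Return the hex digest from a '#md5=<hex>' fragment, or None if absent."""
    fragment = urlparse(url).fragment
    if fragment.startswith("md5="):
        return fragment[len("md5="):].lower()
    return None


def matches_link_hash(url, data):
    """Check downloaded bytes against the link's #md5= fragment, if it has one."""
    expected = expected_md5(url)
    if expected is None:
        return True  # the link carries no hash, so there is nothing to verify
    return hashlib.md5(data).hexdigest() == expected


if __name__ == "__main__":
    payload = b"pretend this is an sdist"
    link = ("http://example.com/dist/SQLAlchemy-0.7.5.tar.gz#md5="
            + hashlib.md5(payload).hexdigest())
    print(matches_link_hash(link, payload))
    print(matches_link_hash(link, b"tampered contents"))
```

The URL and version number in the usage lines are placeholders, not a real download location.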