Re: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of release files
> In addition, maintainers of installation tools are asked to release two updates. The first one shall provide clear warnings [...] The second update for installation tools should change the default mode to allow only installation of package files hosted at the index domain,

sounds good to me.

> It is expected that tools in this release may choose to change the default index url to ``https://pypi.python.org/simple/-with-ext`` in

so, *eventually*, the /simple interface (that has been transitioned to only serve pypi links) could be deprecated? (because new tools would be smart enough to responsibly navigate /simple/-with-ext)

but slightly ironic that we'd be left with an interface called simple/-with-ext, given the goal of all this, but it makes sense.

Marcus

___
Catalog-SIG mailing list
Catalog-SIG@python.org
http://mail.python.org/mailman/listinfo/catalog-sig
[Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
Hi all, in particular Philip, Marc-Andre, Donald,

Carl and I decided to simplify the PEP and avoid the somewhat awkward ``simple/-with-externals`` index for various reasons, among them Marc-Andre's criticisms. This also means present-day installation tools (shipped with Redhat/Debian/etc.) will continue to work as they do today for those packages which remain in a hosting-mode that requires crawling and scraping. They will still benefit from the fact that most packages will soon have a hosting-mode that avoids it. Future releases of installation tools will default to neither crawling nor using (scraped) external links, and new PyPI projects will default to only serving uploaded files.

The V4 pre-PEP also renames the three PyPI hosting modes to be more descriptive. Since all three modes allow external links, pypi-ext vs pypi-only were misleading. The new naming distinguishes the mode that both scrapes links from metadata and crawls external pages for more links (pypi-scrape-crawl), the mode that only scrapes links from metadata (pypi-scrape), and the mode where all links are explicit (pypi-explicit).

Without the separate external index, it also turns out that the two transition phases separate cleanly into PyPI changes (phase one) and installer-tool updates (phase two). There are no PyPI changes necessary in phase two. As stated in a new open question, it should be possible to do PEP-related installation tool updates during phase one; that may still require a bit of clarification in the PEP's language.

Carl and I are happy with this PEP version now and hope you all are as well. Donald is already working on improving the analysis tool so we hopefully have some updated numbers soon.
cheers,
Holger

PEP: XXX
Title: Transitioning to release-file hosting on PyPI
Version: $Revision$
Last-Modified: $Date$
Author: Holger Krekel hol...@merlinux.eu, Carl Meyer c...@oddbird.net
Discussions-To: catalog-sig@python.org
Status: Draft (PRE-submit V4)
Type: Process
Content-Type: text/x-rst
Created: 10-Mar-2013
Post-History:

Abstract
========

This PEP proposes a backward-compatible two-phase transition process to speed up, simplify and robustify installing from the pypi.python.org (PyPI) package index. To ease the transition and minimize client-side friction, **no changes to distutils or existing installation tools are required in order to benefit from the first transition phase, which will result in faster, more reliable installs for most existing packages**.

The first transition phase implements an easy and explicit means for a package maintainer to control which release file links are served to present-day installation tools. The first phase also includes the implementation of analysis tools for present-day packages, to support communication with package maintainers and the automated setting of default modes for controlling release file links. The first phase also will make new projects on PYPI use a default to only serve links to release files which were uploaded to PYPI.

The second transition phase concerns end-user installation tools, which shall default to only install release files that are hosted on PyPI and tell the user if external release files exist, offering a choice to automatically use those external files.

Rationale
=========

.. _history:

History and motivations for external hosting
--------------------------------------------

When PyPI went online, it offered release registration but had no facility to host release files itself. When hosting was added, no automated downloading tool existed yet. When Philip Eby implemented automated downloading (through setuptools), he made the choice to allow people to use download hosts of their choice.
The finding of externally-hosted packages was implemented as follows:

#. The PyPI ``simple/`` index for a package contains all links found by scraping them from that package's long_description metadata for any release. Links in the Download-URL and Home-page metadata fields are given ``rel=download`` and ``rel=homepage`` attributes, respectively.

#. Any of these links whose target is a file whose name appears to be in the form of an installable source or binary distribution, with name in the form packagename-version.ARCHIVEEXT, is considered a potential installation candidate by installation tools.

#. Similarly, any links suffixed with an #egg=packagename-version fragment are considered an installation candidate.

#. Additionally, the ``rel=homepage`` and ``rel=download`` links are crawled by installation tools and, if HTML, are themselves scraped for release-file links in the above formats.

Today, most packages released on PyPI host their release files on PyPI, but a small percentage (XXX need updated data) rely on external hosting.

There are many reasons [2]_ why people have chosen external hosting. To cite just a few:

- release processes and scripts have been developed already and upload to external sites

-
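
The candidate rules in steps 2 and 3 of the draft's History section can be illustrated with a minimal Python sketch. This is not the code used by setuptools or any other installer; the function name `is_install_candidate`, the `ARCHIVE_EXTS` tuple, and the requirement that a version start with a digit are assumptions made for illustration, since the draft only says "ARCHIVEEXT".

```python
import re
from urllib.parse import urlparse

# Assumed set of archive extensions; the draft only names "ARCHIVEEXT".
ARCHIVE_EXTS = (".tar.gz", ".tar.bz2", ".tgz", ".zip", ".egg")


def is_install_candidate(url, project):
    """Return True if `url` looks like a release file for `project`."""
    parsed = urlparse(url)

    # Step 3: an #egg=packagename-version fragment marks a candidate.
    if parsed.fragment.startswith("egg=" + project + "-"):
        return True

    # Step 2: a filename of the form packagename-version.ARCHIVEEXT.
    filename = parsed.path.rsplit("/", 1)[-1]
    for ext in ARCHIVE_EXTS:
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            # Assumption: a version starts with a digit.
            pattern = re.escape(project) + r"-\d[^/]*"
            return re.fullmatch(pattern, stem) is not None
    return False
```

Under these assumptions a link to `foo-1.0.tar.gz` is accepted as a candidate for project `foo`, while a link to `foobar-1.0.zip` is not. Step 4 (crawling the homepage and download pages) would apply the same check to every link found on those pages, which is the part the PEP wants to stop doing by default.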
Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
Do we even need the internal/external rel info? I was planning to just use the URL hostname. i.e., are there any use cases for designating an externally-hosted file internal, or an internally-hosted file external? If not, it seems the rel= is redundant.

It's also more work to implement, vs. just defaulting --allow-hosts to be the --index-url host; a strategy ISTM pip could also use, since it has the same two options available.

Also, if we're not doing homepage/download crawling any more, I was hoping we could just drop the code that 'parses' rel= links in the first place, as it's an awkward ugly hack. ;-)
Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
On Mar 15, 2013, at 11:15 AM, PJ Eby p...@telecommunity.com wrote:

> Do we even need the internal/external rel info? I was planning to just use the URL hostname. i.e., are there any use cases for designating an externally-hosted file internal, or an internally-hosted file external? If not, it seems the rel= is redundant.
>
> It's also more work to implement, vs. just defaulting --allow-hosts to be the --index-url host; a strategy ISTM pip could also use, since it has the same two options available.
>
> Also, if we're not doing homepage/download crawling any more, I was hoping we could just drop the code that 'parses' rel= links in the first place, as it's an awkward ugly hack. ;-)

It makes things uglier for end users if you have packages and the simple index hosted on several sites. It also just adds extra information so if setuptools/easy_install wants to just use the host case that wouldn't be bad.

It's actually more defensible to keep the service (ala PyPI/simple index) and the user uploaded content (ala distribution files) hosted on separate domains as it makes things like gifar style attacks harder to execute. Making a move like that would break mirroring ATM on PyPI but it's good information to include on the simple index to make it simpler for tools to determine what links are internal and what are external.

FWIW Crate has the uploaded files on an external domain for just this reason. (Also for CDN reasons but that's because a SSL CDN is ).

- Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
On Fri, Mar 15, 2013 at 11:15 -0400, PJ Eby wrote:

> Do we even need the internal/external rel info? I was planning to just use the URL hostname. i.e., are there any use cases for designating an externally-hosted file internal, or an internally-hosted file external? If not, it seems the rel= is redundant.
>
> It's also more work to implement, vs. just defaulting --allow-hosts to be the --index-url host; a strategy ISTM pip could also use, since it has the same two options available.
>
> Also, if we're not doing homepage/download crawling any more, I was hoping we could just drop the code that 'parses' rel= links in the first place, as it's an awkward ugly hack. ;-)

We wanted to avoid requiring hostname-checking especially in light of parallel developments putting PYPI release files on a CDN, i.e. non pypi.python.org domains. The rel=internal communicates that this link is under control of the index server and the installer should not be worried and users need not know about allow-hosts etc. For example, Donald's https://crate.io is already operating in this manner and has its files on crate-cdn.com.

best,
holger
Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
On 03/15/2013 09:15 AM, PJ Eby wrote:

> Do we even need the internal/external rel info? I was planning to just use the URL hostname. i.e., are there any use cases for designating an externally-hosted file internal, or an internally-hosted file external? If not, it seems the rel= is redundant.

Right; Donald and Holger already gave the rationale for this: there are good reasons for an index to not have internal links actually on the exact same hostname. Even just using a different subdomain would break simple host comparison.

> It's also more work to implement, vs. just defaulting --allow-hosts to be the --index-url host; a strategy ISTM pip could also use, since it has the same two options available.

Pip actually doesn't currently have --allow-hosts, although there's no good reason for that; it ought to.

> Also, if we're not doing homepage/download crawling any more, I was hoping we could just drop the code that 'parses' rel= links in the first place, as it's an awkward ugly hack. ;-)

Well, parsing HTML links as an API is an ugly hack, but within that existing framework rel seems like the appropriate semantic attribute for this type of information, not really upping the hackiness quotient :-)

Carl
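
The two approaches debated in this exchange, comparing hostnames versus reading a rel attribute, can be contrasted in a short Python sketch. It is an illustration only, not code from any installer: `LinkCollector` and `classify` are hypothetical names, the rel values are taken from the discussion of the V4 draft in this thread, and the sample page reuses the crate.io / crate-cdn.com setup mentioned here with made-up file paths.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, rel) pairs from the <a> tags of an index page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            if attrs.get("href"):
                self.links.append((attrs["href"], attrs.get("rel")))


def classify(html, index_host):
    """Return (href, internal_by_rel, internal_by_host) for each link."""
    collector = LinkCollector()
    collector.feed(html)
    result = []
    for href, rel in collector.links:
        internal_by_rel = rel == "internal"
        # Simple host-equality check, as in "use the URL hostname".
        internal_by_host = urlparse(href).hostname == index_host
        result.append((href, internal_by_rel, internal_by_host))
    return result


# Hypothetical index page: one file on the index's CDN, one elsewhere.
SAMPLE_PAGE = (
    '<a rel="internal" href="https://crate-cdn.com/foo-1.0.tar.gz">foo-1.0</a>'
    '<a rel="external" href="http://example.com/foo-1.1.tar.gz">foo-1.1</a>'
)
```

For `classify(SAMPLE_PAGE, "crate.io")`, the CDN link counts as internal by its rel attribute but fails the host-equality check, which is the case Donald, Holger and Carl raise against pure hostname comparison.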
Re: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of release files
Hi Marcus,

On 03/15/2013 01:32 AM, Marcus Smith wrote:

>> In addition, maintainers of installation tools are asked to release two updates. The first one shall provide clear warnings [...] The second update for installation tools should change the default mode to allow only installation of package files hosted at the index domain,
>
> sounds good to me.

Excellent, having the installer-tool maintainers on-board is obviously important here :-)

>> It is expected that tools in this release may choose to change the default index url to ``https://pypi.python.org/simple/-with-ext`` in
>
> so, *eventually*, the /simple interface (that has been transitioned to only serve pypi links) could be deprecated? (because new tools would be smart enough to responsibly navigate /simple/-with-ext)
>
> but slightly ironic that we'd be left with an interface called simple/-with-ext, given the goal of all this, but it makes sense.

Right, it was precisely this awkwardness (the likelihood that tools would want to default to -with-ext and use host-comparison to distinguish internal/external, so as to provide info about external links with a single request-response) that led us to eliminate the separate indexes in our latest V4 draft and use rel attributes to distinguish link types.

Carl
Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
On Fri, Mar 15, 2013 at 12:07 PM, Carl Meyer c...@oddbird.net wrote:

> On 03/15/2013 09:15 AM, PJ Eby wrote:
>> Do we even need the internal/external rel info? I was planning to just use the URL hostname. i.e., are there any use cases for designating an externally-hosted file internal, or an internally-hosted file external? If not, it seems the rel= is redundant.
>
> Right; Donald and Holger already gave the rationale for this: there are good reasons for an index to not have internal links actually on the exact same hostname. Even just using a different subdomain would break simple host comparison.
>
>> It's also more work to implement, vs. just defaulting --allow-hosts to be the --index-url host; a strategy ISTM pip could also use, since it has the same two options available.
>
> Pip actually doesn't currently have --allow-hosts, although there's no good reason for that; it ought to.
>
>> Also, if we're not doing homepage/download crawling any more, I was hoping we could just drop the code that 'parses' rel= links in the first place, as it's an awkward ugly hack. ;-)
>
> Well, parsing HTML links as an API is an ugly hack, but within that existing framework rel seems like the appropriate semantic attribute for this type of information, not really upping the hackiness quotient :-)

Well, to be clear, I liked previous versions of the proposal better than this one. But while I *really* don't want to do any new rel parsing, that's not the only or even the most important reason.

The main reason is that I think internal vs. external is a bogus distinction: what's important (IMO) is what hosts you do and don't trust. Giving a blanket pass to all external links doesn't seem like such a good idea to me, nor does allowing the index to define what hosts the client should trust.

As for the internal ones, I'm not sure why we can't at least make a subdomain requirement, or have users explicitly add a PyPI CDN to their configured --allow-hosts.

To try to put it another way: there should be one, and preferably only one, obvious way to specify where you get downloads from. That way in easy_install is currently --allow-hosts. Adding new options that interact and overlap with that looks like bad UI design to me, increasing the possibility of user confusion.
Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
On Mar 15, 2013, at 12:51 PM, PJ Eby p...@telecommunity.com wrote:

> On Fri, Mar 15, 2013 at 12:07 PM, Carl Meyer c...@oddbird.net wrote:
>> On 03/15/2013 09:15 AM, PJ Eby wrote:
>>> Do we even need the internal/external rel info? I was planning to just use the URL hostname. i.e., are there any use cases for designating an externally-hosted file internal, or an internally-hosted file external? If not, it seems the rel= is redundant.
>>
>> Right; Donald and Holger already gave the rationale for this: there are good reasons for an index to not have internal links actually on the exact same hostname. Even just using a different subdomain would break simple host comparison.
>>
>>> It's also more work to implement, vs. just defaulting --allow-hosts to be the --index-url host; a strategy ISTM pip could also use, since it has the same two options available.
>>
>> Pip actually doesn't currently have --allow-hosts, although there's no good reason for that; it ought to.
>>
>>> Also, if we're not doing homepage/download crawling any more, I was hoping we could just drop the code that 'parses' rel= links in the first place, as it's an awkward ugly hack. ;-)
>>
>> Well, parsing HTML links as an API is an ugly hack, but within that existing framework rel seems like the appropriate semantic attribute for this type of information, not really upping the hackiness quotient :-)
>
> Well, to be clear, I liked previous versions of the proposal better than this one. But while I *really* don't want to do any new rel parsing, that's not the only or even the most important reason.
>
> The main reason is that I think internal vs. external is a bogus distinction: what's important (IMO) is what hosts you do and don't trust. Giving a blanket pass to all external links doesn't seem like such a good idea to me, nor does allowing the index to define what hosts the client should trust.
>
> As for the internal ones, I'm not sure why we can't at least make a subdomain requirement, or have users explicitly add a PyPI CDN to their configured --allow-hosts.
>
> To try to put it another way: there should be one, and preferably only one, obvious way to specify where you get downloads from. That way in easy_install is currently --allow-hosts. Adding new options that interact and overlap with that looks like bad UI design to me, increasing the possibility of user confusion.

You can do that fwiw. That's fine. You can optionally just use the internal links as an indicator of which hosts should automatically be added to --allow-hosts for a particular index.

- Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
On 03/15/2013 10:51 AM, PJ Eby wrote:

> Giving a blanket pass to all external links doesn't seem like such a good idea to me,

This is a very good point, and it should be made clearer in the PEP that we don't recommend a single blanket option to allow all external links, but an option (like allow-hosts) that lets you specify with more granularity which external links to use. I think perhaps rel=external confuses this point; the real purpose of the rel tags is just so that rel=internal can be considered part of the index.

FWIW I think it would be just as reasonable UI for a hypothetical tool to let you say "I want to trust external links for the Foo project" rather than "I want to trust external links to djangoproject.com" and avoid host-comparison altogether. IOW, I don't think hostname is inherently a better or safer indicator of trust than project name; hosts can change ownership at least as easily and silently as PyPI projects! So I don't think the PEP should require all installer tools to choose trust-by-hostname (which would be implied by removing the rel tags).

> nor does allowing the index to define what hosts the client should trust.

I'm not sure about this. By using an index at all, you are trusting that index to provide whatever level of reliability/stability/security/whatever you expect from it. Allowing the index itself to specify that it keeps its files on a different host in a way that is transparent to the user seems like a natural extension of this trust that doesn't harm anything and aids usability greatly. (Cases where the index is lying to you definitely fall outside the scope of what this PEP is aiming to help with.)

> As for the internal ones, I'm not sure why we can't at least make a subdomain requirement, or have users explicitly add a PyPI CDN to their configured --allow-hosts.

Even a subdomain requirement can make a CDN more difficult/expensive to implement. And once you go beyond simple host-equality comparisons and into subdomain-equivalence I'm wary of the added implementation complexity we're asking of every installer tool, and the potential for subtle differences in implementation. This seems to me like a worse can of worms than rel-parsing.

> To try to put it another way: there should be one, and preferably only one, obvious way to specify where you get downloads from. That way in easy_install is currently --allow-hosts. Adding new options that interact and overlap with that looks like bad UI design to me, increasing the possibility of user confusion.

Like Donald says, I don't see any problem with you choosing to keep allow-hosts as the only user-facing option for easy_install. It would be up to you whether you also want to use rel=internal as a hint for implicitly (perhaps with warning) adding to --allow-hosts, to allow better compatibility with indexes that use a different host for file-hosting (it's possible that even PyPI itself may move into this category, I haven't been following the CDN discussions carefully).

PyPI wouldn't be enforcing a UI on you here, just providing metadata that you can use as you wish. I do think the internal/external distinction is meaningful and unambiguous metadata that the index is able to provide, and there's no reason for the index to withhold it. (That distinction is not new in this version of the PEP, either, it's just made via rel tags now instead of via a separate index.)

Carl
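
The option Carl describes, treating rel=internal as a hint for implicitly (perhaps with a warning) widening the allowed hosts, could look roughly like the following Python sketch. Everything here is hypothetical: `extend_allowed_hosts`, its argument shapes, and the warning text are invented for illustration and do not correspond to any existing easy_install or pip behaviour.

```python
from urllib.parse import urlparse


def extend_allowed_hosts(allowed, links):
    """Add hosts of rel=internal links to a copy of `allowed`.

    `allowed` is a list of hostnames the user already trusts.
    `links` is an iterable of (href, rel) pairs from an index page.
    Returns (new_allowed, warnings) so the caller can decide how
    loudly to report each implicit addition.
    """
    new_allowed = list(allowed)
    warnings = []
    for href, rel in links:
        host = urlparse(href).hostname
        if rel == "internal" and host and host not in new_allowed:
            new_allowed.append(host)
            warnings.append("implicitly allowing index-declared host: " + host)
    return new_allowed, warnings
```

Returning the warnings rather than printing them keeps the policy decision with the tool: an installer that wants --allow-hosts to stay the single source of truth can ignore the result entirely, while one that accepts the index's word can apply it.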
Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
Thanks, Holger. This version looks a lot better :-)

There are still some minor quirks which would need to be addressed more explicitly, but overall, this proposal provides a good way forward.

Perhaps it would also be possible to add the secured download links and the caching/proxying ideas to the PEP at some point, or we turn those into a new PEP.

I can't follow up in detail today, but will have a closer look next week.

On 15.03.2013 10:29, holger krekel wrote:
> Hi all, in particular Philip, Marc-Andre, Donald,
> [...]
Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
On Fri, Mar 15, 2013 at 1:39 PM, Carl Meyer c...@oddbird.net wrote:

> up to you whether you also want to use rel=internal as a hint for implicitly (perhaps with warning) adding to --allow-hosts,

That's the bit I don't like. The security model is that if it's not allowed by allowed-hosts, it's *not allowed*. Introducing a way to sneak something past allow-hosts is a bad idea, because it means people either have to explicitly widen their allow-hosts to arbitrary hosts, or else that you can't actually enforce an allowed-hosts policy, or that you need to learn a whole bunch of options to implement it.

ISTM that this is a bad design choice for users, and I'm not comfortable with this without some way to define the allowed internal hosts based in some way on the base index URL. Not just for ease of automated translation, but so that *users* can know who they're dealing with, and easily predict the effects of their chosen options.

A frequent refrain has been, "users don't know they're downloading stuff from places other than PyPI", so if this new approach allows downloads from somewhere other than *.pypi.python.org when you've chosen pypi.python.org as your index, ISTM the proposal is failing to meet its original goals. As the PEP is written, PyPI could change out to a different CDN each week or use different ones for different files, and users would be back in the position of not being sure where stuff is coming from.

I'm fine with extending the default host matching to indexhost,*.indexhost if we want to leave more of an option for PyPI and other indexes to use a CDN. But I'm not sure how much point to it there is, since a /simple index is static, and small in size compared to the downloads, so you might as well host a copy of the /simple index alongside the downloads, and make the index pypicdn.com/simple or whatever in the first place. (In other words, not a lot of benefit to splitting a static index from its associated files, so why support it?)

> PyPI wouldn't be enforcing a UI on you here, just providing metadata that you can use as you wish.

That's not what the PEP says. It does in fact *mandate* the use of the rel attributes. So if somebody adds an external link that actually points back to PyPI, technically I'm not supposed to use it unless it's been explicitly authorized. ;-)

I'd really prefer to see explicit language that says the rel information is advisory only and that installers aren't required to parse it, let alone use it. At the moment, the PEP is a substantial departure from the version I agreed with.

(If there were to be any meaningful distinction in the links themselves, I would think it'd more be whether, e.g. hash information is available for the download. That's a potentially relevant distinction right now, in that PyPI automatically provides #md5 info. Even so, I'm not sure that's enough of a distinction for anyone to care about.)
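
The default PJ Eby proposes, matching downloads against "indexhost,*.indexhost", can be sketched in Python with glob-style patterns from the standard `fnmatch` module. This is an assumption-laden illustration, not easy_install's actual implementation: the helper names `default_allowed_hosts` and `host_allowed` are hypothetical.

```python
from fnmatch import fnmatchcase


def default_allowed_hosts(index_host):
    """Patterns for the index host itself and any of its subdomains."""
    return [index_host, "*." + index_host]


def host_allowed(host, patterns):
    """Return True if `host` matches any glob pattern, ignoring case."""
    host = host.lower()
    return any(fnmatchcase(host, pattern.lower()) for pattern in patterns)
```

With `default_allowed_hosts("pypi.python.org")`, the index host and a subdomain such as `a.pypi.python.org` pass, while an unrelated CDN domain such as `crate-cdn.com` does not. That is the property being argued for here: the effective download hosts stay predictable from the index URL alone.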
Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
A little off-topic, but I thought you might enjoy this in the context of all the crypto, hash and signing debate:

http://xkcd.com/1181/

Cheers,
--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Mar 15 2013)
Python Projects, Consulting and Support ... http://www.egenix.com/
mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/

: Try our mxODBC.Connect Python Database Interface for free ! ::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/
Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
tl;dr: I see your points, we'll change the PEP to allow clients to use hostnames instead of the rel attributes if they prefer. More comments below:

On 03/15/2013 12:59 PM, PJ Eby wrote:
> That's the bit I don't like. The security model is that if it's not allowed by allowed-hosts, it's *not allowed*. Introducing a way to sneak something past allow-hosts is a bad idea, because it means people either have to explicitly widen their allow-hosts to arbitrary hosts, or else that you can't actually enforce an allowed-hosts policy, or that you need to learn a whole bunch of options to implement it.
>
> ISTM that this is a bad design choice for users, and I'm not comfortable with this without some way to define the allowed internal hosts based in some way on the base index URL. Not just for ease of automated translation, but so that *users* can know who they're dealing with, and easily predict the effects of their chosen options.
>
> A frequent refrain has been, "users don't know they're downloading stuff from places other than PyPI", so if this new approach allows downloads from somewhere other than *.pypi.python.org when you've chosen pypi.python.org as your index, ISTM the proposal is failing to meet its original goals.
>
> As the PEP is written, PyPI could change out to a different CDN each week or use different ones for different files, and users would be back in the position of not being sure where stuff is coming from.

I guess the key question is the definition of "places other than PyPI". I think a CDN that is part of the index's architecture is just as much part of PyPI whether it's on the same domain or not. But I understand the difficulty integrating this with the --allow-hosts option in a way that maintains a clear and simple UI.

> I'm fine with extending the default host matching to indexhost,*.indexhost if we want to leave more of an option for PyPI and other indexes to use a CDN. But I'm not sure how much point to it there is, since a /simple index is static, and small in size compared to the downloads, so you might as well host a copy of the /simple index alongside the downloads, and make the index pypicdn.com/simple or whatever in the first place. (In other words, not a lot of benefit to splitting a static index from its associated files, so why support it?)

Putting the /simple/ API on a CDN isn't quite that easy because it currently involves some server-side redirects to effectively make project names case-insensitive.

I think in a hypothetical re-architecture of PyPI there may be good security reasons to put user-uploaded files on a different domain from dynamic portions of the API (Donald alluded to this, more discussion at http://security.stackexchange.com/questions/11756/is-it-safe-to-serve-any-user-uploaded-file-under-only-white-listed-mime-content). So I think this issue may come up again in the future. But I'm fine with deferring it in this PEP for now...

>> PyPI wouldn't be enforcing a UI on you here, just providing metadata that you can use as you wish.
>
> That's not what the PEP says. It does in fact *mandate* the use of the rel attributes. So if somebody adds an external link that actually points back to PyPI, technically I'm not supposed to use it unless it's been explicitly authorized. ;-)
>
> I'd really prefer to see explicit language that says the rel information is advisory only and that installers aren't required to parse it, let alone use it. At the moment, the PEP is a substantial departure from the version I agreed with.

Ok, pending agreement from Holger I'll make a change in the PEP to explicitly allow clients to make decisions based on either the rel attributes or based on hostnames. Would that be sufficient to address your concerns?

Carl
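As a rough illustration of the host matching discussed in this message, the sketch below derives the two patterns indexhost and *.indexhost from an index URL and checks a link's hostname against them. The function names default_allowed_hosts and is_allowed are hypothetical, and this is not how easy_install or any other installer implements --allow-hosts; it only assumes shell-style wildcard matching on the hostname.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def default_allowed_hosts(index_url):
    """Return the patterns "indexhost" and "*.indexhost" for an index URL."""
    host = urlparse(index_url).hostname
    return [host, "*." + host]


def is_allowed(link_url, patterns):
    """True if the link's hostname matches any allowed-host pattern."""
    # urlparse lowercases the hostname, so a case-sensitive match is enough.
    host = urlparse(link_url).hostname or ""
    return any(fnmatchcase(host, pattern) for pattern in patterns)


# Hypothetical usage with pypi.python.org as the chosen index.
patterns = default_allowed_hosts("https://pypi.python.org/simple/")
same_host = is_allowed("https://pypi.python.org/packages/foo-1.0.tar.gz", patterns)
subdomain = is_allowed("https://cdn.pypi.python.org/foo-1.0.tar.gz", patterns)
other_domain = is_allowed("https://pypicdn.com/foo-1.0.tar.gz", patterns)
```

Under these assumptions a CDN on a subdomain of the index host passes the default filter, while a CDN on a separate domain such as pypicdn.com would need an explicit entry in the user's allowed hosts.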
Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
On Fri, Mar 15, 2013 at 7:16 PM, Carl Meyer c...@oddbird.net wrote:
> Ok, pending agreement from Holger I'll make a change in the PEP to explicitly allow clients to make decisions based on either the rel attributes or based on hostnames. Would that be sufficient to address your concerns?

Yes. I just don't want to be in a situation down the road where there's another argument about this on Catalog-SIG when PyPI starts using a CDN: "but it says this in the rel and you're supposed to use that", and I say, "but Carl and Holger said...", and they go, "doesn't matter, PEP says" ;-) This way, the PEP will be clear that supporting a split of PyPI's hostnames isn't in current scope.

I am also okay with the PEP allowing *.indexhost instead of just indexhost as the filtering mechanism, as long as it specifies one *now*. (Again, so this doesn't have to be revisited later.) If somebody who knows something about CDNs, TUF, etc., needs to weigh in on it first, that's fine. I just want to know where things stand.

> Putting the /simple/ API on a CDN isn't quite that easy because it currently involves some server-side redirects to effectively make project names case-insensitive.

FWIW, easy_install works fine without this. If a matching index page isn't found, it checks the full package list. PyPI's redirection just reduces bandwidth usage and request overhead in the case where the case of the user's request doesn't match the actual package listing. But it could be completely static without affecting easy_install and tools that use its package-finding code.
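The fallback described above (try the exact index page first, then consult the full package list) can be sketched roughly as follows. This is not easy_install's actual code: find_project_page is a hypothetical helper, and the dictionary stands in for a static index's full package listing.

```python
def find_project_page(requested, index_pages):
    """Resolve a project name against a static index without redirects.

    index_pages maps the exact project names listed by the index to their
    page URLs (a stand-in for the index's full package list).
    """
    # Fast path: the request matches the listing exactly.
    if requested in index_pages:
        return index_pages[requested]
    # Fallback: scan the full list, comparing names case-insensitively.
    wanted = requested.lower()
    for name, url in index_pages.items():
        if name.lower() == wanted:
            return url
    return None


# Hypothetical listing for illustration only.
pages = {"Django": "/simple/Django/", "mxODBC": "/simple/mxODBC/"}
exact = find_project_page("Django", pages)
fallback = find_project_page("django", pages)
missing = find_project_page("no-such-project", pages)
```

The extra scan only happens when the exact lookup misses, which is the case the server-side redirect currently saves a client from handling.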
Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI
On Fri, Mar 15, 2013 at 22:01 -0400, PJ Eby wrote:
> On Fri, Mar 15, 2013 at 7:16 PM, Carl Meyer c...@oddbird.net wrote:
>> Ok, pending agreement from Holger I'll make a change in the PEP to explicitly allow clients to make decisions based on either the rel attributes or based on hostnames. Would that be sufficient to address your concerns?
>
> Yes. I just don't want to be in a situation down the road where there's another argument about this on Catalog-SIG when PyPI starts using a CDN: "but it says this in the rel and you're supposed to use that", and I say, "but Carl and Holger said...", and they go, "doesn't matter, PEP says" ;-) This way, the PEP will be clear that supporting a split of PyPI's hostnames isn't in current scope.
>
> I am also okay with the PEP allowing *.indexhost instead of just indexhost as the filtering mechanism, as long as it specifies one *now*. (Again, so this doesn't have to be revisited later.) If somebody who knows something about CDNs, TUF, etc., needs to weigh in on it first, that's fine. I just want to know where things stand.

One related question. The rel=internal links will contain a hash (currently md5), so if the referenced resource resolves to a file matching that hash, we can be sure about its integrity. What kind of security does host-checking add on top?

holger

>> Putting the /simple/ API on a CDN isn't quite that easy because it currently involves some server-side redirects to effectively make project names case-insensitive.
>
> FWIW, easy_install works fine without this. If a matching index page isn't found, it checks the full package list. PyPI's redirection just reduces bandwidth usage and request overhead in the case where the case of the user's request doesn't match the actual package listing. But it could be completely static without affecting easy_install and tools that use its package-finding code.
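For reference, the hash check mentioned in this message might look roughly like the sketch below. It assumes the link carries its digest in a URL fragment of the form #md5=<hexdigest>; the helper names expected_hash and matches_hash and the example.invalid URL are made up for illustration. The check only confirms that the downloaded bytes match what the index page announced, and says nothing about which host served them.

```python
import hashlib
from urllib.parse import urlparse


def expected_hash(link_url):
    """Split a fragment like "md5=<hexdigest>" into (algorithm, digest)."""
    fragment = urlparse(link_url).fragment
    algorithm, sep, digest = fragment.partition("=")
    if not sep:
        # No "name=value" fragment, so the link announces no hash.
        return None
    return algorithm, digest.lower()


def matches_hash(data, link_url):
    """True if the bytes hash to the digest named in the link fragment."""
    expected = expected_hash(link_url)
    if expected is None:
        return False
    algorithm, digest = expected
    return hashlib.new(algorithm, data).hexdigest() == digest


# Hypothetical release file and link, built so the digest is known to match.
payload = b"example release file"
link = "https://example.invalid/foo-1.0.tar.gz#md5=" + hashlib.md5(payload).hexdigest()
intact = matches_hash(payload, link)
tampered = matches_hash(payload + b"!", link)
```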