Re: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of release files

2013-03-15 Thread Marcus Smith
In addition, maintainers of installation tools are asked to release
 two updates.  The first one shall provide clear warnings [...]
 The second update for installation tools should change the default
 mode to allow only installation of package files hosted at the index
 domain,


sounds good to me.


It is expected that tools in this release may choose to change the
 default index url to ``https://pypi.python.org/simple/-with-ext`` in [...]


so, *eventually*, the /simple interface (that has been transitioned to only
serve pypi links) could be deprecated?
(because new tools would be smart enough to responsibly navigate
 /simple/-with-ext)

but slightly ironic that we'd be left with an interface called
simple/-with-ext, given the goal of all this, but it makes sense.

Marcus
___
Catalog-SIG mailing list
Catalog-SIG@python.org
http://mail.python.org/mailman/listinfo/catalog-sig


[Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI

2013-03-15 Thread holger krekel
Hi all, in particular Philip, Marc-Andre, Donald,

Carl and I decided to simplify the PEP and avoid the somewhat
awkward ``simple/-with-externals`` index for various reasons, among them
Marc-Andre's criticisms.  This also means present-day installation tools
(shipped with Red Hat/Debian/etc.) will continue to work as today for
those packages which remain in a hosting-mode that requires crawling and
scraping.  They will still benefit from the fact that most packages will
soon have a hosting-mode that avoids it.  Future releases of installation
tools will default to neither crawling nor using (scraped) external
links, and new PyPI projects will default to serving only uploaded files.

The V4 pre-PEP also renames the three PyPI hosting modes to be more
descriptive. Since all three modes allow external links, the old names
pypi-ext and pypi-only were misleading. The new naming distinguishes the mode that both
scrapes links from metadata and crawls external pages for more links
(pypi-scrape-crawl) from the mode that only scrapes links from metadata
(pypi-scrape) from the mode where all links are explicit (pypi-explicit).
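
As a rough illustration of how the three renamed modes differ, here is a minimal Python sketch. The ``HostingMode`` enum and ``link_sources`` helper are hypothetical names, not part of PyPI or any installer; they only encode the two behaviours the paragraph above distinguishes (scraping links from metadata, and crawling external pages).

```python
from enum import Enum


class HostingMode(Enum):
    # Hypothetical enum; the string values are the V4 mode names.
    PYPI_SCRAPE_CRAWL = "pypi-scrape-crawl"
    PYPI_SCRAPE = "pypi-scrape"
    PYPI_EXPLICIT = "pypi-explicit"


def link_sources(mode):
    """Return (scrapes_metadata_links, crawls_external_pages) for a mode."""
    if mode is HostingMode.PYPI_SCRAPE_CRAWL:
        return True, True    # scrape metadata links and crawl the pages they point to
    if mode is HostingMode.PYPI_SCRAPE:
        return True, False   # scrape metadata links, but never crawl
    return False, False      # pypi-explicit: only explicitly provided links
```

A mode name stored as a string can be looked up by value, e.g. ``HostingMode("pypi-scrape")``.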

Without the separate external index, the two transition phases also
separate cleanly into PyPI changes (phase one) and installer-tool
updates (phase two); no PyPI changes are necessary in phase two.
As stated in a new open question, it should be possible to make
PEP-related installation tool updates during phase one, which may
still require a bit of clarification in the PEP's language.

Carl and I are happy with this PEP version now and hope you all are as
well.  Donald is already working on improving the analysis tool, so
we will hopefully have some updated numbers soon.

cheers,

Holger


PEP: XXX
Title: Transitioning to release-file hosting on PyPI
Version: $Revision$
Last-Modified: $Date$
Author: Holger Krekel <hol...@merlinux.eu>, Carl Meyer <c...@oddbird.net>
Discussions-To: catalog-sig@python.org
Status: Draft (PRE-submit V4)
Type: Process
Content-Type: text/x-rst
Created: 10-Mar-2013
Post-History:


Abstract
========

This PEP proposes a backward-compatible two-phase transition process
to speed up, simplify and robustify installing from the
pypi.python.org (PyPI) package index.  To ease the transition and
minimize client-side friction, **no changes to distutils or existing
installation tools are required in order to benefit from the first
transition phase, which will result in faster, more reliable installs
for most existing packages**.

The first transition phase implements an easy and explicit means for a
package maintainer to control which release file links are served to
present-day installation tools.  The first phase also includes the
implementation of analysis tools for present-day packages, to support
communication with package maintainers and the automated setting of
default modes for controlling release file links.  The first phase
will also make new projects on PyPI default to serving only links to
release files that were uploaded to PyPI.

The second transition phase concerns end-user installation tools,
which shall default to only install release files that are hosted on
PyPI and tell the user if external release files exist, offering
a choice to automatically use those external files.
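
A minimal sketch of the phase-two default, assuming an installer already has the candidate URLs for a project and some predicate that says whether a URL is hosted on PyPI (how that predicate is built is left open here). ``phase_two_select`` and ``external_notice`` are hypothetical helper names, not an existing tool's API.

```python
def phase_two_select(urls, is_internal, allow_external=False):
    """Return (usable, skipped_external) under a hypothetical phase-two default.

    is_internal: caller-supplied predicate deciding whether a URL is hosted on the index.
    """
    internal = [u for u in urls if is_internal(u)]
    external = [u for u in urls if not is_internal(u)]
    if allow_external:
        # The user explicitly opted in: keep external files as candidates too.
        return internal + external, []
    # Default: install only index-hosted files, but remember what was skipped.
    return internal, external


def external_notice(project, skipped):
    """Build the message telling the user that external files exist."""
    if not skipped:
        return ""
    return ("%s: skipped %d externally hosted file(s); "
            "allow external links to use them." % (project, len(skipped)))
```

Keeping the skipped list, rather than silently dropping it, is what lets the tool "tell the user if external release files exist".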


Rationale
=========

.. _history:

History and motivations for external hosting
--------------------------------------------

When PyPI went online, it offered release registration but had no
facility to host release files itself.  When hosting was added, no
automated downloading tool existed yet.  When Philip Eby implemented
automated downloading (through setuptools), he made the choice to
allow people to use download hosts of their choice.  The finding of
externally-hosted packages was implemented as follows:

#. The PyPI ``simple/`` index for a package contains all links found
   by scraping them from that package's long_description metadata for 
   any release. Links in the Download-URL and Home-page metadata
   fields are given ``rel=download`` and ``rel=homepage`` attributes,
   respectively.

#. Any of these links whose target is a file whose name appears to be
   in the form of an installable source or binary distribution, with
   name in the form packagename-version.ARCHIVEEXT, is considered a
   potential installation candidate by installation tools.

#. Similarly, any links suffixed with an #egg=packagename-version
   fragment are considered an installation candidate.

#. Additionally, the ``rel=homepage`` and ``rel=download`` links are
   crawled by installation tools and, if HTML, are themselves scraped
   for release-file links in the above formats.
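
The four steps above can be sketched with the standard library's HTML parser. This is a minimal illustration under stated assumptions, not setuptools' actual code: the filename regex is a rough stand-in for "packagename-version.ARCHIVEEXT", and ``SimpleIndexLinks`` is a hypothetical name.

```python
import re
from html.parser import HTMLParser

# Rough approximation of "packagename-version.ARCHIVEEXT"; real installers
# recognise more archive types and parse names/versions more carefully.
DIST_FILENAME = re.compile(
    r"^[A-Za-z0-9_.-]+?-\d[^/]*?\.(?:tar\.gz|tar\.bz2|tgz|zip|egg|exe)$"
)


class SimpleIndexLinks(HTMLParser):
    """Sort the <a> links of a simple/ page the way steps 1-4 describe."""

    def __init__(self):
        super().__init__()
        self.candidates = []  # installable-looking links (steps 2 and 3)
        self.to_crawl = []    # rel=homepage / rel=download links (step 4)

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        url, _, fragment = href.partition("#")
        filename = url.rstrip("/").rsplit("/", 1)[-1]
        # Step 2: archive-like filename; step 3: #egg= fragment.
        if DIST_FILENAME.match(filename) or fragment.startswith("egg="):
            self.candidates.append(href)
        # Step 4: these pages would be fetched and, if HTML, scraped again.
        rels = (attrs.get("rel") or "").split()
        if "homepage" in rels or "download" in rels:
            self.to_crawl.append(href)
```

Feeding a page with ``parser.feed(html)`` fills both lists; the crawl step (fetching and re-scraping ``to_crawl``) is what the hosting modes let maintainers switch off.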

Today, most packages released on PyPI host their release files on
PyPI, but a small percentage (XXX need updated data) rely on external
hosting.

There are many reasons [2]_ why people have chosen external
hosting. To cite just a few:

- release processes and scripts have been developed already and upload
  to external sites

- 

Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI

2013-03-15 Thread PJ Eby
Do we even need the internal/external rel info?  I was planning to
just use the URL hostname.

i.e., are there any use cases for designating an externally-hosted
file internal, or an internally-hosted file external?  If not, it
seems the rel= is redundant.

It's also more work to implement, vs. just defaulting --allow-hosts to
be the --index-url host; a strategy ISTM pip could also use, since it
has the same two options available.
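
A sketch of the hostname-based default described here, assuming the index URL's host is the only trusted download host. The helper names are hypothetical; this is not setuptools' or pip's code.

```python
from urllib.parse import urlparse


def default_allowed_hosts(index_url):
    """Hypothetical default: trust only the host that serves the index."""
    return {urlparse(index_url).hostname}


def is_allowed(link_url, allowed_hosts):
    """Plain host-equality check against the allowed set."""
    return urlparse(link_url).hostname in allowed_hosts
```

Note that plain equality rejects any file served from a different host, including a CDN or even a sibling subdomain, which is the objection raised in the replies.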

Also, if we're not doing homepage/download crawling any more, I was
hoping we could just drop the code that 'parses' rel= links in the
first place, as it's an awkward ugly hack.  ;-)


Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI

2013-03-15 Thread Donald Stufft
On Mar 15, 2013, at 11:15 AM, PJ Eby p...@telecommunity.com wrote:

 Do we even need the internal/external rel info?  I was planning to
 just use the URL hostname.
 
 i.e., are there any use cases for designating an externally-hosted
 file internal, or an internally-hosted file external?  If not, it
 seems the rel= is redundant.
 
 It's also more work to implement, vs. just defaulting --allow-hosts to
 be the --index-url host; a strategy ISTM pip could also use, since it
 has the same two options available.
 
 Also, if we're not doing homepage/download crawling any more, I was
 hoping we could just drop the code that 'parses' rel= links in the
 first place, as it's an awkward ugly hack.  ;-)

It makes things uglier for end users if you have packages and the simple index 
hosted on several sites. It also just adds extra information, so if 
setuptools/easy_install wants to just use hostname comparison, that wouldn't be bad.

It's actually more defensible to keep the service (ala PyPI/simple index) and 
the user uploaded content (ala distribution files) hosted on separate domains 
as it makes things like GIFAR-style attacks harder to execute. Making a move 
like that would currently break mirroring on PyPI, but it's good information to 
include on the simple index so tools can easily determine which 
links are internal and which are external.

FWIW Crate has the uploaded files on an external domain for just this reason. 
(Also for CDN reasons but that's because a SSL CDN is ).


-
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA





Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI

2013-03-15 Thread holger krekel
On Fri, Mar 15, 2013 at 11:15 -0400, PJ Eby wrote:
 Do we even need the internal/external rel info?  I was planning to
 just use the URL hostname.
 
 i.e., are there any use cases for designating an externally-hosted
 file internal, or an internally-hosted file external?  If not, it
 seems the rel= is redundant.
 
 It's also more work to implement, vs. just defaulting --allow-hosts to
 be the --index-url host; a strategy ISTM pip could also use, since it
 has the same two options available.
 
 Also, if we're not doing homepage/download crawling any more, I was
 hoping we could just drop the code that 'parses' rel= links in the
 first place, as it's an awkward ugly hack.  ;-)

We wanted to avoid requiring hostname-checking, especially in light of
parallel developments putting PyPI release files on a CDN, i.e. on
non-pypi.python.org domains.  The rel=internal attribute communicates that
this link is under the control of the index server, so the installer need
not be worried and users need not know about allow-hosts etc.  For example,
Donald's https://crate.io is already operating in this manner and has
its files on crate-cdn.com.
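
To make the difference concrete, here is a hypothetical comparison of the two ways of classifying a link as "internal": comparing hostnames, or trusting a rel="internal" marker supplied by the index. The functions are illustrative only, and the URL paths in the usage example are invented.

```python
from urllib.parse import urlparse


def internal_by_hostname(link_url, index_url):
    # Host-equality test: internal only if served from the index's own host.
    return urlparse(link_url).hostname == urlparse(index_url).hostname


def internal_by_rel(rel_attr):
    # V4-draft style: rely on the index's own rel="internal" marker instead.
    # rel may hold several space-separated tokens, or be absent (None).
    return "internal" in (rel_attr or "").split()
```

With an index on ``crate.io`` and files on ``crate-cdn.com``, ``internal_by_hostname("https://crate-cdn.com/packages/foo-1.0.tar.gz", "https://crate.io/simple/")`` is false even though the index controls the file, whereas the rel check only needs the index to mark the link.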

best,
holger




Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI

2013-03-15 Thread Carl Meyer
On 03/15/2013 09:15 AM, PJ Eby wrote:
 Do we even need the internal/external rel info?  I was planning to
 just use the URL hostname.
 
 i.e., are there any use cases for designating an externally-hosted
 file internal, or an internally-hosted file external?  If not, it
 seems the rel= is redundant.

Right; Donald and Holger already gave the rationale for this: there are
good reasons for an index to not have internal links actually on the
exact same hostname. Even just using a different subdomain would break
simple host comparison.

 It's also more work to implement, vs. just defaulting --allow-hosts to
 be the --index-url host; a strategy ISTM pip could also use, since it
 has the same two options available.

Pip actually doesn't currently have --allow-hosts, although there's no
good reason for that; it ought to.

 Also, if we're not doing homepage/download crawling any more, I was
 hoping we could just drop the code that 'parses' rel= links in the
 first place, as it's an awkward ugly hack.  ;-)

Well, parsing HTML links as an API is an ugly hack, but within that
existing framework rel seems like the appropriate semantic attribute
for this type of information, not really upping the hackiness quotient :-)

Carl





Re: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of release files

2013-03-15 Thread Carl Meyer
Hi Marcus,

On 03/15/2013 01:32 AM, Marcus Smith wrote:
 
 
 In addition, maintainers of installation tools are asked to release
 two updates.  The first one shall provide clear warnings [...]
 The second update for installation tools should change the default
 mode to allow only installation of package files hosted at the index
 domain, 
 
 
 sounds good to me.

Excellent, having the installer-tool maintainers on-board is obviously
important here :-)

 It is expected that tools in this release may choose to change the
 default index url to ``https://pypi.python.org/simple/-with-ext`` in [...]
 
 
 so, *eventually*, the /simple interface (that has been transitioned to
 only serve pypi links) could be deprecated?
 (because new tools would be smart enough to responsibly navigate
  /simple/-with-ext)
 
 but slightly ironic that we'd be left with an interface called
 simple/-with-ext, given the goal of all this, but it makes sense.

Right, it was precisely this awkwardness (the likelihood that tools
would want to default to -with-ext and use host-comparison to
distinguish internal/external, so as to provide info about external
links with a single request-response) that led us to eliminate the
separate indexes in our latest V4 draft and use rel attributes to
distinguish link types.

Carl





Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI

2013-03-15 Thread PJ Eby
On Fri, Mar 15, 2013 at 12:07 PM, Carl Meyer c...@oddbird.net wrote:
 On 03/15/2013 09:15 AM, PJ Eby wrote:
 Do we even need the internal/external rel info?  I was planning to
 just use the URL hostname.

 i.e., are there any use cases for designating an externally-hosted
 file internal, or an internally-hosted file external?  If not, it
 seems the rel= is redundant.

 Right; Donald and Holger already gave the rationale for this: there are
 good reasons for an index to not have internal links actually on the
 exact same hostname. Even just using a different subdomain would break
 simple host comparison.

 It's also more work to implement, vs. just defaulting --allow-hosts to
 be the --index-url host; a strategy ISTM pip could also use, since it
 has the same two options available.

 Pip actually doesn't currently have --allow-hosts, although there's no
 good reason for that; it ought to.

 Also, if we're not doing homepage/download crawling any more, I was
 hoping we could just drop the code that 'parses' rel= links in the
 first place, as it's an awkward ugly hack.  ;-)

 Well, parsing HTML links as an API is an ugly hack, but within that
 existing framework rel seems like the appropriate semantic attribute
 for this type of information, not really upping the hackiness quotient :-)

Well, to be clear, I liked previous versions of the proposal better
than this one.  But while I *really* don't want to do any new rel
parsing, that's not the only or even the most important reason.

The main reason is that I think internal vs. external is a bogus
distinction: what's important (IMO) is what hosts you do and don't
trust.  Giving a blanket pass to all external links doesn't seem like
such a good idea to me, nor does allowing the index to define what
hosts the client should trust.   As for the internal ones, I'm not
sure why we can't at least make a subdomain requirement, or have users
explicitly add a PyPI CDN to their configured --allow-hosts.

To try to put it another way: there should be one, and preferably only
one, obvious way to specify where you get downloads from.  That way in
easy_install is currently --allow-hosts.  Adding new options that
interact and overlap with that looks like bad UI design to me,
increasing the possibility of user confusion.


Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI

2013-03-15 Thread Donald Stufft

On Mar 15, 2013, at 12:51 PM, PJ Eby p...@telecommunity.com wrote:

 On Fri, Mar 15, 2013 at 12:07 PM, Carl Meyer c...@oddbird.net wrote:
 On 03/15/2013 09:15 AM, PJ Eby wrote:
 Do we even need the internal/external rel info?  I was planning to
 just use the URL hostname.
 
 i.e., are there any use cases for designating an externally-hosted
 file internal, or an internally-hosted file external?  If not, it
 seems the rel= is redundant.
 
 Right; Donald and Holger already gave the rationale for this: there are
 good reasons for an index to not have internal links actually on the
 exact same hostname. Even just using a different subdomain would break
 simple host comparison.
 
 It's also more work to implement, vs. just defaulting --allow-hosts to
 be the --index-url host; a strategy ISTM pip could also use, since it
 has the same two options available.
 
 Pip actually doesn't currently have --allow-hosts, although there's no
 good reason for that; it ought to.
 
 Also, if we're not doing homepage/download crawling any more, I was
 hoping we could just drop the code that 'parses' rel= links in the
 first place, as it's an awkward ugly hack.  ;-)
 
 Well, parsing HTML links as an API is an ugly hack, but within that
 existing framework rel seems like the appropriate semantic attribute
 for this type of information, not really upping the hackiness quotient :-)
 
 Well, to be clear, I liked previous versions of the proposal better
 than this one.  But while I *really* don't want to do any new rel
 parsing, that's not the only or even the most important reason.
 
 The main reason is that I think internal vs. external is a bogus
 distinction: what's important (IMO) is what hosts you do and don't
 trust.  Giving a blanket pass to all external links doesn't seem like
 such a good idea to me, nor does allowing the index to define what
 hosts the client should trust.   As for the internal ones, I'm not
 sure why we can't at least make a subdomain requirement, or have users
 explicitly add a PyPI CDN to their configured --allow-hosts.
 
 To try to put it another way: there should be one, and preferably only
 one, obvious way to specify where you get downloads from.  That way in
 easy_install is currently --allow-hosts.  Adding new options that
 interact and overlap with that looks like bad UI design to me,
 increasing the possibility of user confusion.


You can do that fwiw. That's fine. You can optionally just use the internal 
links as an indicator of which hosts should automatically be added to 
--allow-hosts for a particular index.
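
One way a tool could follow this suggestion, sketched in Python under the assumption that the simple page has already been parsed into (url, rel) pairs. ``widen_allow_hosts`` is a hypothetical helper; it returns the newly added hosts separately so a tool could warn about them.

```python
from urllib.parse import urlparse


def widen_allow_hosts(user_hosts, index_links):
    """Return (effective_hosts, newly_added_hosts).

    index_links is an iterable of (url, rel) pairs taken from a simple/ page;
    rel may be None when the link has no rel attribute.
    """
    allowed = set(user_hosts)
    added = set()
    for url, rel in index_links:
        if "internal" not in (rel or "").split():
            continue  # only links the index marks as its own can widen the set
        host = urlparse(url).hostname
        if host and host not in allowed:
            added.add(host)
    return allowed | added, added
```

Whether to apply the widened set silently, with a warning, or not at all remains a per-tool policy decision; PJ Eby argues against the implicit widening in a later message.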

-
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA





Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI

2013-03-15 Thread Carl Meyer
On 03/15/2013 10:51 AM, PJ Eby wrote:
 Giving a blanket pass to all external links doesn't seem like
 such a good idea to me, 

This is a very good point, and it should be made clearer in the PEP that
we don't recommend a single blanket option to allow all external links,
but an option (like allow-hosts) that lets you specify with more
granularity which external links to use. I think perhaps rel=external
confuses this point; the real purpose of the rel tags is just so that
rel=internal can be considered part of the index.

FWIW I think it would be just as reasonable UI for a hypothetical tool
to let you say "I want to trust external links for the Foo project"
rather than "I want to trust external links to djangoproject.com" and
avoid host-comparison altogether. IOW, I don't think hostname is
inherently a better or safer indicator of trust than project name;
hosts can change ownership at least as easily and silently as PyPI
projects! So I don't think the PEP should require all installer tools to
choose trust-by-hostname (which would be implied by removing the rel tags).
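
A sketch of the two trust styles contrasted here, trusting by project name versus trusting by host. ``ExternalLinkPolicy`` and its fields are hypothetical configuration, not an existing option of easy_install or pip; comparing project names case-insensitively is also an assumption.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class ExternalLinkPolicy:
    # Hypothetical user configuration; store project names lowercased.
    trusted_projects: set = field(default_factory=set)
    trusted_hosts: set = field(default_factory=set)

    def allows(self, project, link_url):
        """Accept an external link if its project or its host was trusted."""
        host = urlparse(link_url).hostname
        return project.lower() in self.trusted_projects or host in self.trusted_hosts
```

Either set can be left empty, so the same structure expresses a pure per-project policy or a pure per-host policy.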

 nor does allowing the index to define what
 hosts the client should trust.   

I'm not sure about this. By using an index at all, you are trusting that
index to provide whatever level of
reliability/stability/security/whatever you expect from it. Allowing the
index itself to specify that it keeps its files on a different host in a
way that is transparent to the user seems like a natural extension of
this trust that doesn't harm anything and aids usability greatly. (Cases
where the index is lying to you definitely fall outside the scope of
what this PEP is aiming to help with.)

 As for the internal ones, I'm not
 sure why we can't at least make a subdomain requirement, or have users
 explicitly add a PyPI CDN to their configured --allow-hosts.

Even a subdomain requirement can make a CDN more difficult/expensive to
implement. And once you go beyond simple host-equality comparisons and
into subdomain-equivalence I'm wary of the added implementation
complexity we're asking of every installer tool, and the potential for
subtle differences in implementation. This seems to me like a worse can
of worms than rel-parsing.

 To try to put it another way: there should be one, and preferably only
 one, obvious way to specify where you get downloads from.  That way in
 easy_install is currently --allow-hosts.  Adding new options that
 interact and overlap with that looks like bad UI design to me,
 increasing the possibility of user confusion.

Like Donald says, I don't see any problem with you choosing to keep
allow-hosts as the only user-facing option for easy_install. It would be
up to you whether you also want to use rel=internal as a hint for
implicitly (perhaps with warning) adding to --allow-hosts, to allow
better compatibility with indexes that use a different host for
file-hosting (it's possible that even PyPI itself may move into this
category; I haven't been following the CDN discussions carefully).

PyPI wouldn't be enforcing a UI on you here, just providing metadata
that you can use as you wish. I do think the internal/external
distinction is meaningful and unambiguous metadata that the index is
able to provide, and there's no reason for the index to withhold it.
(That distinction is not new in this version of the PEP, either, it's
just made via rel tags now instead of via a separate index.)

Carl





Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI

2013-03-15 Thread M.-A. Lemburg
Thanks, Holger. This version looks a lot better :-)

There are still some minor quirks which would need to be
addressed more explicitly, but overall, this proposal provides
a good way forward.

Perhaps it would also be possible to add the secured download
links and the caching/proxying ideas to the PEP at some point,
or we turn those into a new PEP.

I can't follow up in detail today, but will have a closer look
next week.

On 15.03.2013 10:29, holger krekel wrote:
 Hi all, in particular Philip, Marc-Andre, Donald,
 [...]

Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI

2013-03-15 Thread PJ Eby
On Fri, Mar 15, 2013 at 1:39 PM, Carl Meyer c...@oddbird.net wrote:
 up to you whether you also want to use rel=internal as a hint for
 implicitly (perhaps with warning) adding to --allow-hosts,

That's the bit I don't like.  The security model is that if it's not
allowed by allowed-hosts, it's *not allowed*.  Introducing a way to
sneak something past allow-hosts is a bad idea, because it means
people either have to explicitly widen their allow-hosts to arbitrary
hosts, or else that you can't actually enforce an allowed-hosts
policy, or that you need to learn a whole bunch of options to
implement it.

ISTM that this is a bad design choice for users, and I'm not
comfortable with this without some way to define the allowed
internal hosts based in some way on the base index URL.  Not just
for ease of automated translation, but so that *users* can know who
they're dealing with, and easily predict the effects of their chosen
options.

A frequent refrain has been "users don't know they're downloading
stuff from places other than PyPI," so if this new approach allows
downloads from somewhere other than *.pypi.python.org when you've
chosen pypi.python.org as your index, ISTM the proposal is failing to
meet its original goals.  As the PEP is written, PyPI could change out
to a different CDN each week or use different ones for different
files, and users would be back in the position of not being sure where
stuff is coming from.

I'm fine with extending the default host matching to
indexhost,*.indexhost if we want to leave more of an option for PyPI
and other indexes to use a CDN.  But I'm not sure how much point to it
there is, since a /simple index is static, and small in size compared
to the downloads, so you might as well host a copy of the /simple
index alongside the downloads, and make the index pypicdn.com/simple
or whatever in the first place.  (In other words, not a lot of benefit
to splitting a static index from its associated files, so why support
it?)
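
The "indexhost,*.indexhost" default can be sketched with glob matching; easy_install's --allow-hosts is documented to take glob patterns such as ``*.python.org``. This is a minimal illustration, not setuptools' implementation, and the helper names are hypothetical. ``*.pypi.python.org`` alone would not match the bare ``pypi.python.org``, which is why both patterns are needed.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def default_host_patterns(index_url):
    """Hypothetical default: the index host plus any of its subdomains."""
    host = urlparse(index_url).hostname
    return [host, "*." + host]


def host_allowed(link_url, patterns):
    # urlparse lowercases the hostname, so a case-sensitive match is fine here.
    host = urlparse(link_url).hostname or ""
    return any(fnmatchcase(host, pattern) for pattern in patterns)
```

Because the pattern keeps the leading dot, a look-alike host such as ``evilpypi.python.org`` does not match ``*.pypi.python.org``, while a separately named CDN domain like ``pypicdn.com`` is rejected unless the user adds it.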


 PyPI wouldn't be enforcing a UI on you here, just providing metadata
 that you can use as you wish.

That's not what the PEP says.  It does in fact *mandate* the use of
the rel attributes.  So if somebody adds an external link that
actually points back to PyPI, technically I'm not supposed to use it
unless it's been explicitly authorized.  ;-)

I'd really prefer to see explicit language that says the rel
information is advisory only and that installers aren't required to
parse it, let alone use it.  At the moment, the PEP is a substantial
departure from the version I agreed with.

(If there were to be any meaningful distinction in the links
themselves, I would think it'd more be whether, e.g. hash information
is available for the download.  That's a potentially relevant
distinction right now, in that PyPI automatically provides #md5 info.
Even so, I'm not sure that's enough of a distinction for anyone to
care about.)
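
For reference, checking the ``#md5=`` fragment that PyPI appends to its file links could look like this minimal sketch (helper names hypothetical). A hash taken from the index page detects a download that does not match what the index listed; it is not a signature.

```python
import hashlib
from urllib.parse import urldefrag


def expected_md5(link_url):
    """Return the hex digest from a '#md5=...' fragment, or None if absent."""
    _, fragment = urldefrag(link_url)
    if fragment.startswith("md5="):
        return fragment[len("md5="):].lower()
    return None


def verify_download(link_url, data):
    """True/False if the link carries an md5 to check, None if it carries none."""
    expected = expected_md5(link_url)
    if expected is None:
        return None
    return hashlib.md5(data).hexdigest() == expected
```

Returning ``None`` for links without hash information keeps "no hash available" distinct from "hash mismatch", which is the distinction the paragraph above speculates about.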
___
Catalog-SIG mailing list
Catalog-SIG@python.org
http://mail.python.org/mailman/listinfo/catalog-sig


Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI

2013-03-15 Thread M.-A. Lemburg
A little off-topic, but I thought you might enjoy this in the
context of all the crypto, hash and signing debate:

http://xkcd.com/1181/

Cheers,
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Mar 15 2013)
 Python Projects, Consulting and Support ...   http://www.egenix.com/
 mxODBC.Zope/Plone.Database.Adapter ...   http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


: Try our mxODBC.Connect Python Database Interface for free ! ::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/


Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI

2013-03-15 Thread Carl Meyer
tl;dr: I see your points, we'll change the PEP to allow clients to use
hostnames instead of the rel attributes if they prefer. More comments below:

On 03/15/2013 12:59 PM, PJ Eby wrote:
 That's the bit I don't like.  The security model is that if it's not
 allowed by allowed-hosts, it's *not allowed*.  Introducing a way to
 sneak something past allow-hosts is a bad idea, because it means
 people either have to explicitly widen their allow-hosts to arbitrary
 hosts, or else that you can't actually enforce an allowed-hosts
 policy, or that you need to learn a whole bunch of options to
 implement it.
 
 ISTM that this is a bad design choice for users, and I'm not
 comfortable with this without some way to define the allowed
 internal hosts based in some way on the base index URL.  Not just
 for ease of automated translation, but so that *users* can know who
 they're dealing with, and easily predict the effects of their chosen
 options.
 
 A frequent refrain has been, "users don't know they're downloading
 stuff from places other than PyPI", so if this new approach allows
 downloads from somewhere other than *.pypi.python.org when you've
 chosen pypi.python.org as your index, ISTM the proposal is failing to
 meet its original goals.  As the PEP is written, PyPI could change out
 to a different CDN each week or use different ones for different
 files, and users would be back in the position of not being sure where
 stuff is coming from.

I guess the key question is the definition of "places other than PyPI".
I think a CDN that is part of the index's architecture is just as much
part of PyPI whether it's on the same domain or not. But I understand
the difficulty integrating this with the --allow-hosts option in a way
that maintains a clear and simple UI.
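easy_install's --allow-hosts option takes a comma-separated list of glob patterns. The sketch below shows how such a filter might be applied to candidate download URLs. `make_host_filter` is a hypothetical name, and matching on the bare hostname (ignoring ports) is a simplifying assumption rather than easy_install's exact behaviour.

```python
import fnmatch
from urllib.parse import urlsplit


def make_host_filter(patterns):
    """Return a predicate accepting URLs whose host matches one of the globs.

    `patterns` uses --allow-hosts style, e.g. "pypi.python.org,*.pypi.python.org".
    """
    globs = [p.strip().lower() for p in patterns.split(",") if p.strip()]

    def allowed(url):
        # urlsplit().hostname is already lower-cased and excludes the port.
        host = urlsplit(url).hostname or ""
        return any(fnmatch.fnmatchcase(host, g) for g in globs)

    return allowed
```

Anything that matches no pattern is rejected outright, which is the "if it's not allowed, it's not allowed" model described above.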

 I'm fine with extending the default host matching to
 indexhost,*.indexhost if we want to leave more of an option for PyPI
 and other indexes to use a CDN.  But I'm not sure how much point to it
 there is, since a /simple index is static, and small in size compared
 to the downloads, so you might as well host a copy of the /simple
 index alongside the downloads, and make the index pypicdn.com/simple
 or whatever in the first place.  (In other words, not a lot of benefit
 to splitting a static index from its associated files, so why support
 it?)

Putting the /simple/ API on a CDN isn't quite that easy because it
currently involves some server-side redirects to effectively make
project names case-insensitive. I think in a hypothetical
re-architecture of PyPI there may be good security reasons to put
user-uploaded files on a different domain from dynamic portions of the
API (Donald alluded to this; more discussion at
http://security.stackexchange.com/questions/11756/is-it-safe-to-serve-any-user-uploaded-file-under-only-white-listed-mime-content).

So I think this issue may come up again in the future. But I'm fine with
deferring it in this PEP for now...

 PyPI wouldn't be enforcing a UI on you here, just providing metadata
 that you can use as you wish.
 
 That's not what the PEP says.  It does in fact *mandate* the use of
 the rel attributes.  So if somebody adds an external link that
 actually points back to PyPI, technically I'm not supposed to use it
 unless it's been explicitly authorized.  ;-)
 
 I'd really prefer to see explicit language that says the rel
 information is advisory only and that installers aren't required to
 parse it, let alone use it.  At the moment, the PEP is a substantial
 departure from the version I agreed with.

Ok, pending agreement from Holger I'll make a change in the PEP to
explicitly allow clients to make decisions based on either the rel
attributes or based on hostnames. Would that be sufficient to address
your concerns?
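A rough sketch of the two options Carl describes, assuming links on a /simple/ page are marked rel="internal" as in the draft PEP: collect each anchor's href and rel values, then classify a link by either signal. `LinkCollector` and `is_internal` are hypothetical names, not part of any existing installer.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect (absolute_url, rel_values) pairs from <a> tags on a /simple/ page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            return
        # rel may be absent, empty, or hold several space-separated tokens.
        rels = set((attr_map.get("rel") or "").split())
        self.links.append((urljoin(self.base_url, href), rels))


def is_internal(url, rels, index_host, use_rel=True):
    """Classify a link by its rel="internal" marker, or by hostname if use_rel is False."""
    if use_rel:
        return "internal" in rels
    return (urlsplit(url).hostname or "") == index_host.lower()
```

After feeding a page to a `LinkCollector`, calling `is_internal(url, rels, "pypi.python.org", use_rel=False)` gives the hostname-based answer instead of the rel-based one.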

Carl





Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI

2013-03-15 Thread PJ Eby
On Fri, Mar 15, 2013 at 7:16 PM, Carl Meyer c...@oddbird.net wrote:
 Ok, pending agreement from Holger I'll make a change in the PEP to
 explicitly allow clients to make decisions based on either the rel
 attributes or based on hostnames. Would that be sufficient to address
 your concerns?

Yes.  I just don't want to be in a situation down the road where
there's another argument about this on Catalog-SIG when PyPI starts
using a CDN that, "but it says this in the rel and you're supposed to
use that", and I say, "but Carl and Holger said..." and they go,
"doesn't matter, PEP says" ;-)

This way, the PEP will be clear that supporting a split of PyPI's
hostnames isn't in current scope.

I am also okay with the PEP allowing *.indexhost instead of just
indexhost as the filtering mechanism, as long as it specifies one
*now*.  (Again, so this doesn't have to be revisited later.)  If
somebody who knows something about CDNs, TUF, etc., needs to weigh in
on it first, that's fine.  I just want to know where things stand.
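One way the default filter could be derived from the configured index URL, yielding either `indexhost` or `indexhost,*.indexhost` in --allow-hosts style. This is an illustrative sketch; the function name and the `include_subdomains` switch are hypothetical.

```python
from urllib.parse import urlsplit


def default_allowed_patterns(index_url, include_subdomains=False):
    """Derive a comma-separated host-glob string from the base index URL."""
    host = urlsplit(index_url).hostname
    if not host:
        raise ValueError(f"index URL has no host: {index_url!r}")
    patterns = [host]
    if include_subdomains:
        # Also admit e.g. a CDN served from a subdomain of the index host.
        patterns.append("*." + host)
    return ",".join(patterns)
```

Either result is predictable from the index URL alone, which is the property asked for above.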


 Putting the /simple/ API on a CDN isn't quite that easy because it
 currently involves some server-side redirects to effectively make
 project names case-insensitive.

FWIW, easy_install works fine without this.  If a matching index page
isn't found, it checks the full package list.  PyPI's redirection just
reduces bandwidth usage and request overhead in the case where the
case of the user's request doesn't match the actual package listing.
But it could be completely static without affecting easy_install and
tools that use its package-finding code.
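The fallback PJ describes can be sketched as follows. For brevity, the static index is modelled as an in-memory set of project names rather than real HTTP requests (an assumption). The name `resolve_project_name` is hypothetical, and real tools may normalise names further than plain lower-casing.

```python
def resolve_project_name(requested, static_index):
    """Find the listed project name for `requested` without server-side redirects."""
    # 1. Try the exact path (/simple/<requested>/), which a static host can serve.
    if requested in static_index:
        return requested
    # 2. Fall back to scanning the full listing (/simple/) case-insensitively.
    wanted = requested.lower()
    for name in sorted(static_index):
        if name.lower() == wanted:
            return name
    return None
```

The exact-match path costs one request. Only a case mismatch triggers the larger listing fetch, which is the overhead the server-side redirect saves.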


Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI

2013-03-15 Thread holger krekel
On Fri, Mar 15, 2013 at 22:01 -0400, PJ Eby wrote:
 On Fri, Mar 15, 2013 at 7:16 PM, Carl Meyer c...@oddbird.net wrote:
  Ok, pending agreement from Holger I'll make a change in the PEP to
  explicitly allow clients to make decisions based on either the rel
  attributes or based on hostnames. Would that be sufficient to address
  your concerns?
 
 Yes.  I just don't want to be in a situation down the road where
 there's another argument about this on Catalog-SIG when PyPI starts
 using a CDN that, "but it says this in the rel and you're supposed to
 use that", and I say, "but Carl and Holger said..." and they go,
 "doesn't matter, PEP says" ;-)
 
 This way, the PEP will be clear that supporting a split of PyPI's
 hostnames isn't in current scope.

 
 I am also okay with the PEP allowing *.indexhost instead of just
 indexhost as the filtering mechanism, as long as it specifies one
 *now*.  (Again, so this doesn't have to be revisited later.)  If
 somebody who knows something about CDNs, TUF, etc., needs to weigh in
 on it first, that's fine.  I just want to know where things stand.
 
One related question.  The rel=internal links will contain
a hash (currently md5), so if the referenced resource resolves to
a file matching that hash, we can be sure of its integrity.
What kind of security does host-checking add on top?
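For reference, here is a minimal sketch of the integrity check Holger refers to: hash the downloaded bytes and compare the result against the digest taken from the link. `matches_digest` is a hypothetical helper. Note that it only shows the file matches whatever hash the index page listed.

```python
import hashlib
import hmac


def matches_digest(data, algorithm, expected_hex):
    """Return True if `data` hashes to `expected_hex` under `algorithm` (e.g. "md5")."""
    actual = hashlib.new(algorithm, data).hexdigest()
    # compare_digest avoids leaking timing information about where the strings differ.
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```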

holger

  Putting the /simple/ API on a CDN isn't quite that easy because it
  currently involves some server-side redirects to effectively make
  project names case-insensitive.
 
 FWIW, easy_install works fine without this.  If a matching index page
 isn't found, it checks the full package list.  PyPI's redirection just
 reduces bandwidth usage and request overhead in the case where the
 case of the user's request doesn't match the actual package listing.
 But it could be completely static without affecting easy_install and
 tools that use its package-finding code.