Re: [Catalog-sig] How to determine if archive is an sdist or bdist

2013-03-31 Thread PJ Eby
On Sun, Mar 31, 2013 at 6:13 PM, James Carpenter nawk...@gmail.com wrote:
 Do you have a module/function/line number in easy_install I should use? I'm
 sure I can dig it out myself but it sounds like you might just be able to
 put your finger on it in only a minute or two.

It's the install_eggs() method of
setuptools.command.easy_install.easy_install.  You won't really be
able to use it, it just looks for a setup.py after *unpacking* the
archive.  It also doesn't look for a PKG-INFO; PyPI does that.  (And I
only know that because it was relevant to the uploadability of eggs at
one time.)
___
Catalog-SIG mailing list
Catalog-SIG@python.org
http://mail.python.org/mailman/listinfo/catalog-sig


Re: [Catalog-sig] How to determine if archive is an sdist or bdist

2013-03-29 Thread PJ Eby
On Fri, Mar 29, 2013 at 11:00 AM, James Carpenter nawk...@gmail.com wrote:
 Looks like the idea of using a custom command is a better approach then.

I'm not sure why you think that.  The only kinds of archives whose
file types are ambiguous from the name, are sdist, bdist_dumb, and
random raw source dumps.  Everything else has a unique extension like
.egg, .exe, .msi, .rpm, etc.  If you have a .zip, .tar.gz, .tgz, or
some other archive name, you can find out if it's an sdist by
inspecting its contents as I described.  And if it's not an sdist, you
can usually tell if it's a raw source dump by checking for a setup.py
in the archive root or a depth-1 subdirectory off the root.  (That's
what easy_install does, anyway, when it's given an archive it doesn't
know what to do with.)
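
The setup.py check described above can be sketched roughly as follows.
This is only an illustration working on a list of archive member names;
the function name looks_like_source_tree is made up for this example and
is not easy_install's actual code.

```python
import posixpath


def looks_like_source_tree(member_names):
    """Return True if a setup.py sits in the archive root or one level down.

    `member_names` is a list of archive member paths using '/' separators,
    e.g. what zipfile's namelist() or tarfile's getnames() would return.
    """
    for name in member_names:
        # Normalize entries such as "./pkg-1.0/setup.py" and drop empty parts.
        parts = [p for p in posixpath.normpath(name).split("/")
                 if p not in ("", ".")]
        # Depth 1 = archive root, depth 2 = a subdirectory directly off the root.
        if parts and parts[-1] == "setup.py" and len(parts) <= 2:
            return True
    return False
```

A member list such as ["proj-1.0/setup.py", "proj-1.0/README"] would be
treated as a source tree, while a setup.py buried two or more directories
deep would not.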


 Is a custom command my only choice or can I register pre/post hooks to any
 given command?


 On Thu, Mar 28, 2013 at 3:36 PM, PJ Eby p...@telecommunity.com wrote:

 On Thu, Mar 28, 2013 at 3:57 PM, James Carpenter nawk...@gmail.com
 wrote:
  Is there an easy way to programmatically tell if an archive (tar.gz,
  zip,
  etc.) in the dist directory is a binary or sdist? I would like to
  post-process the contents of a dist directory and classify each build
  artifact there (egg, sdist, bdist, etc.).

 An sdist always has a single subdirectory in the archive's root
 directory, named for the package+version, and containing a PKG-INFO
 and setup.py (plus a bunch of other stuff).

 A bdist_dumb will not have such a subdirectory in the archive root;
 instead it will have one or more directories like /usr, /opt, /Program
 Files.

 Other bdist formats?  Hard to say.




Re: [Catalog-sig] Merge catalog-sig and distutils-sig

2013-03-28 Thread PJ Eby
On Thu, Mar 28, 2013 at 3:14 PM, Fred Drake f...@fdrake.net wrote:
 On Thu, Mar 28, 2013 at 2:22 PM, Donald Stufft don...@stufft.io wrote:
 Is there much point in keeping catalog-sig and distutils-sig separate?

 No.

 The last time this was brought up, there were objections, but I don't
 remember what they were.  I'll let people who think there's a point
 worry about that.

 Not sure if there's some official process for requesting it or not, but
 I think we should merge the two lists and just make packaging-sig to
 umbrella the entire packaging topics.

 There is the meta-sig, but the description is out-dated:

 http://mail.python.org/mailman/listinfo/meta-sig

 and the last message in the archives is dated 2011, and sparked no
 discussion:

 http://mail.python.org/pipermail/meta-sig/2011-June.txt

 +1 on merging the lists.

Can we do it by just dropping catalog-sig and keeping distutils-sig?
I'm afraid we might lose some important distutils-sig population if
the process involves renaming the list, resubscribing, etc.  I also
*really* don't want to invalidate archive links to the distutils-sig
archive.

All in all, +1 on not having two lists, but I'm really worried about
breaking distutils-sig.  We're still going to be talking about
distribution utilities, after all.


Re: [Catalog-sig] Merge catalog-sig and distutils-sig

2013-03-28 Thread PJ Eby
On Thu, Mar 28, 2013 at 3:43 PM, Donald Stufft don...@stufft.io wrote:
 On Mar 28, 2013, at 3:39 PM, PJ Eby p...@telecommunity.com wrote:
 Can we do it by just dropping catalog-sig and keeping distutils-sig?
 I'm afraid we might lose some important distutils-sig population if
 the process involves renaming the list, resubscribing, etc.  I also
 *really* don't want to invalidate archive links to the distutils-sig
 archive.

 All in all, +1 on not having two lists, but I'm really worried about
 breaking distutils-sig.  We're still going to be talking about
 distribution utilities, after all.

 Worst case I'm sure subscribers can be transferred and the existing archive 
 kept intact.

That's a great way to have a bunch of people complaining that they
never subscribed to packaging-sig, not to mention the part where it
breaks everyone's mail filters.

I really don't see any gains for renaming the list.  It's not like we
can go and scrub the entire internet of references to distutils-sig.


Re: [Catalog-sig] How to determine if archive is an sdist or bdist

2013-03-28 Thread PJ Eby
On Thu, Mar 28, 2013 at 3:57 PM, James Carpenter nawk...@gmail.com wrote:
 Is there an easy way to programmatically tell if an archive (tar.gz, zip,
 etc.) in the dist directory is a binary or sdist? I would like to
 post-process the contents of a dist directory and classify each build
 artifact there (egg, sdist, bdist, etc.).

An sdist always has a single subdirectory in the archive's root
directory, named for the package+version, and containing a PKG-INFO
and setup.py (plus a bunch of other stuff).

A bdist_dumb will not have such a subdirectory in the archive root;
instead it will have one or more directories like /usr, /opt, /Program
Files.

Other bdist formats?  Hard to say.
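
A minimal sketch of the sdist test described above, assuming the archive
is a zip or a tar-based file readable by the standard library.  The
function names archive_names and classify are hypothetical, and anything
that fails the sdist test is simply reported as "not-sdist" (it could be
a bdist_dumb, a raw source dump, or something else).

```python
import tarfile
import zipfile


def archive_names(path):
    """List member names of a .zip or tar-based (.tar.gz, .tgz, ...) archive."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            return zf.namelist()
    with tarfile.open(path) as tf:
        return tf.getnames()


def classify(names):
    """Return 'sdist' if the layout matches the description, else 'not-sdist'."""
    paths = []
    for name in names:
        parts = [p for p in name.split("/") if p not in ("", ".")]
        if parts:
            paths.append(parts)
    top_level = {p[0] for p in paths}
    files = {"/".join(p) for p in paths}
    # An sdist has exactly one root entry, containing PKG-INFO and setup.py.
    if len(top_level) == 1:
        root = next(iter(top_level))
        if root + "/PKG-INFO" in files and root + "/setup.py" in files:
            return "sdist"
    return "not-sdist"
```

Usage would be along the lines of classify(archive_names("dist/foo-1.0.tar.gz")),
where the path is just a placeholder.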


Re: [Catalog-sig] Merge catalog-sig and distutils-sig

2013-03-28 Thread PJ Eby
On Thu, Mar 28, 2013 at 5:15 PM, Jacob Kaplan-Moss ja...@jacobian.org wrote:
 C'mon, folks, we're arguing about a name. That's about as close to
 literal bikeshedding as we could get.

I'm not arguing about the *name*.  I just don't see the point in
making everybody subscribe to a new list and change their mail filters
(and update every book and webpage out there that mentions the
distutils-sig), because a few people want to *change* the name -- a
change that AFAICT doesn't actually provide any tangible benefit to
anybody whatsoever.


 How about we just let whoever has the keys make the change in whatever way's 
 easiest and most logical for them?

Because it's not up to just the person with the keys.  Neither SIG is
a mere mailing list, it's a Python special interest group, and SIGs
have their own formation and termination processes.

In particular, if you're going to start a new SIG, one of the
requirements to be met is, "in particular, no other SIG nor the general
Python newsgroup is already more suitable" (per the Python SIG
Creation Guidelines).  It's hard to argue that distutils-sig isn't
already more suitable than whatever is being proposed to take its
place.


Re: [Catalog-sig] Access to Windows' cert store

2013-03-21 Thread PJ Eby
On Thu, Mar 21, 2013 at 8:06 AM, Christian Heimes christ...@python.org wrote:
 Hi,

 the message is slightly off-topic but it might be interesting for pip,
 setuptools and other developers that are working on HTTPS for PyPI.

 A while ago I found C++ example code that shows how to dump CA and CRL
 certs from Windows's system cert store. The system cert store contains
 the certificates used by Windows, IE etc.

 Yesterday I reimplemented the C++ code with Python and ctypes. I have
 tested it with Python 2.6 to 3.3 (x86 and x86_64) on Windows 7. It
 should work with Windows XP / Windows Server 2003 and all newer versions
 of Windows. The output is usable by Python's SSL module but you have to
 dump the certs to a file first.

 I'm planning to add the feature to Python 3.4, too.
 http://bugs.python.org/issue17134

 You can download the code from

   https://bitbucket.org/tiran/wincertstore


Very nice!  I definitely would like to use this for setuptools, but I
actually want it for versions 2.3-2.5, which can't use requests or
urllib3 or anything like that.  So I hacked on the code a bit and got
it to work (or at least got the __main__ stub to spit out a bunch of
data) with Python 2.3 and ctypes 1.0.2 (the last standalone release
for which Windows binaries are available).  Would you like a patch?

(Note: absolute_import, decorators, and the actual use of with: and
generator expressions had to go, but this doesn't change any API or
semantics as far as I can tell, just a bit of appearance here and
there, and the code still runs with 2.4, 2.5, 2.7, 3.1, and 3.2 that I
tried.)


Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI

2013-03-18 Thread PJ Eby
On Sat, Mar 16, 2013 at 3:15 AM, Nick Coghlan ncogh...@gmail.com wrote:

 On 15 Mar 2013 16:16, Carl Meyer c...@oddbird.net wrote:

 tl;dr: I see your points, we'll change the PEP to allow clients to use
 hostnames instead of the rel attributes if they prefer.

 I will veto any such change. Clients MUST NOT assume that the architecture
 of the index service will be limited to a single host name, they must
 process the explicit metadata provided by the index that indicates which
 hosts the index controls.

 Adding a --trust-indices flag to make this optional in setuptools would be
 fine, but it seems perverse to trust every aspect of an index *except* its
 claims to control additional hosts.

Actually, setuptools trusts redirects, so that mechanism is available
for splitting the hosted files to another domain.

As it stands, though, I don't see a way to support this without
introducing confusion.  The advantage of using allow-hosts based on
the index host is that it *also* specifies what to do with dependency
links provided by individual packages; the PEP does not provide any
real guidance on this point.

So, I have to withdraw my support for the PEP with these recent
changes, as it no longer reflects the approach I previously agreed to,
and as yet there have been no alternatives proposed to address the
user confusion issues (which IMO at least are a big part of the point
of having the PEP).

Of course, if redirection is required for non-extrapolatable
hostnames, or if somebody comes up with a new and brilliant scheme to
manage the menage of permissions needed across dependency_links, the
index, and general host trusting issues (while remaining
comprehensible and predictable to end users), I'll certainly have a
look again.  But I took the weekend off from this discussion to try to
come up with one myself, and so far I've got nothing.


Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI

2013-03-18 Thread PJ Eby
On Mon, Mar 18, 2013 at 1:22 PM, PJ Eby p...@telecommunity.com wrote:
 Actually, setuptools trusts redirects, so that mechanism is available
 for splitting the hosted files to another domain.

 As it stands, though, I don't see a way to support this without
 introducing confusion.

Oops - that wasn't clear.  By this I meant the current version of the PEP.


Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI

2013-03-15 Thread PJ Eby
Do we even need the internal/external rel info?  I was planning to
just use the URL hostname.

i.e., are there any use cases for designating an externally-hosted
file internal, or an internally-hosted file external?  If not, it
seems the rel= is redundant.

It's also more work to implement, vs. just defaulting --allow-hosts to
be the --index-url host; a strategy ISTM pip could also use, since it
has the same two options available.

Also, if we're not doing homepage/download crawling any more, I was
hoping we could just drop the code that 'parses' rel= links in the
first place, as it's an awkward ugly hack.  ;-)


Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI

2013-03-15 Thread PJ Eby
On Fri, Mar 15, 2013 at 12:07 PM, Carl Meyer c...@oddbird.net wrote:
 On 03/15/2013 09:15 AM, PJ Eby wrote:
 Do we even need the internal/external rel info?  I was planning to
 just use the URL hostname.

 i.e., are there any use cases for designating an externally-hosted
 file internal, or an internally-hosted file external?  If not, it
 seems the rel= is redundant.

 Right; Donald and Holger already gave the rationale for this: there are
 good reasons for an index to not have internal links actually on the
 exact same hostname. Even just using a different subdomain would break
 simple host comparison.

 It's also more work to implement, vs. just defaulting --allow-hosts to
 be the --index-url host; a strategy ISTM pip could also use, since it
 has the same two options available.

 Pip actually doesn't currently have --allow-hosts, although there's no
 good reason for that; it ought to.

 Also, if we're not doing homepage/download crawling any more, I was
 hoping we could just drop the code that 'parses' rel= links in the
 first place, as it's an awkward ugly hack.  ;-)

 Well, parsing HTML links as an API is an ugly hack, but within that
 existing framework rel seems like the appropriate semantic attribute
 for this type of information, not really upping the hackiness quotient :-)

Well, to be clear, I liked previous versions of the proposal better
than this one.  But while I *really* don't want to do any new rel
parsing, that's not the only or even the most important reason.

The main reason is that I think internal vs. external is a bogus
distinction: what's important (IMO) is what hosts you do and don't
trust.  Giving a blanket pass to all external links doesn't seem like
such a good idea to me, nor does allowing the index to define what
hosts the client should trust.   As for the internal ones, I'm not
sure why we can't at least make a subdomain requirement, or have users
explicitly add a PyPI CDN to their configured --allow-hosts.

To try to put it another way: there should be one, and preferably only
one, obvious way to specify where you get downloads from.  That way in
easy_install is currently --allow-hosts.  Adding new options that
interact and overlap with that looks like bad UI design to me,
increasing the possibility of user confusion.


Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI

2013-03-15 Thread PJ Eby
On Fri, Mar 15, 2013 at 1:39 PM, Carl Meyer c...@oddbird.net wrote:
 up to you whether you also want to use rel=internal as a hint for
 implicitly (perhaps with warning) adding to --allow-hosts,

That's the bit I don't like.  The security model is that if it's not
allowed by allow-hosts, it's *not allowed*.  Introducing a way to
sneak something past allow-hosts is a bad idea, because it means
people either have to explicitly widen their allow-hosts to arbitrary
hosts, or else that you can't actually enforce an allow-hosts
policy, or that you need to learn a whole bunch of options to
implement it.

ISTM that this is a bad design choice for users, and I'm not
comfortable with this without some way to define the allowed
internal hosts based in some way on the base index URL.  Not just
for ease of automated translation, but so that *users* can know who
they're dealing with, and easily predict the effects of their chosen
options.

A frequent refrain has been, "users don't know they're downloading
stuff from places other than PyPI," so if this new approach allows
downloads from somewhere other than *.pypi.python.org when you've
chosen pypi.python.org as your index, ISTM the proposal is failing to
meet its original goals.  As the PEP is written, PyPI could change out
to a different CDN each week or use different ones for different
files, and users would be back in the position of not being sure where
stuff is coming from.

I'm fine with extending the default host matching to
indexhost,*.indexhost if we want to leave more of an option for PyPI
and other indexes to use a CDN.  But I'm not sure how much point to it
there is, since a /simple index is static, and small in size compared
to the downloads, so you might as well host a copy of the /simple
index alongside the downloads, and make the index pypicdn.com/simple
or whatever in the first place.  (In other words, not a lot of benefit
to splitting a static index from its associated files, so why support
it?)
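
To make the "indexhost,*.indexhost" default concrete, here is a rough
sketch using shell-style wildcard matching from the standard library.
The helper names default_allow_hosts and url_allowed are invented for
this example; it is not a description of how setuptools implements
--allow-hosts.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def default_allow_hosts(index_url):
    """Derive the default patterns from the index URL: the host and its subdomains."""
    host = urlparse(index_url).hostname
    return [host, "*." + host]


def url_allowed(url, patterns):
    """Return True if the URL's host matches any of the allowed host patterns."""
    host = (urlparse(url).hostname or "").lower()
    return any(fnmatchcase(host, pattern.lower()) for pattern in patterns)
```

With pypi.python.org as the index, a download from a subdomain of
pypi.python.org would pass, while one from pypicdn.com would be refused
unless the user explicitly widened the patterns.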


 PyPI wouldn't be enforcing a UI on you here, just providing metadata
 that you can use as you wish.

That's not what the PEP says.  It does in fact *mandate* the use of
the rel attributes.  So if somebody adds an external link that
actually points back to PyPI, technically I'm not supposed to use it
unless it's been explicitly authorized.  ;-)

I'd really prefer to see explicit language that says the rel
information is advisory only and that installers aren't required to
parse it, let alone use it.  At the moment, the PEP is a substantial
departure from the version I agreed with.

(If there were to be any meaningful distinction in the links
themselves, I would think it'd more be whether, e.g. hash information
is available for the download.  That's a potentially relevant
distinction right now, in that PyPI automatically provides #md5 info.
Even so, I'm not sure that's enough of a distinction for anyone to
care about.)


Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI

2013-03-15 Thread PJ Eby
On Fri, Mar 15, 2013 at 7:16 PM, Carl Meyer c...@oddbird.net wrote:
 Ok, pending agreement from Holger I'll make a change in the PEP to
 explicitly allow clients to make decisions based on either the rel
 attributes or based on hostnames. Would that be sufficient to address
 your concerns?

Yes.  I just don't want to be in a situation down the road where
there's another argument about this on Catalog-SIG when PyPI starts
using a CDN, along the lines of "but it says this in the rel and you're
supposed to use that", and I say, "but Carl and Holger said...", and
they go, "doesn't matter, PEP says..."  ;-)

This way, the PEP will be clear that supporting a split of PyPI's
hostnames isn't in current scope.

I am also okay with the PEP allowing *.indexhost instead of just
indexhost as the filtering mechanism, as long as it specifies one
*now*.  (Again, so this doesn't have to be revisited later.)  If
somebody who knows something about CDNs, TUF, etc., needs to weigh in
on it first, that's fine.  I just want to know where things stand.


 Putting the /simple/ API on a CDN isn't quite that easy because it
 currently involves some server-side redirects to effectively make
 project names case-insensitive.

FWIW, easy_install works fine without this.  If a matching index page
isn't found, it checks the full package list.  PyPI's redirection just
reduces bandwidth usage and request overhead in the case where the
case of the user's request doesn't match the actual package listing.
But it could be completely static without affecting easy_install and
tools that use its package-finding code.
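
The fallback described above amounts to something like the following
sketch.  The dictionary stands in for the full package list, and
find_project_page is a hypothetical name; this is not easy_install's
real code, just the idea of trying the exact name first and only then
doing a case-insensitive scan.

```python
def find_project_page(requested, index_pages):
    """Look up a project page, tolerating a mismatch in letter case.

    `index_pages` maps exact project names to page URLs and stands in
    for the index's full package listing.
    """
    # Fast path: the requested name matches the listing exactly.
    if requested in index_pages:
        return index_pages[requested]
    # Slow path: scan the whole listing, ignoring case.
    wanted = requested.lower()
    for name, url in index_pages.items():
        if name.lower() == wanted:
            return url
    return None
```

The server-side redirect only saves the client the slow path; a fully
static index would still work with a client that does this scan itself.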


Re: [Catalog-sig] setuptools/distribute/easy_install/pkg_resource sorting algorithm

2013-03-14 Thread PJ Eby
On Thu, Mar 14, 2013 at 6:07 AM, M.-A. Lemburg m...@egenix.com wrote:
 On 12.03.2013 22:26, PJ Eby wrote:
 On Tue, Mar 12, 2013 at 3:59 PM, M.-A. Lemburg m...@egenix.com wrote:
 On 12.03.2013 19:15, M.-A. Lemburg wrote:
 I've run into a weird issue with easy_install, that I'm trying to solve:

 If I place two files named

 egenix_mxodbc_connect_client-2.0.2-py2.6.egg
 egenix-mxodbc-connect-client-2.0.2.win32-py2.6.prebuilt.zip

 into the same directory and let easy_install running on Linux
 scan this, it considers the second file for Windows as best
 match.

 Is the algorithm used for determining the best match documented
 somewhere ?

 I've had a look at the implementation, but this left me rather
 clueless.

 I thought that setuptools would prefer the .egg file over
 the prebuilt .zip file - binary files being easier to install
 than source files.

 After some experiments, I found that the following change
 in filename (swapping platform and python version, in addition
 to using '-' instead of '.') works:

 egenix-mxodbc-connect-client-2.0.2-py2.6-win32.prebuilt.zip

 OTOH, this one doesn't (notice the difference ?):

 egenix-mxodbc-connect-client-2.0.2.py2.6-win32.prebuilt.zip

 The logic behind all this looks rather fragile to me.

 easy_install only guarantees sane version parsing for distribution
 files built using setuptools' naming algorithms.  If you use
 distutils, it can only make guesses, because the distutils does not
 have a completely unambiguous file naming scheme.  And if you are
 naming the files by hand, God help you.  ;-)

 The problem appears to be a bug in setuptools' package_index.py.

 The function interpret_distro_name() creates a set of possible
 separations of the found name into project name and version.

 It does find the right separation, but for some reason, the
 code using that function does not check the found project
 names against the project name the user is trying to install,
 but simply takes the last entry of the list returned by the
 above function.

 As a result, easy_install downloads and tries to install
 project files that don't match the project name in some
 cases.

 Here's another example where it fails (say you're on a x64 Linux box):

 # easy_install egenix-pyopenssl

 As example, say it finds these distribution files:

 'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs2-linux-x86_64-prebuilt.zip',
 'egenix_pyopenssl-0.13.1.1.0.1.5-py2.7-linux-x86_64.egg',
 
 'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs2-macosx-10.5-x86_64-prebuilt.zip',
 
 'egenix-pyopenssl-0.13.1.1.0.1.5-py2.7_ucs4-macosx-10.5-x86_64-prebuilt.zip',

 It then creates different interpretations of those names, puts
 them in a list and sorts them. Here's the end of that list:

 egenix-pyopenssl; 0.13.1.1.0.1.5 -- this would be the correct .egg file
 egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs2-linux-x86-64-prebuilt
 egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs2-macosx-10.5-x86-64-prebuilt
 egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt
 egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs2-macosx; 10.5-x86-64-prebuilt
 egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs4-macosx; 10.5-x86-64-prebuilt

 It picks the last entry, which would be for a project called
 egenix-pyopenssl-0.13.1.1.0.1.5-py2.7-ucs4-macosx - not the one
 the user searched.

Actually, that's not quite true.  It's picking:

egenix-pyopenssl; 0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt

Because it thinks that
'0.13.1.1.0.1.5-py2.7-ucs4-macosx-10.5-x86-64-prebuilt' is a higher
version than 0.13.1.1.0.1.5.

It does also record the possibility you mentioned, but it doesn't pick
that one.  The project names actually *do* have to match.

If you open a ticket on the setuptools tracker, I'll try to see if I
can get it to recognize that strings like py2.7, macosx, ucs, and the
like are terminators for a version number.  I don't know how
successful I'll be, though.  Basically, those zip files are (I assume)
bdist_dumb distributions being taken for source distributions, and
easy_install doesn't actually support bdist_dumb files at the moment.
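
The ambiguity discussed above can be illustrated with a simplified
sketch of splitting a file's base name into (project, version)
candidates at every hyphen.  This is an assumption-laden simplification
written for this example, not the actual interpret_distro_name()
implementation, and the function names are made up.

```python
def candidate_splits(basename):
    """Return every (project, version) split of a hyphenated base name."""
    parts = basename.split("-")
    return [("-".join(parts[:i]), "-".join(parts[i:]))
            for i in range(1, len(parts))]


def splits_for_project(basename, project):
    """Keep only the candidates whose project name matches the request."""
    return [(name, version)
            for name, version in candidate_splits(basename)
            if name == project]
```

Even after filtering on the project name, the surviving candidate for a
name like "egenix-pyopenssl-0.13.1-py2.7" carries "0.13.1-py2.7" as its
version, which is why the platform and Python tags end up being compared
as part of the version number.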


Re: [Catalog-sig] setuptools/distribute/easy_install/pkg_resource sorting algorithm

2013-03-14 Thread PJ Eby
On Thu, Mar 14, 2013 at 2:11 PM, M.-A. Lemburg m...@egenix.com wrote:
 Is there any way to have 0.13.1.1.0.1.5-something sort before
 0.13.1.1.0.1.5 ? (e.g. like is done for release candidates)

Make it 0.13.1.1.0.1.5-devsomething, and it'll have lower
precedence than both 0.13.1.1.0.1.5 and
0.13.1.1.0.1.5-something.

 If you could point me to that tracker, I'll open a ticket :-)

http://bugs.python.org/setuptools/


Re: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of release files

2013-03-13 Thread PJ Eby
On Wed, Mar 13, 2013 at 7:21 AM, holger krekel hol...@merlinux.eu wrote:
 Hi all,

 after some more discussions and hours spent by Carl Meyer (who is now
 co-authoring the PEP) and me, here is a new V3 pre-submit draft.
 It is now more ambitious than the previous draft as should be obvious
 from the modified abstract (and Carl Meyer's and Philip's earlier
 interactions on this list).  There also are more details of how
 the current link-scraping works among other improvements and incorporations
 of feedback from discussions here.

 We intend to submit this draft tonight to the PEP editors.

 Feedback now and later remains welcome.  I am sure there are issues to
 be sorted and clarified, among them the versioning-API suggestion by
 Marc-Andre.

 Thanks for everybody's support and feedback so far,
 holger

Looks good to me!

Setuptools' two releases will probably look like this:

1. Default to externals index, warn when fetching URLs that are not
the same host as the index
2. Default to externals index, reject URLs that are not the same host
as the index unless --allow-hosts is configured  (IOW, default
allow-hosts to equal index-url host)

That way, external URLs can still be discovered by the user, but the
default configuration is still secure.
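
As a rough sketch of the two-release plan above: the first release would
behave like strict=False (warn, then fetch anyway), the second like
strict=True (reject unless the host matches the index).  The function
check_download and its strict flag are hypothetical names for this
illustration, and the explicit --allow-hosts override is left out for
brevity.

```python
import warnings
from urllib.parse import urlparse


def check_download(url, index_url, strict):
    """Decide whether a download URL may be fetched.

    Returns True if the fetch may proceed.  Off-index hosts produce a
    warning when strict is False and are rejected when strict is True.
    """
    same_host = urlparse(url).hostname == urlparse(index_url).hostname
    if same_host:
        return True
    if strict:
        return False
    warnings.warn("fetching %s from a host other than the index" % url)
    return True
```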


Re: [Catalog-sig] A 90% Solution

2013-03-12 Thread PJ Eby
On Tue, Mar 12, 2013 at 5:50 AM, M.-A. Lemburg m...@egenix.com wrote:
 Not hard to do: we'd just need to keep the old index in place
 using a different URL, e.g. /simple-v1/.

That's not necessary: the XML-RPC API lets you query those URLs
directly.  They're part of the metadata standard, after all...  which
means you can *also* access them by downloading the DOAP records,
browsing the PyPI pages directly, etc.

There are plenty of ways to get that data, no point adding another one.


Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

2013-03-12 Thread PJ Eby
On Tue, Mar 12, 2013 at 1:25 AM, Lennart Regebro rege...@gmail.com wrote:
 Externally hosted files are a real world actual problem.

You're leaving out some important words from that sentence.  Words
like "for some people" and "who choose to depend on projects using
them."

PyPI isn't your private personal playground.  Other people have rights, too.

 This discussion has long since gone past reason into pure stop energy.

I agree - hardly anyone is giving any reasoning that justifies why one
group of people should have their projects censored to benefit a few
blowhards on Catalog-SIG.

Carl's the only person who's even *tried* giving a justification.
Everyone else just shuts up or changes the subject when I ask that
question.

I'll ask it again: why should *thousands* of projects be censored or
made to change their release processes, because *you* can't be
bothered to cache the distributions of the projects you depend on?

Not, "why would it be a good idea for them to change anyway."

Why should they be *forced* to do it?

Bonus points: answer why, *every time* somebody proposes a way of
improving things that doesn't *ban* external hosting, you guys go all
"stop energy" on that and derail the discussion with why it has to be
total.

AFAICT, you're the ones stopping things moving forward here,
filibustering against every possible compromise.


Re: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI

2013-03-12 Thread PJ Eby
On Tue, Mar 12, 2013 at 7:38 AM, holger krekel hol...@merlinux.eu wrote:
 In addition, maintainers of installation tools are asked to release
 two updates.  The first one shall provide clear warnings if external
 crawling needs to happen,

A clarification here: "needs to happen" is not well-specified.  An
installer tasked with finding the latest or best-matching version of a
package must currently *always* crawl.  So the warning would be
always.

The strategy I originally chose for making this change in easy_install
is to warn once at the beginning that --allow-hosts has not been set,
and thus packages might be downloaded from anywhere on the internet.

I've since become uncertain that this change is actually workable in
the short term, since until most of the packages are actually moved
onto PyPI, a lot of installs will fail if somebody changes their
configuration to be more secure.  So I'm thinking the warning needs to
be deferred until at least the more popular packages have moved to
PyPI.


 Now, if there is some agreement, I can submit this PEP officially tomorrow,
 and given agreement/refinements from the Pycon folks and the likes of
 Richard, we may be able to get going very shortly after Pycon.

I'd like to suggest that the PEP should be explicit that no other
changes to the /simple generation algorithm are being made, just the
removal or alteration of rel= attributes.  i.e., it will still be
possible -- at least in the near term -- for projects to include
explicit download links to files made available elsewhere.  Changing
that situation is more controversial and will require wider community
participation than has occurred to date.

It might also be good to suggest that authors of PyPI clones plan
their own phase-out of rel= attributes.


Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

2013-03-12 Thread PJ Eby
On Tue, Mar 12, 2013 at 12:29 PM, Jacob Kaplan-Moss ja...@jacobian.org wrote:
 On Tue, Mar 12, 2013 at 11:19 AM, M.-A. Lemburg m...@egenix.com wrote:
 So let's do this carefully and find a good solution before
 jumping to conclusions.

 Completely agreed; rushing is a bad idea.

 But so is not starting. What I'm seeing — as a total outsider, a user
 of these tools, not someone who creates them — is that a bunch of
 people (Holger, Donald, Richard, the pip maintainers, etc.) have the
 beginnings of a solution ready to go *right now*, and I want to
 capture that energy and enthusiasm before it evaporates.

 This isn't an academic situation; I've seen companies decline to adopt
 Python over this exact security issue.

Nobody told them about how to configure a restricted, site-wide
default --allow-hosts setting?   (
http://peak.telecommunity.com/DevCenter/EasyInstall#restricting-downloads-with-allow-hosts
and 
http://docs.python.org/2/install/index.html#location-and-names-of-config-files
)

(FWIW, --allow-hosts was added in setuptools 0.6a6 -- *years* before
the distribute fork or the existence of pip, and pip offers the same
option.)

I've already agreed to change setuptools, in a future release, to
default this option to allow downloads only from the same host as its
index URL (i.e. to default --allow-hosts to the host of the
--index-url option), and I support removing rel= spidering from PyPI
(which will significantly mitigate the immediate speed and security
issues).  Heck, I've been the one who's repeatedly proposed various
ways of cutting back or removing rel= attributes from the /simple
index.

The result of these two changes will actually have the same net effect
that people are asking for here: you'll only be able to download
stuff hosted on PyPI, unless you explicitly override --allow-hosts
to get a wider range of packages.
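
As a rough illustration of the proposed default (a sketch only, not
setuptools' actual code; the function names are hypothetical), the
allow-list could be derived from the index URL like this, with exact
host comparison assumed for simplicity:

```python
from urllib.parse import urlparse


def default_allow_hosts(index_url):
    """Derive a one-entry allow-list from the host of the index URL."""
    host = urlparse(index_url).netloc
    if not host:
        raise ValueError("index URL has no host: %r" % (index_url,))
    return [host]


def is_allowed(url, allow_hosts):
    """Exact host comparison; real tools may also accept wildcard patterns."""
    return urlparse(url).netloc in allow_hosts


hosts = default_allow_hosts("https://pypi.python.org/simple/")
print(hosts)
print(is_allowed("http://foo.com/downloads/foo-1.2.tgz", hosts))
```

Under this default, a user who wants externally hosted files has to
widen the list explicitly, which is the choice being argued for here.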

Already today, when a URL is blocked by --allow-hosts, it's announced
as part of easy_install's output, so you can see exactly how much
wider you need to extend your trust for the download to succeed.

The *only* thing I object to is removing the ability for people to
*choose* their own levels of trust.

And I have not yet seen an argument that justifies removing people's
ability to *choose* to be more inclusive in their downloads.

And I've put multiple compromise proposals out there to begin
mitigating the problem *now* (i.e. for non-updated versions of
setuptools), and every time, the objection is, no, we need to ban it
all now, no discussion, no re-evaluation, no personal choice, everyone
must do as we say, no argument.

And I don't understand that, at all.


Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

2013-03-12 Thread PJ Eby
On Tue, Mar 12, 2013 at 1:33 PM, Jesse Noller jnol...@gmail.com wrote:
 There's not much to understand: external hosting of packages is *actively
 harmful*, period. End users of easy_install and pip *don't even realize* 99%
 of the time that these tools are following links off of PyPI and installing
 packages from random, probably insecure/non-HTTPS locations all over the
 internet. Once they realize it they recoil in terror if they have any
 understanding of the implications.

This is a rationale for secure defaults for various options, like the
ones I outlined in the portions of my post that you *didn't* quote.

It's not a rationale for removing the options themselves.


Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

2013-03-12 Thread PJ Eby
On Tue, Mar 12, 2013 at 2:18 PM, Carl Meyer c...@oddbird.net wrote:
 It seems to me that there's a remarkable level of consensus developing
 here (though it may not look like it), and a small set of remaining open
 questions.

 The consensus (as I see it):

 - Migrate away from scraping external HTML pages, with package owners in
 control of the migration but a deadline for a forced switch, as outlined
 in Holger's PEP (with all appropriate caution and testing).

 - In some way, migrate to a situation where the popular installer tools
 install only release files from PyPI by default, but are capable of
 installing from other locations if the user provides an option.

Perhaps I'm confused, but ISTM that every time I've said this, Donald
and Lennart argue that it should not be possible to provide such an
option -- or to be more specific, that PyPI should not publish the
information that makes that option possible.

If that's *not* the position they're taking, it'd be good to know,
because we could totally stop arguing about it in that case.


 A) Leave external links in the PyPI simple index, but migrate the major
 tools to not use external links by default (i.e. Philip's plan to make
 allow-hosts=pypi the default in a future setuptools), with an option to
 turn them back on.

I don't know who has proposed this option, but it's not me.  You seem
to be confusing external links and HTML-scraped links (rel=
attributed links in /simple).

I was the first person to propose disabling HTML-scraped links from
PyPI *ASAP*.  I still want them gone.  That won't require tool
changes, it just requires a rollout plan.  Holger has one, let's work
on that.

The second thing I proposed is that new tools be developed to *assist*
package authors in moving their files onto PyPI, so that future tool
changes wouldn't result in widespread instances of people needing to
set their tools to insecure settings just to get anything done.  We
need to get people's files moving onto PyPI *first*, in order to make
changing the tool defaults practical.

The *only* thing I object to is the part where some people want to ban
external links from /simple, always and forever, regardless of the
package authors' choice in the matter.


 B) Do a second PyPI migration, again with a per-package toggle and
 package owners in control, to a "no external links in simple index" setting.

 Consider for a moment how similar the end state here is with either A or
 B. In either case, by default users install only from PyPI, but by
 providing a special option they can install from some external source.
 (In B, that special option would be something like --find-links with a
 URL). In either case, we can continue to allow packages to register
 themselves on PyPI, be found in searches, etc, without uploading release
 files to PyPI if they prefer not to; they'll just have to provide
 special installation instructions to their users in that case.

Not true: approach B means that you won't know what values to pass to
the option.

It's also confused about an important point.  All the links that
appear in /simple are *already* completely under the package author's
control.  No new switches are required to remove external links - you
can simply remove them from your releases' descriptions.  This process
could be made more transparent or easy, sure -- but it's a mistake to
say that this is granting the package owners control that they don't
already have.

What they lack control over is the rel= attributes, short of
removing those links entirely.  That's why I've proposed having a
switch for that, as reflected in Holger's pre-PEP.


 1) With B, we can provide a gentler migration for package owners, where
 they are in control of when the switch happens.

 2) With B, all end users benefit from the new defaults, not only end
 users who update to the latest and greatest tools.

 3) With B (and probably some forms of A as well), end users clearly
 state which external sources they would like to trust and install from,
 rather than having a global "trust everything!" flag, which is less
 secure and less sensible.

These 3 statements all mischaracterize things substantially, because
none of those benefits are exclusive to A, and nobody has proposed a
"trust everything" flag.  Removing rel= attributes also benefits
everyone right away, *without* new tools.


Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

2013-03-12 Thread PJ Eby
On Tue, Mar 12, 2013 at 2:43 PM, Robert Collins
robe...@robertcollins.net wrote:
 This takes an age when each new web host to talk to is a new DNS
 lookup (say 0.3 seconds) + HTTP request (0.6 seconds) with possible
 HTTPS setup in there too (up to 1.2 seconds). A project with dozens of
 dependencies in its transitive dependency graph may take minutes
 *just spidering*.

Which is why we should act on Holger's pre-PEP to drop the rel=
attributes from projects that don't actually use them -- builds
involving those projects will immediately drop to one HTTP request to
PyPI, plus one to whatever host has the actually needed file.

And that's without any tooling changes whatsoever: builds all over the
planet will just get faster and more secure, right away.
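
To put rough numbers on this, here is a back-of-the-envelope sketch
using the per-host timings quoted above.  The project count and the
number of external hosts per project are made-up assumptions, and the
function name is hypothetical:

```python
# Per-host costs quoted in the message above, in seconds.
DNS_LOOKUP = 0.3
HTTP_REQUEST = 0.6
HTTPS_SETUP = 1.2  # "up to" figure, so this is an upper bound


def spider_seconds(projects, external_hosts_per_project, https=False):
    """Estimate time spent only on contacting external hosts while spidering."""
    per_host = DNS_LOOKUP + HTTP_REQUEST + (HTTPS_SETUP if https else 0.0)
    return projects * external_hosts_per_project * per_host


# Assumed workload: 40 projects in the dependency graph, each with
# 2 rel=-linked external hosts.
print(round(spider_seconds(40, 2), 1))              # plain HTTP
print(round(spider_seconds(40, 2, https=True), 1))  # with HTTPS setup
```

With those assumed inputs the estimate is on the order of one to three
minutes of pure spidering, all of which disappears for projects whose
rel= attributes are dropped.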


Re: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI

2013-03-12 Thread PJ Eby
On Tue, Mar 12, 2013 at 2:07 PM, M.-A. Lemburg m...@egenix.com wrote:
 Just a quick note (more later, if time permits)...

 On 12.03.2013 18:05, holger krekel wrote:
 Hi Marc-Andre, all,

 - Prepare PYPI implementation to allow a per-project hosting mode,
   effectively enabling or disabling external crawling.  When enabled
   nothing changes from the current situation of producing ``rel=download``
   and ``rel=homepage`` attributed links on ``simple/`` pages,
   causing installers to crawl those sites.
   When disabled, the attributions of links will change
   to ``rel=newdownload`` and ``rel=newhomepage`` causing installers to
   avoid crawling 3rd party sites.  Retaining the meta-information allows
   tools to still make use of the semantic information.

 Please start using versioned APIs for these things. The
 old style index should still be available under some
 URL, e.g. /simple-v1/ or /v1/simple/ or /1/simple/

 Not sure it is necessary in this case.  I would think it makes
 the implementation harder and it would probably break PEP381 (mirroring
 infrastructure) as well.

 Here's what I meant:

 We publish the current implementation of the /simple/ index API
 under a new URL /simple-v1/, so that people that want to use
 the old API can continue to do so.

Do you know of anyone who's *actually* going to need/use this
alternate API?  Why can't they just use the XML-RPC API, the DOAP API,
or any other means of obtaining this information?

Heck, the proposal to just change the value of the rel attribute isn't
going to stop anybody from using that data.  Please let's not
complicate this by adding more API formats for PyPI to support.
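
A minimal sketch of why renaming the rel values keeps the data usable:
a crawler that only follows rel="download" and rel="homepage" links
would skip the renamed ones, while any other tool can still read them.
The class and function names are hypothetical and the sample page is
invented for illustration:

```python
from html.parser import HTMLParser

# rel values that installers currently treat as "crawl this page".
CRAWLED_RELS = {"download", "homepage"}


class SimplePageLinks(HTMLParser):
    """Collect (href, rel) pairs from the anchors of a /simple page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        if "href" in attrs:
            self.links.append((attrs["href"], attrs.get("rel")))


def parse_links(html):
    parser = SimplePageLinks()
    parser.feed(html)
    return parser.links


def links_to_crawl(html):
    """Only links whose rel value is one the crawler recognizes."""
    return [href for href, rel in parse_links(html) if rel in CRAWLED_RELS]


PAGE = (
    '<a href="http://example.com/" rel="homepage">home</a>\n'
    '<a href="http://example.com/dl" rel="newdownload">download</a>\n'
    '<a href="foo-1.0.tar.gz">foo-1.0.tar.gz</a>\n'
)
print(links_to_crawl(PAGE))
print(parse_links(PAGE))
```

The renamed link is still present in the parsed data, so tools that
want the semantic information keep it without a second index format.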


Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

2013-03-12 Thread PJ Eby
On Tue, Mar 12, 2013 at 3:36 PM, Jacob Kaplan-Moss ja...@jacobian.org wrote:
 On Tue, Mar 12, 2013 at 2:21 PM, PJ Eby p...@telecommunity.com wrote:
 The *only* thing I object to is the part where some people want to ban
 external links from /simple, always and forever, regardless of the
 package authors' choice in the matter.

 Here's the thing though, there are already a bunch of other ways users
 can install packages from external repositories. I can think of at
 least two:

 * I can pip/easy_install a given URL (e.g. easy_install
 https://www.djangoproject.com/download/1.5/tarball/)
 * I can use a custom index server (pip install -i http://localserver/ django)

 The important part is that in each of those cases I can see clearly
 where I'm getting things from.



 From where I stand the absolutely non-negotiable part is that
 `pip/easy_install/whatever package` should NEVER access an external
 host (after some suitable transition period). This needs to include
 older installer software, and it needs to make it hard for new tools
 to do the wrong thing. How this is achieved really doesn't matter to
 me -- if there's a pip install --insecure Django that's fine too --
 but to me it's non-negotiable that the out-of-the-box configuration
 not allow external hosts.

I'm confused by this statement.  "Never access an external host" is
not consistent with having the option to specify which hosts you
trust, while still keeping PyPI as a universal index of Python software.


 Yes, this means taking some options away from the package creator. It
 means that when I'm wearing my author-of-Django hat I can't choose to
 list Django on PyPI but provide the download elsewhere. That's not
 perfect, but given a "creator choice" vs "out of the box security"
 choice the latter has to win. [And as a package creator I still have
 options: I can run my own package server, fairly easy to do these
 days.]

 Again, the *how* isn't a big deal to me, but the result is really
 important: the tooling has to be secure-by-default, and that means
 (among other things) `pip install package` can never hit something
 that's not PyPI without me explicitly asking for it.

That part's fine.  As I've said repeatedly, though, it's removing
other links from the /simple index entirely that's the problem.

Under what I've proposed, as soon as the tools are updated to a
secure default (and already *now*, if you set your --allow-hosts to
PyPI-only), easy_install will announce which URLs it is skipping
because they're not on PyPI.  (pip too, IIUC.)

I can't tell you how to configure pip for this, but if you want to
configure easy_install to be secure right *now*, add:

[easy_install]
allow_hosts=pypi.python.org

to your user-level or site-wide distutils .cfg file.

Better yet, encourage other people to add it now, find out what they
can no longer install, and talk to their upstream providers about
moving to PyPI.

This is all good.

I'm just saying, we don't need to change PyPI to do anything but drop
the rel= links, and change the tools to default allow-hosts to equal
index-url.  (pip has the same parameters, but I'm not sure what config
files it uses.  I don't think it inherits [easy_install] settings.)


Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

2013-03-12 Thread PJ Eby
On Tue, Mar 12, 2013 at 4:14 PM, Carl Meyer c...@oddbird.net wrote:
 You say below that nobody has proposed a 'trust everything' flag. If
 there is no "trust everything" flag, then it seems to me that with
 either option A or option B the user needs to specify what they intend
 to trust. I.e. if you make the default value of allow-hosts the index
 url host, as you said you plan to do at some point, users would need to
 override it with the hosts they want to allow.

 It seems like maybe what you are wanting is automatically-discoverable
 installation from externally-hosted files? I.e. that I could say
 easy_install Foo --allow-external, without needing to know any
 specific external url for Foo?

 This is what I was characterizing as a "trust everything" flag, but on
 reflection I don't think I have any problem with that.

Here's a story to illustrate what I mean:

Joe wants to install foo.  He runs easy_install Foo.  Foo is hosted
externally to PyPI, so easy_install says:

URL foo.com/downloads/foo-1.2.tgz BLOCKED by allow-hosts option --
install failed.

(Or words to that effect; I'd have to check the source to get you the
exact phrasing).

The point is, Joe now *knows where to get foo from*, because PyPI
still had the information.  Joe can now decide whether he wants to
download it manually and inspect it first, expand his allow-hosts
option, or give Foo a pass.

The proposals that call for banning all links from the /simple index,
prevent Joe from being able to do this at all.
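
Joe's story can be sketched as follows.  This is an illustration, not
easy_install's real implementation: the function name is hypothetical,
the URLs are invented (with a scheme added to the one from the story so
it parses), and wildcard host patterns are assumed to behave like
fnmatch patterns:

```python
from fnmatch import fnmatch
from urllib.parse import urlparse


def partition_links(urls, allow_hosts):
    """Split candidate URLs into (allowed, blocked) by host pattern."""
    allowed, blocked = [], []
    for url in urls:
        host = urlparse(url).netloc
        if any(fnmatch(host, pattern) for pattern in allow_hosts):
            allowed.append(url)
        else:
            blocked.append(url)
    return allowed, blocked


candidates = [
    "http://foo.com/downloads/foo-1.2.tgz",
    "https://pypi.python.org/packages/foo-1.1.tar.gz",
]
allowed, blocked = partition_links(candidates, ["pypi.python.org"])
for url in blocked:
    # The blocked URL is reported, so the user learns where the file lives.
    print("URL %s BLOCKED by allow-hosts option" % url)
```

The report is only possible because the index still lists the external
URL; if the link is removed from /simple, there is nothing to show.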


 This is partly true. An explicit flag grants package owners more control
 in that right now they don't have a choice about whether external links
 to tarballs in their long_description automatically get sucked into the
 simple index. This is not hypothetical; even if there were no rel-link
 scraping, I've had cases where package owners have complained to me
 about pip installing an RC tarball they had linked directly from their
 long-description, not intending it to be auto-installable.

Fair enough.  Thank you for actually providing an illustration of a
problem.  There's been far too much handwaving of problems without any
explicit description of what the problem *is*.

I would support making references to external links explicit rather
than implicit.


 I think it would be preferable if in the future package owners wouldn't
 need to be careful what release-file links they might place in their
 long_description, and release files would be only explicitly nominated.

Ok.


 I think the current "automatically suck in links to simple/" behavior is
 only useful as a backwards-compatibility hack, which is why I think an
 explicit switch to disable it (on by default for newly-registered
 projects, slowly, gently, carefully migrated to on for existing
 projects) is better than keeping this link-scraping behavior
 indefinitely for all projects and asking package owners to clean up
 their long-descriptions.

I would agree with dropping link parsing from the description field,
provided that an alternative way is provided for projects to
explicitly add external links to /simple, concurrent with the other
changes.

Thank you for taking the time to engage and re-engage on this issue,
and to Explain It Like I'm Five for me, with an illustration of an
actual problematic use case.  ;-)


Re: [Catalog-sig] setuptools/distribute/easy_install/pkg_resource sorting algorithm

2013-03-12 Thread PJ Eby
On Tue, Mar 12, 2013 at 3:59 PM, M.-A. Lemburg m...@egenix.com wrote:
 On 12.03.2013 19:15, M.-A. Lemburg wrote:
 I've run into a weird issue with easy_install, that I'm trying to solve:

 If I place two files named

 egenix_mxodbc_connect_client-2.0.2-py2.6.egg
 egenix-mxodbc-connect-client-2.0.2.win32-py2.6.prebuilt.zip

 into the same directory and let easy_install running on Linux
 scan this, it considers the second file for Windows as best
 match.

 Is the algorithm used for determining the best match documented
 somewhere ?

 I've had a look at the implementation, but this left me rather
 clueless.

 I thought that setuptools would prefer the .egg file over
 the prebuilt .zip file - binary files being easier to install
 than source files.

 After some experiments, I found that the following change
 in filename (swapping platform and python version, in addition
 to using '-' instead of '.') works:

 egenix-mxodbc-connect-client-2.0.2-py2.6-win32.prebuilt.zip

 OTOH, this one doesn't (notice the difference ?):

 egenix-mxodbc-connect-client-2.0.2.py2.6-win32.prebuilt.zip

 The logic behind all this looks rather fragile to me.

easy_install only guarantees sane version parsing for distribution
files built using setuptools' naming algorithms.  If you use
distutils, it can only make guesses, because distutils does not
have a completely unambiguous file naming scheme.  And if you are
naming the files by hand, God help you.  ;-)
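
To see why such names are ambiguous, consider a simplified sketch
(hypothetical function, not the actual setuptools algorithm) that lists
every way a hyphenated basename could be divided into a project name
and a version string:

```python
def candidate_splits(basename):
    """Return every (name, version) split of a '-'-separated basename."""
    parts = basename.split("-")
    return [
        ("-".join(parts[:i]), "-".join(parts[i:]))
        for i in range(1, len(parts))
    ]


# Basename of the hand-named file from the message above, extension removed.
name = "egenix-mxodbc-connect-client-2.0.2.win32-py2.6.prebuilt"
for project, version in candidate_splits(name):
    print("%-35s %s" % (project, version))
```

Nothing in the name itself says which split is the intended one, so a
tool has to guess, and small changes such as swapping '.' for '-' alter
which guesses are available.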


Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

2013-03-11 Thread PJ Eby
On Sun, Mar 10, 2013 at 8:25 PM, Donald Stufft don...@stufft.io wrote:
 I don't think anyone is bad here, nor am I arguing against any particular 
 person or group of people. I'm arguing against a practice and a system. 
 You're going out of your way to find excuses to throw all sorts of stop 
 energy here.

Calling a legitimate disagreement with your point of view "stop
energy" seems inappropriate to me, since my issue is with you
derailing the topic of how to get people to *voluntarily* migrate to a
better situation than the present one, and to develop tools for that
process.  The only thing I wish you to stop is the repeated assertion
without proof that 1) external links must go *and* 2) this must be an
enforced directive rather than a (highly-encouraged) option.

I have even gone so far as to suggest, earlier in this thread, what
evidence I would find at least suggestive of your POV.  But your
response to that and prior challenges to those assertions, has been
simply to move your goalpost.  E.g. from "current uptime is bad" to
"any uptime lower than PyPI's is totally unacceptable".

I, on the other hand, have moved in the direction of *your* proposals
repeatedly, making adjustments as I find actually-convincing evidence
and/or reasoning, or find ways to deal with the issues.  I have
compromised quite a bit.  (And have already spent a fair amount of
time writing setuptools code to lay a foundation for these changes.)

You, as far as I can tell, have not moved your position in the slightest.

Which of these is "stop energy"?

It is not the case that external links must be removed from PyPI in
order to ensure security, or uptime.  And it is *especially* not the
case that you are the BDFL of uptime.  You're definitely not the BDFL
of uptime for any given project hosted on PyPI, that you *voluntarily
choose* to make a part of your build process.  If your primary
argument is that project X must host its files on PyPI because of your
build process, then I think you misunderstand open source, and also
the part where you *chose* to make it part of your build process.  It
certainly doesn't give you the right to force projects Y, Z, and Q --
that you don't even use! -- to also host their projects on PyPI,
because project X -- the one you do use -- has a slow or unreliable
file host!

It seems disingenuous to then shift the argument back to security when
challenged on uptime, and back to uptime when challenged on security.
We've looped back and forth over those for some time: when I point out
that wheels have signatures which will make off-site hosting
relatively unimportant to the security picture, you jump back to
talking about uptime.  When I point out that uptime is a consensual
factor that in no way justifies legislating what other people can do
with their projects, you go back to talking about security.

Make up your mind.  What problem are you actually trying to solve?

(I expect your response on wheels to be that wheels aren't there yet,
etc., but that isn't actually a response to the objection unless
you're going to change your position to "okay, external links to file
formats that can be signed can stay", or something of that sort.
Otherwise, you're not actually compromising, just using the fact that
wheels aren't in common use yet as an argument to keep the position
you started with.)


 My analogy served only to put into light that the system that I'm trying to 
 change is insecure, just like allowing anyone to walk into a bank vault and 
 pick up money would be insecure. I fully believe that the people using such a 
 system are completely trustworthy people. But just because *they* are 
 trustworthy doesn't mean that a system which allows *anyone* to attack other 
 Python developers is *ok*.

And my analogy served only to put into light the part where you're
insisting that one group of people change for the benefit of a group
which is already benefiting from their pre-existing generosity.

That being said, I do see that I could have misinterpreted the intent
of your analogy -- it sounded like you were saying that the developers
who host off-PyPI were thieves walking into your bank and taking your
money (i.e., analogizing theft with inconveniencing you by making your
builds fail or run slowly).

Though to be honest, I still don't comprehend how else to make any
kind of sense of that analogy in its original context.  Who is the
bank?  Whose money is being taken?  The whole thing is utterly
confusing to me if I try to take it any other way than the way I did,
because it doesn't seem to have any other simple 1:1 mapping to the
situation, as far as I can see.   Your explanation seems terribly
abstract and tortured to me, as far as analogies go.


 When discussing security of a system it's necessary to divorce yourself from 
 the implementations of things. When you get wrapped up in the implementation 
 you turn things into a Us vs Them game (as evidenced by several of your 
 messages) instead of discussing the 

Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

2013-03-11 Thread PJ Eby
On Mon, Mar 11, 2013 at 7:14 AM, Donald Stufft don...@stufft.io wrote:
 1) Proof of what? That it's insecure? That it harms uptime? That it violates 
 people's privacy?

That any of those things apply to anybody who *isn't using those packages*.

Without this, you are only providing a reason to encourage people to
change, not to force them to do so.


 2) Even a single project remaining causes the entire thing to cascade

Cascade *how*?  Please explain.


Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

2013-03-11 Thread PJ Eby
On Mon, Mar 11, 2013 at 12:45 PM, Lennart Regebro rege...@gmail.com wrote:
 On Mon, Mar 11, 2013 at 5:12 PM, PJ Eby p...@telecommunity.com wrote:
 On Mon, Mar 11, 2013 at 7:14 AM, Donald Stufft don...@stufft.io wrote:
 1) Proof of what? That it's insecure? That it harms uptime? That it 
 violates people's privacy?

 That any of those things apply to anybody who *isn't using those packages*.

 If nobody is using the packages, it does indeed harm no-one.

Then there is no reason to ban them.


Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

2013-03-11 Thread PJ Eby
On Mon, Mar 11, 2013 at 1:45 PM, Lennart Regebro rege...@gmail.com wrote:
 So, we should not remove the links for external packages until
 somebody traverses those links? But as soon as somebody asks for those
 links, we should remove them? In fact before we give them the link?

I'm saying that if someone objects to the presence of links they
don't actually use, they are speaking nonsense.  Might as well ask to
ban all packages from PyPI that they don't personally like -- it's the
same request.  Nobody is forcing you to depend on packages that don't
host on PyPI, so there is no point to the censorship.

If you don't use the links, you can't argue that their presence is
causing you harm.


Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

2013-03-11 Thread PJ Eby
On Mon, Mar 11, 2013 at 4:07 PM, Carl Meyer c...@oddbird.net wrote:
 On 03/11/2013 01:57 PM, PJ Eby wrote:
 I'm saying that if someone objects to the presence of links they
 don't actually use, they are speaking nonsense.  Might as well ask to
 ban all packages from PyPI that they don't personally like -- it's the
 same request.  Nobody is forcing you to depend on packages that don't
 host on PyPI, so there is no point to the censorship.

 If you don't use the links, you can't argue that their presence is
 causing you harm.

 You can, of course, argue that the mere presence of those links
 (combined with the current behavior of easy_install/pip) is an
 attractive nuisance that indirectly causes harm to unsuspecting new
 users of Python who never even consider the possibility that tools like
 easy_install and pip might spider off PyPI to arbitrary websites

Which is why I think removing rel= spidering is a good idea.  In
fact, I'm the one who suggested that.  I also suggested moving to
turning it off by default in future versions of easy_install, adding
warnings, etc.

But that's not the same thing as agreeing that it should be *banned*
for people to publish machine-readable download information on PyPI
for a file that's hosted off-PyPI.  ISTM that Python's "consenting
adults" standard sets a higher bar for banning a feature than it does
for marking it "here there be dragons" and offering a better
alternative.  Heck, even in Python the language, the mere removal of a
feature in a new version of Python, doesn't stop people from
continuing to use the old one.  Here we're talking about
infrastructure that everybody uses; it's not like there's a PyPI X.1
that people can keep using if X.2 comes out.


[Catalog-sig] A 90% Solution

2013-03-11 Thread PJ Eby
Just a thought, but...

If 90% of PyPI projects do not have any external files to download,
then, wouldn't it make sense to:

1. Add a project-level option to enable or disable the adding of the
rel= attribute to /simple links (but not affecting the links in any
other way)
2. Default it to disabled for new projects, and
3. Set it to disabled *now* for the 90% of projects that *don't have
external files*?

If the arguments about banning external links are as valid and
important as some people claim, wouldn't it make sense to do this part
*now*, without first requiring a commitment to force the switch to a
disabled state in the future?

Immediately, 90% of the problem goes away - no random spidering of
stuff that doesn't contain a link now, but which could be taken over
by a malicious party in the future, and 90% fewer sites having to be
up in order for you to build something from PyPI.

Seems like a serious win to me -- and one that might not even need a PEP.

Next steps after this would be providing tools to help people move
their files and links, promoting that people switch it off if they no
longer support the offsite links, educating about security concerns,
etc.

I really don't understand why the 90% solution isn't *already* the
consensus position, since it doesn't preclude follow-on efforts
towards reducing the 10% towards 0%.

And if the problem is so important, why must we keep 90% of the
problems in place, just so we can keep arguing about censoring the
10%?  That doesn't make sense to me.

To me, if somebody's injured, the first thing you do is clean and
close the wound, not argue about whether it's a complete solution and
what might happen days or weeks later.

Just a thought.


Re: [Catalog-sig] A 90% Solution

2013-03-11 Thread PJ Eby
On Mon, Mar 11, 2013 at 7:39 PM, Donald Stufft don...@stufft.io wrote:

 On Mar 11, 2013, at 7:04 PM, PJ Eby p...@telecommunity.com wrote:

 Just a thought, but...

 If 90% of PyPI projects do not have any external files to download,
 then, wouldn't it make sense to:

 To be accurate, it's that 90% don't have any files/releases available *only*
 externally. Most have external files to download because it's very rare that
 a project doesn't include a home_page or a download_url, especially since
 distutils complains if you don't.

So what is the % of projects for whom the option can be disabled
automatically, *without* disabling automated downloadability of a
project's externally hosted files?

Your statement is confusing to me, because the having of a home page
or download URL doesn't have anything to do with whether that page has
any files to download from it.

I am saying that if a project has no *downloadable* files (not web
pages) whose links can only be found by spidering, then we can turn
off the rel attribute.

How many projects do not have any download links listed on their
rel=-linked pages?


 1. Add a project-level option to enable or disable the adding of the
 rel= attribute to /simple links (but not affecting the links in any
 other way)
 2. Default it to disabled for new projects, and
 3. Set it to disabled *now* for the 90% of projects that *don't have
 external files*?

 +1 except 1. should be to remove the links entirely from the /simple/
 index, not to just remove the rel attribute.

-1, since sometimes download links are in fact *download links*.  So
this design choice would unnecessarily limit the number of projects to
which the option could be applied automatically and immediately.

That is, a project with a download link of foobar.com/foobar-1.2.tgz
would no longer be usable if you removed the download link from the
/simple index, but would remain usable if the rel attribute were
removed.
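
The difference between the two choices can be sketched with a
hypothetical link renderer (this is not PyPI's code; the helper name
and the rel_enabled flag are assumptions standing in for the proposed
per-project option):

```python
from html import escape


def render_link(href, text, rel=None, rel_enabled=True):
    """Render one /simple anchor, emitting rel= only when the option is on."""
    if rel and rel_enabled:
        rel_attr = ' rel="%s"' % escape(rel, quote=True)
    else:
        rel_attr = ""
    return '<a href="%s"%s>%s</a>' % (
        escape(href, quote=True), rel_attr, escape(text))


url = "http://foobar.com/foobar-1.2.tgz"
print(render_link(url, "download", rel="download", rel_enabled=True))
print(render_link(url, "download", rel="download", rel_enabled=False))
```

With the option off, the direct file link is still on the page and can
still be matched by filename; only the hint to go spider the target is
gone.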


Re: [Catalog-sig] A 90% Solution

2013-03-11 Thread PJ Eby
On Mon, Mar 11, 2013 at 8:28 PM, M.-A. Lemburg m...@egenix.com wrote:
 On 12.03.2013 00:39, Donald Stufft wrote:

 On Mar 11, 2013, at 7:04 PM, PJ Eby p...@telecommunity.com wrote:

 Just a thought, but...

 If 90% of PyPI projects do not have any external files to download,
 then, wouldn't it make sense to:

 To be accurate, it's that 90% don't have any files/releases available *only*
 externally. Most have external files to download because it's very rare
 that a project doesn't include a home_page or a download_url, especially
 since distutils complains if you don't.

 How are you going to verify that disabling the links
 on those projects won't make certain release versions of
 those packages unavailable for pip/easy_install ?

I'm not sure if you're asking Donald or me here.  My proposal was to
only automatically disable the rel attributes for links to pages that
do *not* contain any easy_install or pip-able download links.  So, by
definition, this would not make any releases unavailable.

As for what Donald is proposing, I honestly have no idea what he's
talking about, or whether the 90% statistic actually applies for what
I'm proposing.

So it's possible that it might be a lot less than 90% that my proposal
would be able to affect *instantly*, without contacting any authors.


 How are you planning to inform the package authors of that
 change, so that they can take corrective action ?

 Which options would be available for authors ?

Do see my proposal again, which was simply that there be a switch to
enable or disable the rel attributes, that it default off for new
packages, and be switched to off for exactly that set of packages
which would not result in the loss of access to any download files.

There is, at this point, the question of how to handle projects that
have some of their releases hosted externally, or with some of the
files external and some not.  I would prefer that any automated
changeover apply only to packages where the set of discoverable links
is exactly equal to the links found on the project's /simple page.
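
As a rough sketch of that eligibility test (the function name and the
example URLs are made up for illustration, and how the "discoverable"
set gets computed is left out entirely), the check is plain set equality:

```python
def can_switch_automatically(simple_page_links, discoverable_links):
    """Return True if spidering finds nothing beyond the /simple page.

    simple_page_links: file links listed directly on the /simple page.
    discoverable_links: every file link an installer could find today,
    including those reached by following rel= links.
    """
    return set(discoverable_links) == set(simple_page_links)


on_simple = ["https://pypi.example/packages/foo-1.0.tar.gz"]

# Nothing extra is found by spidering: safe to switch off.
same = can_switch_automatically(on_simple, list(on_simple))

# Spidering finds an extra externally hosted release: leave it alone.
extra = can_switch_automatically(
    on_simple, on_simple + ["http://example.com/foo-0.9.tar.gz"])
```

Anything that fails the test would be left as-is until its author acts.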


 Regarding the links, it's probably better to not
 remove the rel= attributes but instead change them
 from rel=download to e.g. rel=external-download;
 or to keep the old index semantics around as /simple-v1/.
 This keeps the valuable semantic relation available for
 tools that want to use it.

For what?  If you must keep them, rel=disabled-homepage etc. would
get the message across.  But I really don't see the point, and I
*invented* the bloody things.

Frankly, I'm more than prepared to toss the rel attributes altogether,
after adequate notice is given for people to move their files or links
to the files.  I just don't want any changes in the *rest* of the
/simple generation algorithm.


Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

2013-03-10 Thread PJ Eby
On Sun, Mar 10, 2013 at 11:07 AM, holger krekel hol...@merlinux.eu wrote:
 Philip, Marc-Andre, Richard (Jones), Nick and catalog-sig/distutils-sig:
 scrutiny and feedback welcome.

Hi Holger.  I'm having some difficulty interpreting your proposal
because it is leaving out some things, and in other places
contradicting what I know of how the tools work.  It is also a bit at
odds with itself in some places.

For instance, at the beginning, the PEP states its proposed solution
is to host all release files on PyPI, but then the problem section
describes the problems that arise from crawling external pages:
problems that can be solved without actually hosting the files on
PyPI.

To me, it needs a clearer explanation of why the actual hosting part
also needs to be on PyPI, not just the links.  In the threads to date,
people have argued about uptime, security, etc., and these points are
not covered by the PEP or even really touched on for the most part.

(Actually, thinking about that makes me wonder: Donald, did your
analysis collect any stats on *where* those externally hosted files
were hosted?  My intuition says that the bulk of the files (by *file
count*) will come from a handful of highly-available domains, i.e.
sourceforge, github, that sort of thing, with actual self-hosting
being relatively rare *for the files themselves*, vs. a much wider
range of domains for the homepage/download URLs (especially because
those change from one release to the next.)  If that's true, then most
complaints about availability are being caused by crawling multiple
not-highly-available HTML pages, *not* by the downloading of the
actual files.  If my intuition about the distribution is wrong, OTOH,
it would provide a stronger argument for moving the files themselves
to PyPI as well.)

Digression aside, this is one of the things that needs to be clearer so
that there's a better explanation for package authors as to why
they're being asked to change.  And although the base argument is good
(specifying the homepage will slow down the installation process),
it could be amplified further with an example of some project that has
had multiple homepages over its lifetime, listing all the URLs that
currently must be crawled before an installer can be sure it has found
all available versions, platforms, and formats of that project.

Okay, on to the Solution section.  Again, your stated problem is to
fix crawling, but the solution is all about file hosting.  Regardless
of which of these three hosting modes is selected, it remains an
option for the developer to host files elsewhere, and provide the
links in their description...  unless of course you intended to rule
that out and forgot to mention it.  (Or, I suppose, if you did *not*
intend to rule it out and intentionally omitted mention of that so the
rabid anti-externalists would think you were on their side and not
create further controversy...  in which case I've now spoiled things.
Darn.  ;-) )

Some technical details are also either incorrect or confusing.  For
example, you state that "The original homepage/download links are
added as links without a ``rel`` attribute if they have the ``#egg``
format."  But if they are added without a rel attribute, it doesn't
*matter* whether they have an #egg marker or not.  It is quite
possible for a PyPI package to have a download_url of, say,
http://sourceforge.net/download/someproject-1.2.tgz.

Thus, I would suggest simply stating that changing hosting mode does
not actually remove any links from the /simple index, it just removes
the rel= attributes from the Home page and Download links, thus
preventing them from being crawled in search of additional file links.

With that out of the way, that brings me to the larger scope issue
with the modes as presented.  Notice now that with this clarification,
there is no real difference in *state* between the pypi-cache and
pypi-only modes.  There is only a *functional* difference...  and
that function is underspecified in the PEP.

What I mean is, in both pypi-cache and pypi-only, the *state* of
things is that rel= attributes are gone, and there are links to
files on PyPI.  The only difference is in *how* the files get there.

And for the pypi-cache mode, this function is *really*
under-specified.  Arguably, this is the meat of the proposal, but it
is entirely missing.  There is nothing here about the frequency of
crawling, the methods used to select or validate files, whether there
is any expiration...  it is all just magically assumed to happen
somehow.

My suggestion would be to do two things:

First, make the state a boolean: "crawl external links", with the
current state "yes" and the future state "no", with "no" simply meaning
that the rel= attribute is removed from the links that currently
have it.

Second, propose to offer tools in the PyPI interface (and command
line) to assist authors in making the transition, rather than
proposing a completely unspecified caching mechanism.  Better to have
some 

Re: [Catalog-sig] Search engine relevance

2013-03-10 Thread PJ Eby
On Sun, Mar 10, 2013 at 4:23 AM, Richard Jones r1chardj0...@gmail.com wrote:
 This might solve the AGI problem and could probably produce good results
 using the current ranking algorithm. Not sure. Google's search
 algorithms are far advanced ;-)

Heh.  This just gave me a bit of a chuckle, taken out of context.

AGI, you see, is also an acronym for artificial general
intelligence, so for a moment there I thought you were suggesting
that using Postgres rankings properly could bring about the
Singularity.  ;-)


Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

2013-03-10 Thread PJ Eby
On Sun, Mar 10, 2013 at 5:16 PM, Donald Stufft don...@stufft.io wrote:
 If someone's release process forces PyPI to have security, uptime, and privacy
 issues then I'm very sorry but their release process is going to need to
 change. It's not fun, it's a shitty situation, but trying to bend over
 backwards to enable their current release processes is like trying to bend
 over backwards to enable people to still walk into their bank's vault and grab
 a stack of currency.

When people in group 1 express disapproval of people in group 2, this
creates a rallying effect among members of group 1, and a *negative*
counter-reaction in members of group 2.

This is effective if, and *only* if, the people in group 2 have less
power in the situation than the people in group 1.   For example, if
co-operation from the people in group 2 is not needed in order to
carry out the wishes of group 1.

However, in the situation under discussion, such co-operation is
required, which means an alternative motivational strategy is
indicated.

That strategy involves giving persons in group 2 a better reason to
care than "because we in group 1 think you group 2 people are
thieves."

And by "better", I mean a reason that *benefits group 2*, and more
specifically, each individual in group 2 who chooses to co-operate.

And ideally, you work also to lower the cost of that co-operation.

That's what *this* thread was originally about (lowering the cost of
co-operation), before these "burn the witch" sentiments started up
again.  So, why not just step aside and let the adults go back to
working on the actual problem?

Just kidding, of course.   ;-)  That's an example of me using the same
type of communication style, in the opposite direction: spewing
disapproval at something I don't like, instead of giving you a reason
that benefits *you*, to do what I want.  See how it feels, going the
other direction?  Did it motivate you to be helpful?  I'm guessing
not.  ;-)

Anyway, my point is this: people don't like it one bit when you tell
them what to do.

If you tell them, "you must do X", you get resistance.

But if you offer them a choice, "Are you going to do X or Y?", there's
much less resistance.

And if one choice is less convenient than the other, most will pick
the easier choice.

So, would you rather fight with developers to make them do it your
way, or have most of them do exactly what you want and most of the
rest get pretty close, but not have to fight with them about it?

Right now, the impression you and certain other people are giving me
is that it is more important that whatever action we take be seen as
censuring the practice of off-PyPI hosting, than that we actually fix
the problems!

And it's difficult to take such a position seriously, because the
post-hoc rationalization of harms is, well, unconvincing at best to a
neutral party.  When PyPI was first built, it didn't *have* hosting,
so there was nothing morally wrong about off-site hosting then.

And when hosting was first added, automated downloading didn't exist
yet, either.  So it still wasn't wrong.

And when I added automated downloading, I made the choice to encourage
people to collaborate by making it as easy as possible.  So offsite
hosting still wasn't wrong, in fact it was a documented alternative.

And that's been the case for, oh, 8 years now?

So what you're actually doing isn't crusading against evil-doers, it's
more like saying that every restaurant that isn't McDonald's should be
immediately remodeled, because you have just noticed the shocking
trend that hardly any of those restaurants will serve you food as
quickly!

And that of course, the restaurant owners should undertake the
remodeling and procedure changes, retraining, retooling, etc. at
*their* expense, on *your* timeline.

Just so that *you*, who *chose to visit those restaurants in the first
place*, can get your food a bit more quickly.

Sure, I know that's not how *you* see it.

But surely you can see that's how the *restaurant owners* are going to see it.

And if you want them to co-operate, it's probably going to be in your
interest to focus your attention on their side of the equation, rather
than on yours.  You already agree with your point of view.  They
don't.

I realize that can be difficult to do when you have strong feelings
about a subject.  For example, as I write this I keep backing up and
deleting all sorts of unhelpful things I find myself wanting to say.
;-)

And I'm doing that because I'm consciously reminding myself that
*getting to a solution* is more important to me than *making you feel
bad* for being wrong on the internet.

What's more important to you?  The *actual* state of PyPI, or the
state of who is to be considered right or wrong?

If it's the former, you would probably find it useful to your goals,
to please refrain from calling me and that other 10% of PyPI thieves.
Or really any other names whatsoever, explicitly OR implicitly.

Thanks.

Re: [Catalog-sig] hash tags

2013-03-08 Thread PJ Eby
On Fri, Mar 8, 2013 at 7:50 AM, M.-A. Lemburg m...@egenix.com wrote:
 After the feedback I got from Holger and Phillip, I'm currently
 writing a new version, which drops some of the unneeded
 requirements and spells out a few more things.

 Here's a very short version...

 Installers are modified:

 * to only follow rel=download links from the /simple/ index page,
   which have a hash tag (e.g. #md5=...)
 * will only use the fetched download page if its contents match
   the hash tag
 * scan that page for rel=download links, which again have to
   have a hash tag to be taken into account
 * only install files for which the hash tag matches the
   downloaded content

 This should provide a good way to make sure that the downloaded
 files are indeed under control of the package maintainer.

There is, as I said before, a MUCH simpler way to do this, that works
right now: put direct #md5 download links in your description, and
phase out the rel= attributes altogether.

The key to making this transition isn't creating elaborate new
standards for the tools, it's *creating new tools for the standards*.

Specifically, *migration tools*.  A migration tool could be made that
scans existing external links and converts found links to #md5 links
or alternately uploads the files themselves to PyPI.  You can do that
without changing pip or distribute or anything else but PyPI, so
there's no need to wait out update cycles to take advantage.

Once a project/version has switched to either #md5 links or PyPI
copies, you can just drop the rel= attributes and you're done.
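
The link-conversion half of such a migration tool could be as small as
the sketch below. The add_md5_fragment() name and the example URL are
invented for illustration, and fetching the file is deliberately left
out; the only thing taken from the existing tools is the #md5=<hexdigest>
fragment convention.

```python
import hashlib
from urllib.parse import urldefrag


def add_md5_fragment(url, content):
    """Return url with an #md5=<hexdigest> tag computed from content.

    content is the already-downloaded file as bytes.  Any fragment the
    URL already carries is replaced.
    """
    base, _old_fragment = urldefrag(url)
    digest = hashlib.md5(content).hexdigest()
    return "%s#md5=%s" % (base, digest)


tagged = add_md5_fragment(
    "http://example.com/dist/foo-1.0.tar.gz", b"file contents here")
```

The resulting URL can then go into the description (or a new field)
exactly as it is.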

Alternately, if using the description for download links is considered
a bad idea, add a new field to PyPI for them.

Point is, this entire thing can be done correctly at the PyPI end and
work with the existing API of the download tools.


 So far the only practical problem I've found with the approach
 is that the download page may not contain dynamic data, e.g.
 a date or timestamp, since that causes the hash tag not to
 verify.

Which is completely unnecessary if one simply exposes the *actual*
download links directly on PyPI.  The download page is redundant, in a
couple different ways.  First, since it can't change, there's no point
in re-fetching it all the time.  Second, since it's only going to be
read by tools anyway, there's no point to it containing anything
besides the link.

So, since the page only contains links, might as well put the links
straight on PyPI, or at most have an option/tool to load the links
from an external source.

Again, the key to making this work is going to be somebody putting
buttons in the PyPI interface (and making setuptools/distutils
commands or similar CLI tools) to migrate their files (or links to the
files) to PyPI hosting.  A new API for such tools is entirely
unnecessary -- at most there might need to be a new field made
available/accessible.  (Personally I don't care if your download links
have to be in the description field if you're hosting off-site, but
that's just me.)


Re: [Catalog-sig] Deprecation of External Urls, Statistics

2013-03-08 Thread PJ Eby
On Fri, Mar 8, 2013 at 8:13 AM, Donald Stufft don...@stufft.io wrote:
 It does solve the backwards compatibility issue of killing external urls 
 immediately so I'm not flat out against it, but there may be legal issues 
 involved too?

I've mentioned this in the other thread as well, but the best way to
actually ensure this stuff gets moved over to PyPI is to make it
*easy*.  Give developers a button to click on PyPI that fetches all
their external links (requiring first that you give matching MD5 or
other checksums) and uploads them to PyPI, and a whole bunch of those
projects are likely to be okay with clicking it a few times.  A
command-line tool to do it (especially as a distutils/setuptools
command) would be a good idea, too.

Of the tiny minority of remaining people who object to PyPI hosting
for some reason other than convenience/familiarity (e.g. MAL's
licensing objection), it will likely be sufficient to provide an
option to add #md5 links to their description, in lieu of actual
rehosting.

FWIW, it's hard to get people to change behavior when one condemns
that behavior as unlikeable or socially undesirable, because it means
one is less likely to consider the other person's motivations, needs,
etc., and on top of that, the other person's resistance and rebellion
are stirred up by being the subject of one's disapproval.

So please, let's all stop talking about ways to work around the
package authors and project maintainers, or how to force them into
doing our bidding, and start talking instead about how to make it
*easy* and *obvious* for them to do what we want.

(And people who think it's already easy and obvious enough, so those
10% of projects must be stupid, will obviously not have anything
positive to contribute.)

So let me kick off that discussion with a list of known-so-far use
cases for external hosting, in descending order of my extremely rough
guesstimate of frequency:

* Always did it that way, never saw a reason to change, or didn't know
you could upload to PyPI
* Lots of files that are currently generated on the system where
they're hosted, or in an automated system that would need significant
rework to support PyPI
* Development snapshots (which may in fact be depended upon by other
in-development projects, so manual URL specification doesn't help
here)
* Had an issue w/PyPI availability in the past
* Objectors to PyPI's licensing requirements

Automation is aimed at the first two: make it easy enough, w/a carrot
and a stick (external link spidering is going away, you have to put
either the links or the files on PyPI directly if you want them
found), and a lot of people will move (assuming they're actually
still maintaining their project).

Development snapshots are an interesting case, because one of the
reasons they're valuable is that PyPI's existing multi-release
behavior is a major PITA.  You can't upload a new version of something
without PyPI creating a new release for it...  and automatically
hiding all your previous releases, including your stable release.
There's a lot that would have to be done to PyPI's release management
before it would actually be sane to track such releases there.  So the
obvious fix is to do nothing; such links being external doesn't hurt
availability for people that don't depend on them (unlike
rel=homepage/download links).

The last two issues are education/persuasion problems that won't be
affected by technology changes.

Does anybody know of any other use cases for the thousands of projects
and releases relying on external link discovery spidering?

(Disparaging remarks about why a particular use case is bad, no good,
makes you go blind, etc. need not apply: they serve only to show that
the person providing the opinion lacks sufficient empathy with the
target audience to be *useful* in a discussion of how to persuade that
target audience to behave differently.)


Re: [Catalog-sig] hash tags

2013-03-08 Thread PJ Eby
On Fri, Mar 8, 2013 at 2:52 PM, Noah Kantrowitz n...@coderanger.net wrote:
 MD5 is _not_ acceptable for anything security related and we shouldn't be 
 adding anything that increases our dependence on it. MD5's only use in the 
 packaging world is to make people who forget that TCP has its own checksums 
 feel all warm and fuzzy that there hasn't been _accidental_ download 
 corruption.

So, you're saying that someone has found a second-preimage attack
against MD5 that's more efficient than the current 2**127 threshold
established in 2009?

"Anything security related" is pretty broad.  Out of the many classes
of attacks on hashes, AFAIK the only class that's relevant to PyPI is
second preimage attacks,  i.e. one where the attacker has the original
file and the hash, and must construct a new file that produces the
same hash value.

Did you have some other type of hash attack in mind?  And in either
case, do you have a reference for the attack complexity?


Re: [Catalog-sig] hash tags

2013-03-08 Thread PJ Eby
On Fri, Mar 8, 2013 at 4:17 PM, M.-A. Lemburg m...@egenix.com wrote:
 On 08.03.2013 20:16, PJ Eby wrote:
 There is, as I said before, a MUCH simpler way to do this, that works
 right now: put direct #md5 download links in your description, and
 phase out the rel= attributes altogether.

 No, that would be a pretty poor design :-)

 The rel= attributes are good design, since they were meant for
 exactly this purpose (machine reading and understanding relations
 between origin and target).

That depends on the goal of your design.  If the goal is to phase out
offsite spidering by downloader tools in a reasonably easy and
low-cost way, introducing new API is not a good way to do it.

The simple way to do it is to replace download-time end-user
unsupervised spidering with upload-time or registration-time
author-supervised spidering, which requires only that the tools exist
and people be informed of them (and encouraged to use them).


Re: [Catalog-sig] hash tags

2013-03-08 Thread PJ Eby
On Fri, Mar 8, 2013 at 4:26 PM, Donald Stufft don...@stufft.io wrote:
 On Mar 8, 2013, at 4:12 PM, PJ Eby p...@telecommunity.com wrote:

 On Fri, Mar 8, 2013 at 2:52 PM, Noah Kantrowitz n...@coderanger.net wrote:
 MD5 is _not_ acceptable for anything security related and we shouldn't be 
 adding anything that increases our dependence on it. MD5's only use in the 
 packaging world is to make people who forget that TCP has its own checksums 
 feel all warm and fuzzy that there hasn't been _accidental_ download 
 corruption.

 So, you're saying that someone has found a second-preimage attack
 against MD5 that's more efficient than the current 2**127 threshold
 established in 2009?

 "Anything security related" is pretty broad.  Out of the many classes
 of attacks on hashes, AFAIK the only class that's relevant to PyPI is
 second preimage attacks,  i.e. one where the attacker has the original
 file and the hash, and must construct a new file that produces the
 same hash value.

 "Relevant to PyPI" is pretty broad, and when you're developing a secure system
 you need to look past what is ok *today* and design for the next 5, 10, or 20
 years. So even if there's no attack that can directly allow replacing the
 target file with a new one, continuing to utilize it is bad. It has a number
 of weaknesses which do not instill confidence in its future security;
 meanwhile there are a number of other hashes which _do_.

 Unless you'd rather be trying to replace hashes everywhere once it's already 
 completely broken.

We can replace it completely in a lot less than that many years, if
the new PEP-based tools can be brought to pass.  Using new protocols
(e.g. the embedded signatures in wheel files) will make most of this
moot.

What I'm against is trying to patch over the existing protocol when
what we really want is to replace it altogether.  Adding hashes and
filesizes and whatnot is just gilding the existing lily, or more like
gilding the pond scum, actually.  ;-)


Re: [Catalog-sig] hash tags

2013-03-08 Thread PJ Eby
On Fri, Mar 8, 2013 at 4:28 PM, M.-A. Lemburg m...@egenix.com wrote:
 On 08.03.2013 20:16, PJ Eby wrote:
 So, since the page only contains links, might as well put the links
 straight on PyPI, or at most have an option/tool to load the links
 from an external source.

 I don't follow you. We only have a single download_url field
 available to store a download link.

 We'd need to modify the meta data format to allow for more than
 one such field, which doesn't work if you want to stay backwards
 compatible.

Links included in the long description field are placed on the /simple
index of links.  So you can just edit your standard metadata right
this minute if you want to offer more download links.  And you can put
#md5 tags on them if you want the tools to check that.
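
For anyone unsure what "check that" amounts to, the idea is roughly the
following. This is a simplified sketch, not the installers' actual code;
check_md5_fragment() is a made-up name and the URL is only an example.

```python
import hashlib
from urllib.parse import urldefrag


def check_md5_fragment(url, content):
    """Compare downloaded bytes against the URL's #md5= tag.

    Returns True or False when a tag is present, and None when the URL
    carries no #md5= tag (nothing to check).
    """
    _base, fragment = urldefrag(url)
    if not fragment.startswith("md5="):
        return None
    expected = fragment[len("md5="):].lower()
    return hashlib.md5(content).hexdigest() == expected


data = b"pretend this is foo-1.0.tar.gz"
url = ("http://example.com/foo-1.0.tar.gz#md5="
       + hashlib.md5(data).hexdigest())

ok = check_md5_fragment(url, data)               # tag matches
tampered = check_md5_fragment(url, data + b"!")  # contents changed
untagged = check_md5_fragment("http://example.com/foo-1.0.tar.gz", data)
```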


Re: [Catalog-sig] hash tags

2013-03-08 Thread PJ Eby
On Fri, Mar 8, 2013 at 4:32 PM, Donald Stufft don...@stufft.io wrote:
 Here's some more information pulled straight from Wikipedia:

Trust me, I've read a LOT of Wikipedia (and even more from other
sites, including at least the conclusions of a number of cryptography
papers) about hashing attacks recently, because I was seeing
inconsistencies in what people are saying about hashes and their
weaknesses and so forth.  99.9% of the discussion about attacks on
hashes have to do with collision attacks, prefix attacks, and length
extension attacks, all of which are extremely relevant for
*cryptographic* purposes.  Specifically, the use of hashes to verify
identity, authority, repudiability, etc...  which emphatically do
*not* apply to the use of an MD5 as a checksum to verify a correct
download.

All of these attacks depend on *something else* being at stake besides
the integrity of the original message.  For example length-extension
attacks bypass the need to know a secret used in a naive hash-based
signature scheme (which is why you're supposed to use HMAC for such
things), while collision attacks let you trick a signer into signing
something that you can later replace with something altered.
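
For reference, the naive-versus-HMAC distinction looks like this with
the standard library. The key and message are placeholders, and SHA-256
is just an example choice; none of this is something PyPI does with
#md5 tags, which is rather the point.

```python
import hashlib
import hmac

secret = b"shared-secret"      # placeholder key, for illustration only
message = b"release=foo-1.0"   # placeholder message

# Naive scheme: hash(secret + message).  This is the construction that
# length-extension attacks are aimed at.
naive_tag = hashlib.sha256(secret + message).hexdigest()

# HMAC wraps the hash so that the tag cannot be extended that way.
hmac_tag = hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(tag):
    """Check a tag against the HMAC, using a constant-time comparison."""
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, expected)
```

Both schemes need a shared secret to mean anything, and a bare checksum
published next to a download link has none.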

The current use of #md5 tags isn't subject to either of these kinds of
attack, because:

1. There is no secret to be revealed, and
2. The author and signer are the same person

So the only type of attack I've found out about thus far, in my
(admittedly few) hours of study on the subject, that is relevant to
the way we use MD5 on PyPI at present is the so-called second
pre-image attack, which is when you're given an existing message and
a hash, and have to create a new message with the same hash...  while
also incorporating something useful in the new message.

The most recent report I saw on second pre-image attacks against full
MD5 estimated a 2**127 strength, meaning that even if you could
process a great many billion tries per second, it would take you
thousands of years to come up with a file that could masquerade as an
existing download.  (And most people's computers and/or internet
connections would choke on the massive file sizes needed for the
still-theoretical Kelsey-Schneier generalized preimage attack, which
in any case would apply equally to just about any other hash we could
currently put out in the field. i.e., it's not specific to a
particular hash algorithm, it just relies on certain properties of the
algorithm.)

So, yeah, MD5 is *cryptographically* broken, sure.  But it's not
broken for *data integrity*.  And in the PyPI use case, the
cryptographic part is all in the SSL being used to fetch the MD5
link in the first place.


 Here's the important highlights:

 - specifically, a group of researchers described how to create a pair of 
 files that share the same MD5 checksum

Right, that's what's called a collision attack.  It means that you
can go out *ahead of time*, and make two files with the same checksum,
one good, one evil.  It does *not* mean you get to take an existing
file, and then make a second file with the same checksum.  (The latter
is a second preimage attack, which is *not* broken.)

Hash collision attacks in PyPI would basically require an author to
upload a special version of their package that looked innocent, and
then they could later switch that version out with one that's harmful.
And the *way* that this works is that you specially generate *both*
files, in advance.  Which means that the author themselves is
compromised, so the threat is moot.  The author can already upload
compromised code (either through being evil or having their PC
hijacked), and what #md5 it has is 100% irrelevant.

That is, there's nothing stopping an evil author or an author with a
compromised PC from simply uploading a new file with a new MD5,
because PyPI will pass it along in exactly the same way.  Changing
hash algorithms will not affect this threat vector in the slightest.

Given these facts, it makes no sense to fuss over the hash algorithm
in current use, since a concurrent goal here is to switch to file
formats that can be directly signed using, you know, *actual*
cryptography.  ;-)

The new .wheel format makes provisions for modern signature
techniques.  It'd be good if sdists also did.  Then the #md5 tag can
die a natural death, hopefully within the year replaced by a hashtag
that say, fingerprints the author's public key as registered with
PyPI, or something of that sort.  In the meantime, there's no actual
threat here, so bikeshedding what to replace it with *while keeping
the current system* is like rearranging office furniture in a building
that's about to have demolition charges set underneath it.  ;-)


Re: [Catalog-sig] homepage/download metadata cleaning

2013-03-01 Thread PJ Eby
On Fri, Mar 1, 2013 at 6:17 AM, holger krekel hol...@merlinux.eu wrote:
 On Fri, Mar 01, 2013 at 06:09 -0500, Donald Stufft wrote:
 On Friday, March 1, 2013 at 6:04 AM, M.-A. Lemburg wrote:
  On 01.03.2013 11:19, holger krekel wrote:
   Hi Richard, all,
  
   somewhere deep in the threads i mentioned i wrote a little cleanpypi.py
   script which takes a project name as an argument and then goes to
   pypi.python.org and removes all homepage/download metadata entries for
   this project. This sanitizes/speeds up installation because
   pip/easy_install don't need to crawl them anymore. I just did this for
   three of my projects, (pytest, tox and py) and it seems to work fine.
  
 
 
  Does it also cleanup the links that PyPI adds to the /simple/ by
  parsing the project description for links ?
 
  I think those are far nastier than the homepage and download links,
  which can be put to some good use to limit the external lookups
  (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)
 
  See e.g. https://pypi.python.org/simple/zc.buildout/
  for a good example of the mess this generates... even mailto links
  get listed and file:/// links open up the installers for all
  kinds of nasty things (unless they explicitly protect against
  following these).
 
 

 pip at least, and I assume the other tools, don't spider those links, but
 they do consider them for download (e.g. if the link looks installable
 it will be a candidate for installing, but it won't fetch it and look for
 more links the way it will for download_url/home_page).

 I believe that's the way it's structured atm.

 That's right. Even though the long-description extracted links
 look ugly on a simple/PKGNAME page, neither pip nor easy_install do anything
 with them except if the href ends in #egg=PKGNAME- in which case they are
 taken as pointing to a development tarball (e.g. at github or bitbucket).
 AFAIK a link like PKGNAME-VER.tar.gz will not be treated as
 an installation candidate, just the #egg=PKGNAME one.

Both are considered primary links.  A primary link is a link whose
filename portion matches one of the supported distutils or setuptools
file formats, or is marked with an #egg tag.  Primary links are
indexed as to project name and version, so that if that version/format
is chosen as the best candidate, it will be downloaded and installed.

Links marked with rel=homepage or rel=download are secondary
links.  Secondary links are actively retrieved and scanned to look
for more primary links.  No further secondary links are scanned or
followed.  (Details of all of this can be found at:
http://peak.telecommunity.com/DevCenter/setuptools#making-your-package-available-for-easyinstall
)
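
As a rough illustration of the rules above, here is a minimal Python
sketch of the primary/secondary distinction.  The function name
classify_link and the extension list are assumptions made up for this
example; the real logic lives in setuptools' package_index module and
handles more cases than this.

```python
from urllib.parse import urlparse

# Illustrative (not exhaustive) list of distribution filename endings.
DIST_EXTENSIONS = (".egg", ".tar.gz", ".tgz", ".tar.bz2", ".zip", ".exe")


def classify_link(href, rel=None):
    """Return 'primary', 'secondary', or 'ignored' for a link on an index page."""
    parts = urlparse(href)
    filename = parts.path.rsplit("/", 1)[-1].lower()
    # Primary: the filename looks like a distribution, or there is an #egg tag.
    if filename.endswith(DIST_EXTENSIONS) or parts.fragment.startswith("egg="):
        return "primary"
    # Secondary: links explicitly marked as home page or download get scanned.
    if rel in ("homepage", "download"):
        return "secondary"
    return "ignored"
```

With this sketch, a PKGNAME-VER.tar.gz link and an #egg=PKGNAME-dev link
both come out as primary, which is the point being made above.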

This basically means that MAL's proposal for a download.html file is
actually a bit moot: you can just stick direct primary download URLs
in your PyPI description field, and the tools will pick them up.  They
can even include #md5 info.  (See
http://peak.telecommunity.com/DevCenter/EasyInstall#package-index-api
- item 4 mentions the description part.)

This means, by the way, that you could make an external link cleaner
which spiders the external pages and pulls the candidates onto the
description for that release, thereby keeping useful primary links and
getting rid of the secondary links used to fetch them.


Re: [Catalog-sig] Deprecate External Links

2013-03-01 Thread PJ Eby
On Fri, Mar 1, 2013 at 4:24 AM, M.-A. Lemburg m...@egenix.com wrote:
 On 01.03.2013 10:02, Reinout van Rees wrote:
 On 28-02-13 21:08, holger krekel wrote:
 I have seen that position in this discussion (I have to upload 120
 files per release, so I won't do that, for instance).

 haven't seen that.

 Marc-Andre Lemburg said this, which I took to mean 120 uploads per release:

 
 However, taking our egenix-mx-base package as example, we have
 120 distribution files for every single release. Uploading those
 to PyPI would not only take long, but also ...
 

 Correct, with a total of over 100MB per release. However, the above
 quote is slightly incorrect: I did not say I won't do that, just
 that there are issues with doing this:

 * It currently takes too long uploading that many files to
   PyPI. This causes a problem, since in order to start the upload,
   we have to register the release on PyPI, which tools will then
   immediately find. However, during the upload time, they won't
   necessarily find the right files to download and then fail.

Actually, easy_install doesn't pay any attention to what releases are
registered.  It just looks for primary and secondary links.  If there
are links for a version that it can use, it uses it.  If it does not
find links for a version, then that version does not exist, as far as
it is concerned.  So registering without files is not a problem.


   The proposed pull mechanism (see
   http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)
   would work around this problem: tools would simply go to
   our servers in case they can't find the files on PyPI.

That proposal is unnecessary, actually.  You could *right now* simply
place binary download links (with optional #md5= verification)
in your package's description field, and the moment you registered the
package, existing tools would find those links and download them from
your site.  You could then remove your home page and download URLs
from the relevant fields, and place them also in the description.
(easy_install does not follow non-download links within the
description -- i.e., links that don't end in .egg, .tgz, etc. and
don't have an #egg tag.)
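
To make the optional #md5= verification concrete, here is a small
Python sketch of how a tool could check a download against the tag on
its link.  The helper names md5_from_link and verify_download are
hypothetical, invented for this example; they are not easy_install's
actual functions.

```python
import hashlib
from urllib.parse import urlparse


def md5_from_link(href):
    """Return the hex digest from a '#md5=<hex>' fragment, or None if absent."""
    fragment = urlparse(href).fragment
    if fragment.startswith("md5="):
        return fragment[len("md5="):].lower()
    return None


def verify_download(href, data):
    """True if data matches the link's md5 tag, or if the link has no tag."""
    expected = md5_from_link(href)
    if expected is None:
        return True  # verification is optional; nothing to check
    return hashlib.md5(data).hexdigest() == expected
```

Note that this only detects corrupted or stale downloads; it does
nothing against an author (or attacker) who changes both the file and
the tag.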


 * PyPI doesn't allow us to upload two egg files with the same
   name: we have to provide egg files for UCS2 Python builds and
   UCS4 Python builds, since easy_install/setuptools/pip don't
   differentiate between the two variants.

They can if it's part of the platform string; the catch is that right
now it's not.  We'd have to go through an upgrade cycle of the tools
to support that.  I need to take a look at what PEP 427 is doing (and
you should too, if you haven't already) to get this part sorted out.
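
For reference, a tool can tell the two build variants apart by looking
at sys.maxunicode, which is 0xFFFF on narrow (UCS2) builds and 0x10FFFF
on wide (UCS4) builds.  The sketch below is only an illustration: the
'ucs2'/'ucs4' suffixes and both function names are assumptions, not an
existing setuptools or PEP 427 convention.

```python
import sys


def unicode_width_tag():
    """Return 'ucs2' for narrow Unicode builds, 'ucs4' for wide builds."""
    return "ucs4" if sys.maxunicode > 0xFFFF else "ucs2"


def tagged_platform(platform):
    """Append the Unicode width to a platform string (hypothetical format)."""
    return "%s-%s" % (platform, unicode_width_tag())
```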


Re: [Catalog-sig] homepage/download metadata cleaning

2013-03-01 Thread PJ Eby
On Fri, Mar 1, 2013 at 2:31 PM, M.-A. Lemburg m...@egenix.com wrote:
 Hmm, then why not remove links that don't match the above from
 the /simple/ index pages ?

PyPI provides the links uninterpreted since the tools' interpretations
have evolved over time.


 Note that it's easily possible to make e.g. file:/// links
 have a fragment that matches what you described, so I guess the
 filters would have to be more careful about what to allow
 (e.g. only http/ftp schemes, perhaps even only https schemes)
 and what not.

file:// URLs are an intentionally supported feature of easy_install;
many users have local NFS-based or other shared repositories.  But
yes, it certainly would be reasonable to not include links to them on
PyPI.  ;-)


 BTW: Are those links also shown as-is on the description page ?
 People could do nasty stuff by adding javascript: links which look
 like normal links to the descriptions.

That's true, but is unrelated to the tools, since the tools can't
process javascript links.

It would probably be best, though, if PyPI filtered such URLs to
prevent script injection/CSRF attacks on logged-in PyPI users browsing
project descriptions.  I don't know if it already does this or not,
since I've never tried to inject a CSRF attack on PyPI.  ;-)

(I guess technically it would be a same-site request forgery rather
than a cross-site one, but you know what I mean.)


Re: [Catalog-sig] Next generation package infrastructure (was: Deprecate External Links)

2013-02-28 Thread PJ Eby
On Thu, Feb 28, 2013 at 4:31 AM, M.-A. Lemburg m...@egenix.com wrote:
 In order for this to work out, you will need to get the
 support of people hosting packages externally and address
 their concerns.

 The current discussion has been too dogmatic for my taste.
 A more pragmatic approach would likely be a more reasonable
 and successful way to achieve a transition.

I think maybe if we have an uploader tool like the one I mentioned in
one of the other spinoff threads, we could address at least the
current upload situation by making it super easy to upload your
external files.  Better still, have a button you can press in the PyPI
UI that says "fetch all my external distributions", and it gives you
a preview of the download files it's going to fetch (so you can filter
out mis-detected ones), and then it does the pulling.  Such a tool
could survive migration to the new infrastructure as well.


Re: [Catalog-sig] Migrating away from scanning home pages (was: Deprecate External Links)

2013-02-28 Thread PJ Eby
On Thu, Feb 28, 2013 at 5:55 AM, M.-A. Lemburg m...@egenix.com wrote:
 I think we all agree that scanning arbitrary HTML pages
 for download links is not a good idea and we need to
 transition away from this towards a more reliable system.

 Here's an approach that would work to start the transition
 while not breaking old tools (sketching here to describe the
 basic idea):

 Limiting scans to download_url
 --

 Installers and similar tools preferably no longer scan all the
 links on the /simple/ index, but instead only look at
 the download links (which can be defined in the package
 meta data) for packages that don't host files on PyPI.

 Going only one level deep
 -

 If the download links point to a meta-file named
 packagename-version-downloads.html#sha256-hashvalue,
 the installers download that file, check whether the
 hash value matches and if it does, scan the file in
 the same way they would parse the /simple/ index page of
 the package - think of the downloads.html file as a symlink
 to extend the search to an external location, but in a
 predefined and safe way.

Clever.  This is actually backward compatible with existing tools, in
that they will read this file right now.  The hashing and verification
isn't supported, but we could add warnings to do it.

Actually, the essence of your idea can be done even more simply: just
require that the link include a hash that the fetched page will be
verified against.  It essentially ensures that stale external links
can't break anything.

Further, since the existence of the hash means that the page can't be
changed without changing the URL, it means that PyPI *itself* can
simply fetch it once, parse the links from it, and serve them directly
on the /simple index page.  If you change the download URL, PyPI
discards the previous links and redoes the scan.
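
A minimal Python sketch of that fetch-once-and-verify idea follows.
Everything named here is an assumption for illustration: the
'#sha256-<hex>' fragment form is borrowed from the proposal quoted
above, 'fetch' stands in for whatever HTTP client is used, and
pinned_links is not an existing PyPI or setuptools function.

```python
import hashlib
from html.parser import HTMLParser
from urllib.parse import urldefrag


class _HrefCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


_link_cache = {}  # full pinned URL (including hash) -> list of hrefs


def pinned_links(url, fetch):
    """Fetch a hash-pinned page once, verify it, and return its links."""
    if url in _link_cache:
        return _link_cache[url]  # same URL implies same content: no refetch
    page_url, fragment = urldefrag(url)
    if not fragment.startswith("sha256-"):
        raise ValueError("link is not hash-pinned")
    body = fetch(page_url)  # expected to return bytes
    if hashlib.sha256(body).hexdigest() != fragment[len("sha256-"):]:
        raise ValueError("page does not match the pinned hash")
    collector = _HrefCollector()
    collector.feed(body.decode("utf-8", "replace"))
    _link_cache[url] = collector.hrefs
    return collector.hrefs
```

Because the cache key includes the hash, changing the download URL
naturally invalidates the old entry, which matches the "discard and
redo the scan" behavior described above.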

All in all, though, I'm not sure it's as viable as a simple "upload my
external release" button (in the UI) and matching setup.py command
(for automation) as a way of getting people's releases done.  It seems
like building a downloads.html for your files from SourceForge, say,
would be just an annoying intermediate step.

(This is assuming, of course, that the licensing issues can be worked out.)


 * In a later phase of the transition we could have
   PyPI cache the referenced distribution files locally
   to improve reliability. This would turn the push
   strategy for uploading files to PyPI into a pull
   strategy for those packages and make things a lot
   easier to handle for package maintainers.

I like this part.  I think we should just go straight there, and skip
the intermediate link formatting stuff.  ;-)


Re: [Catalog-sig] Deprecate External Links

2013-02-28 Thread PJ Eby
On Thu, Feb 28, 2013 at 5:00 PM, Donald Stufft donald.stu...@gmail.com wrote:
 SSL checking on upload should be possible, do you want
 a patch?

If it uses the 'requests' library, yes, I'll accept one.  But I don't
want to do any direct implementation of SSL cert checking in
setuptools, at least in the short run (next few weeks), because:

1. I don't consider myself qualified as yet to write a correct patch
or even verify that a contributed patch is correct/safe, and

2. There is a licensing issue with including the Mozilla root
certificate set in setuptools under its current license, and I'm not
100% certain I can *change* the license.  (I *could* potentially use a
platform-provided cert set, but that's not really an option on Windows
unless you have Windows expertise above my paygrade for pulling that
stuff out of the registry.)

So, by delegating to the requests library, I can bypass both of those
issues in the short term.  In the longer term (1 month from now),
more integrated solutions may be more feasible.  Using requests is
the best I think I can reasonably achieve by PyCon, but I *will* be
publicizing a set of instructions for how to safely download
setuptools and requests (via https in a browser to prevent MITM
attacks), as well as how to configure easy_install for more secure
default settings.  (And easy_install will always use requests if
present, unless specifically asked not to with a --no-ssl-verify
option.)


Re: [Catalog-sig] Deprecate External Links

2013-02-27 Thread PJ Eby
On Wed, Feb 27, 2013 at 1:34 PM, Lennart Regebro rege...@gmail.com wrote:
 On Wed, Feb 27, 2013 at 5:34 PM, M.-A. Lemburg m...@egenix.com wrote:
 I'm not saying that it's not a good idea to host packages on PyPI,
 but forcing the community into doing this is not a good idea.

 I still don't understand why not. The only reasons I've seen are
 "because they don't want to" or "because they don't trust PyPI". And
 in the latter case I'm assuming they wouldn't use PyPI at all.

I haven't seen anybody mention it yet, but checkouts of development
versions are a use case that can't currently be addressed without
support for multiple external links.  For example, setuptools itself
offers SVN checkout URLs for two different branches.  I've also seen
in-development packages offered via github or bitbucket checkouts as
well.


Re: [Catalog-sig] Deprecate External Links

2013-02-27 Thread PJ Eby
On Wed, Feb 27, 2013 at 4:04 PM, Lennart Regebro rege...@gmail.com wrote:
 On Wed, Feb 27, 2013 at 8:49 PM, Monty Taylor mord...@inaugust.com wrote:
 But wouldn't this only be a change in pip/easy_install, not PyPI
 itself? I suppose you could explicitly break the external links by
 having them point to nothing if you are worried about the security or
 if it's some performance issue (that would indeed be a bad
 compatibility break, in case people are using those for other
 purposes).  Otherwise, if it's a problem, then just use the old
 version of pip.

 If we don't remove the feature from pypi itself

 It isn't a feature of PyPI. PyPI doesn't require you to upload the
 files to PyPI. For that reason, easy_install and PIP will scrape
 external sites to be able to download the files.

 What we should do is agree that this should stop,

So far, I don't think anybody's talking to the right "we" for stopping
it.  It's the tools that control this, not PyPI.  (PyPI can't actually
stop the tools from using this information without also making itself
a lot less useful to *humans* at the same time.)

As far as my personal position on the matter, I think that it's
reasonable to deprecate the scraping of home page and download links.
As somebody pointed out, expired domains are a potentially nasty
problem there.

OTOH, I currently make development snapshots of setuptools and other
projects available by dumping them in a directory that's used as an
external download URL.  Replacing that would be a PITA because PyPI
only lets you upload and register new releases from distutils' command
line.  Basically, I'd need to use a download link that pointed to a
"latest" URL that redirected to the final download.

Anyway, I'm not seeing much discussion here about how to help authors
make changes to their release processes.  Note that many popular and
long-lived projects (pywin32, PIL, etc.) have similar issues.  (Not to
mention the newer projects that host directly from revision control.)

Given that easy_install was deliberately designed so that those guys
would *not* need to change their hosting strategies to get automated
downloads, I'd like to see more talk about how we're going to help
people change their releasing and hosting strategies.


Re: [Catalog-sig] Deprecate External Links

2013-02-27 Thread PJ Eby
On Wed, Feb 27, 2013 at 4:50 PM, Donald Stufft donald.stu...@gmail.com wrote:
 Development snapshots are a use case that I'm not sure makes sense
 for PyPI, but if they do, they should require specific opt-in to install them.
 Does easy_install have a command line flag that adds extra links?

*chuckle*.  Yes, it's the original source of the --find-links option,
emulated in pip to ease migration.

 can your instructions simply state to do the equivalent of
 `pip install --find-links=http://setuptools.com/dev-snapshopts/`?

The problem with find-links is that if you push them off of PyPI, they
have to go somewhere else, which is setuptools' dependency-links
feature.  Now you have an even *harder* problem to update or remove
those links, because they're not under the control of the author nor
visible on PyPI.


 Alternatively I would like to get the tooling smarter about not installing
 pre-release versions unless asked as well.

Yes, and that discussion doesn't have much to do with PyPI per se,
because again, it's up to the tools.


Re: [Catalog-sig] HTTPS now promoted on PyPI

2013-02-19 Thread PJ Eby
On Tue, Feb 19, 2013 at 12:13 AM, Richard Jones r1chardj0...@gmail.com wrote:
 2. incorporate some monkey-patching into distribute and setuptools and
 promote those,

This is actually on my radar to do for setuptools, as soon as the dust
has settled enough on what it is the monkey-patching needs to *do*.
;-)

So far I know I'll be changing the default URLs and adding cert
verification, but I haven't looked at the register or upload stuff
yet.  The part where people are saying https isn't working right now
is a big red flag for me, however; I don't want to push out an update
that'll just make the load situation worse.

In the meantime I'll be investigating and testing, of course.  (One
annoying issue presently under investigation: determining whether
including a cacert bundle means setuptools' license terms will have to
change.  Pip used LGPL, which appears to be compatible with the MPL.
I personally don't think certs should be copyrightable in the first
place, but some jurisdictions have compilation copyright of otherwise
non-copyrightable individual elements.  Presumably, Mozilla's not
going to be a jerk about things, but...  bleah.  Licensing issues
*suck*.)


Re: [Catalog-sig] HTTPS now promoted on PyPI

2013-02-19 Thread PJ Eby
On Tue, Feb 19, 2013 at 8:35 AM, Giovanni Bajo ra...@develer.com wrote:
 I would be OK with redirecting for browsers (matching the user agent for
 instance), but I would try to disable for tools as much as possible.

Matching paths is an option, too: the /simple index is intended for
tools, and the main /pypi index for humans.


Re: [Catalog-sig] Remove pypi redirects

2013-02-19 Thread PJ Eby
On Tue, Feb 19, 2013 at 1:31 PM, Marcus Smith qwc...@gmail.com wrote:
 looking on the bright side,  it made us aware that we had a leak to pypi
 in our build.  we were trying to be local.  so thanks.
 Had to go update our .pydistutils.cfg file
 Marcus

FYI, easy_install's --allow-hosts option can prevent such leaks.  (But
maybe that's why you're editing pydistutils.cfg ;-)  )


Re: [Catalog-sig] New PyPI stats available

2013-02-18 Thread PJ Eby
On Mon, Feb 18, 2013 at 9:55 AM, Alex Clark acl...@aclark.net wrote:
  aclark@Alexs-MacBook-Pro:~/Developer/aclark/resume/  vanity pydstat
  pydstat-1.0.0.tar.gz 2012-08-15 2,216
  pydstat-1.0.1.tar.gz 2012-08-23 4,367
  
  pydstat has been downloaded 6,583 times!

Nice -- any chance you could add version filtering?  vanity
setuptools reports ~8.4 million downloads for setuptools, but the
current release actually stands at only around 4.8 million.  ;-)
(Also, the formatting is off for the most popular downloads, because
the count column isn't wide enough to show 7 significant figures.)


Re: [Catalog-sig] Allowing the upload of .py files at PyPI

2013-02-15 Thread PJ Eby
On Thu, Feb 14, 2013 at 6:31 PM, Richard Jones rich...@python.org wrote:
 The bootstrap.py file would most likely have to be omitted from the
 usual files listing mechanisms as they are used to determine
 installable release packages.

I would feel more comfortable with the proposed mechanism if it
allowed the .py files to retain their original names.  There is a ton
of collateral out there referring people to ez_setup.py, and while I
can (and will) redirect the original URL to wherever it ends up, it'd
be less confusing to keep the name.

Among other things, it would help prevent the sort of phishing attack
where somebody represents *their* ez_setup.py script as the real deal,
while saying that setuptools/bootstrap.py is an obvious forgery, since
it's not named ez_setup.py.  ;-)


Re: [Catalog-sig] Proposal for the bootstrap API

2013-02-15 Thread PJ Eby
On Fri, Feb 15, 2013 at 8:10 AM, Nick Coghlan ncogh...@gmail.com wrote:
 On Fri, Feb 15, 2013 at 10:25 PM, Tarek Ziadé ta...@ziade.org wrote:
 Anyways: I am withdrawing my proposal - if we're special-casing a few
 projects,  why bother creating a new API in the first place ?

 That's why I asked how frequently the bootstrap files needed updates
 earlier - if they're fairly static, then simply asking for a copy to
 be hosted on PyPI and documenting that as the canonical location is by
 far the most straightforward solution.

 The only reason for an API would be if the projects wanted to be able
 to update them directly without asking the PyPI admins to upload a new
 version (and, as you note, that could potentially be handled via
 ssh/scp config rather than via the PyPI web app).

Also, it may make sense to get rid of the bootstrap files in the long
run anyway.  ez_setup started the whole business with only one real
function: to solve the chicken-and-egg problem of allowing developers
to make use of dependencies without first needing their users to
install setuptools.  Is that a problem that actually needs solving any
more, almost a decade later?

(Apart from that use, the only thing it's good for is helping 64-bit
Windows users install the right version of setuptools in the right
place, and there will probably be a better fix for that eventually as
well.)

Buildout actually has a better reason than any of the other projects
to keep a bootstrap file around, and that's that it's targeted at a
general sysadmin audience not steeped in Python packaging lore.  So
having a bootstrap makes a lot of sense...   except that there's no
reason it needs to live on PyPI, per se.  Zope corp. undoubtedly has
secure hosting and certs of their own, and the very thing that makes
them need a bootstrap script means that the people who need it don't
really care *what* secure source they pull it from.

It's possible I'm misunderstanding some things there, and I hope Jim
will chime in with corrections if applicable.  But I'm thinking maybe
instead of working out PyPI hosting for these things, we should just
get rid of them or host them elsewhere.  (I have at least one domain
w/a trusted cert that could be used, for example.)

(One additional point, though: for ez_setup.py's main use case, it's
currently distributed by way of anonymous SVN, and zillions of source
packages already hosted on PyPI.  Most of the time, the copy somebody
uses *already* came from somewhere other than the primary source.
Factor *that* into the phishing scenarios for a bit...)


Re: [Catalog-sig] Allowing the upload of .py files at PyPI

2013-02-14 Thread PJ Eby
On Thu, Feb 14, 2013 at 5:10 PM, Nick Coghlan ncogh...@gmail.com wrote:
 I'm more concerned about phishing style attacks. I don't want the PyPI
 admins to have to start scanning for hostile names like distirbute.

I'm not sure what you mean.  These things exist only for the
corresponding package (buildout, setuptools, or distribute), and
aren't downloaded from any other project.  Generally, they are
downloaded either by 1) a human, or 2) another tool that wants to
support installation in the absence of a pre-existing setuptools or
distribute installation (mainly zc.buildout AFAIK).

(Or are you saying that somebody might upload a project called, say,
distribute_, and try to trick people into downloading it?  I'm not
sure how that's a threat that can be defended against in any event.)

 So how often do the bootstrap files change?

Setuptools releases an updated version with each new release, as it
contains an MD5 signature for downloading the new release.  I *think*
distribute does the same.  Not so sure about buildout.


Re: [Catalog-sig] PyPI and setuptools

2013-02-12 Thread PJ Eby
On Sat, Feb 9, 2013 at 6:43 PM, M.-A. Lemburg m...@egenix.com wrote:
 * distutils config files:

 http://docs.python.org/2/install/index.html#inst-config-files

 * setuptools:

 http://peak.telecommunity.com/DevCenter/EasyInstall#configuration-files
 http://peak.telecommunity.com/DevCenter/EasyInstall#command-line-options
 (the option is called --index-url)

 * distribute:

 http://pythonhosted.org/distribute/easy_install.html#configuration-files
 http://pythonhosted.org/distribute/easy_install.html#reference-manual
 (the option is called --index-url)


Also, you can run this to easily change the setting site-wide (with
either setuptools or distribute):

   sudo python setup.py saveopts -g easy_install --index-url https://pypi.python.org/simple

It'll give you an error message about no URLs being provided, but
first it'll update the global distutils.cfg for that version of Python
or that virtualenv, e.g.:

  $ sudo python setup.py saveopts -g easy_install --index-url https://pypi.python.org/simple
  running saveopts
  Writing /usr/lib/python2.6/distutils/distutils.cfg
  running easy_install
  error: No urls, filenames, or requirements specified (see --help)

(If you want to restrict easy_install to only download from pypi by
default, you can also add an --allow-hosts setting to the easy_install
part of the command line.)


Re: [Catalog-sig] PyPI and setuptools

2013-02-12 Thread PJ Eby
On Mon, Feb 11, 2013 at 2:55 AM, Marcus Smith qwc...@gmail.com wrote:
 As for then making Distribute the default in virtualenv's (or the only
 option), there is a virtualenv issue for that.
 https://github.com/pypa/virtualenv/issues/217
 apparently there's an issue with UAC elevation on windows.
 that issue could use some help getting going...

There's a fix for the UAC issue in the current release of setuptools,
if that helps.

(Actually, I think it was put in a couple of releases ago.  Either
way, it should be in the setuptools commit logs from a few years ago.
There are a number of bugs like this that were fixed in setuptools
many years ago, but never merged by distribute; I don't think anybody
from distribute has been monitoring the setuptools tracker or
repository much since the original divergence.)


Re: [Catalog-sig] PyPI and setuptools

2013-02-12 Thread PJ Eby
On Tue, Feb 12, 2013 at 2:11 PM, Giovanni Bajo ra...@develer.com wrote:
 On 12 Feb 2013, at 19:36, PJ Eby p...@telecommunity.com wrote:

 On Sat, Feb 9, 2013 at 7:54 PM, Giovanni Bajo ra...@develer.com wrote:
 The problem with this approach is that Python standard library does not 
 validate SSL certificates. So even if you force a urllib-based tool to 
 access PyPI through https, it doesn't help at all in case of a MITM attack.

 FWIW, if someone provides a suitable *cross-platform* urllib
 monkeypatch that does certificate validation, even if it only
 validates PyPI's certificate, I'll add it to setuptools and issue a
 patch release that uses it, and has its default index URL updated to
 the https version.


 This is an option:
 https://gist.github.com/zed/1347055

 it's not a monkeypatch, but it's a handler. You probably want to include a CA 
 bundle (eg: the Mozilla one like pip is doing), and use that by default.

Thanks!  TBH, cert stuff makes my head hurt, which is why there's not
more of it in setuptools already: I hesitate to sprinkle a dash of
stuff I don't understand on top of other things and call the problem
solved.  That seems like something of an antipattern to me.

But I suppose I'll need to learn some of it at least, in order to be
able to build a CA bundle, unless I steal whatever pip does.  I can
start on integrating this in the meantime at least, and hopefully can
get it out around the same time that PyPI's cert is updated.  I'm
nonetheless hesitant to conclude that the problem of security on *non*
PyPI sites or handling redirects or all the rest of it will all be
resolved in a single patch release, though.
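
For comparison, here is roughly what certificate-checking urllib access
looks like on later Pythons, where the standard library gained
ssl.create_default_context() (Python 3.4 and up).  This is a sketch
under that assumption, not the handler from the gist linked above and
not what setuptools shipped; the function names make_context and
make_opener are invented for the example.

```python
import ssl
import urllib.request


def make_context(cafile=None):
    """Return an SSL context that requires a valid, hostname-matching cert.

    cafile=None means "use the platform's trusted CA store"; pass a path to
    a CA bundle (e.g. a Mozilla-derived one) to use that instead.
    """
    return ssl.create_default_context(cafile=cafile)


def make_opener(context):
    """Build a urllib opener whose HTTPS handler uses the given context."""
    handler = urllib.request.HTTPSHandler(context=context)
    return urllib.request.build_opener(handler)
```

An opener built this way raises an error on a bad certificate rather
than silently proceeding, which is the property the monkeypatch
discussed above is meant to provide.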


Re: [Catalog-sig] [Distutils] imp.find_modules and namespaces

2013-02-11 Thread PJ Eby
On Mon, Feb 11, 2013 at 11:40 AM, Alessandro Dentella san...@e-den.it wrote:

 I believe that this issue belongs to this list, please let me know if I'm
 wrong.

 Suppose I have 2 packages:

   jmb.foo
   jmb.bar

 distributed separately. Each has in jmb's __init__ a standard:


   __import__('pkg_resources').declare_namespace(__name__)

 or

   from pkgutil import extend_path
   __path__ = extend_path(__path__, __name__)


 I just realized that imp.find_module() will return fake values

   imp.find_module('jmb', None)

 may return (a tuple with) the path from the first package or from the
 second. Many frameworks will fail to discover commands in the inner module:
 one is detailed here [1] another is Django's way of getting application's
 commands.

 I find it misleading to return a value that is not thoroughly correct.

 Is there a workaround? Is the current behaviour considered correct for
 reasons I don't yet understand?

Since Python 2.5, the right way to do this is with
pkgutil.iter_modules() (for a flat list) or pkgutil.walk_packages()
(for a subpackage tree).

For your example, if I wanted to find just the subpackages of 'jmb', I would do:

    import jmb, pkgutil

    for module_loader, name, ispkg in pkgutil.iter_modules(jmb.__path__, 'jmb.'):
        # 'name' will be 'jmb.foo', 'jmb.bar', etc.
        # 'ispkg' will be true if 'name' is a package, false if it's a module
        print(name, ispkg)


Re: [Catalog-sig] [Distutils] imp.find_modules and namespaces

2013-02-11 Thread PJ Eby
On Mon, Feb 11, 2013 at 4:56 PM, Alessandro Dentella san...@e-den.it wrote:
 thanks for the answer but this way I need to really import jmb while
 imp.find_module doesn't really import it.

If you want to know whether the module 'jmb' exists, you can certainly
do that by using pkgutil.iter_modules().  What you *can't* do -- in
*any* version of Python as far as I know -- is tell for certain
whether 'jmb.foo' exists, without first importing jmb.  (Since until
jmb is imported, there's no way to know what __path__ value it will
end up with.)

This is true for namespace packages in all versions of Python; the
best that you can do is try to write code that does the same thing as
the import system...  but even then your code will be just guessing
(and failing to guess correctly) in the case where a package's
initialization involves altering its __path__ or if .pth files with
dynamic code are involved.

Similarly, for any module foo.bar.baz, foo.bar must be imported in
order to know what path to use for checking for the existence of
foo.bar.baz.
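
A small sketch of why the parent has to be imported, assuming a current Python 3 (importlib.util.find_spec did not exist when this thread was written) and using made-up names: the package 'dynpkg_demo' appends an extra directory to its own __path__ during initialization, so a static look at the package directory misses the submodule, while find_spec(), which is documented to import the parent of a dotted name first, finds it.

```python
import importlib.util
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "dynpkg_demo")   # hypothetical package name
extra_dir = os.path.join(root, "elsewhere")   # only reachable via __path__ at import time
os.makedirs(pkg_dir)
os.makedirs(extra_dir)

# The package alters its own __path__ when it is initialized.
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("__path__.append(%r)\n" % extra_dir)
open(os.path.join(extra_dir, "hidden.py"), "w").close()

sys.path.insert(0, root)

# Guessing from the file system alone: the submodule is not in the package directory.
static_guess = os.path.exists(os.path.join(pkg_dir, "hidden.py"))

# find_spec() imports 'dynpkg_demo' first and then searches its final __path__.
spec = importlib.util.find_spec("dynpkg_demo.hidden")
parent_imported = "dynpkg_demo" in sys.modules

print(static_guess, spec is not None, parent_imported)
```

The static check misses the module, and the only lookup that gets the right answer is the one that ran the package's initialization code.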


Re: [Catalog-sig] disabling the serving of links from description_html?

2012-12-18 Thread PJ Eby
On Tue, Dec 18, 2012 at 11:46 AM, M.-A. Lemburg m...@egenix.com wrote:
 AFAIK, setuptools/distribute only looks at links with rel=homepage
 or rel=download attributes, not all links on the PyPI project page.
 The links from the description don't receive such attributes.

Those are the only links that are unconditionally followed, yes.  But
all links it sees are parsed to see if they appear to be a direct
download link (e.g. .tgz, .zip, .egg, #egg= link, etc.).  They're
just not *followed* unless they appear to be a direct link to a
desired version of something, or are marked as homepage or
download links.  All other on-page links are ignored, whether they're
part of the description or otherwise.

(Any given link is also retrieved at most once per run of easy_install.)
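
To make the three-way split concrete, here is a rough sketch of the classification described above. It is not the setuptools code: the class name LinkScanner, the suffix list, and the sample page are all made up for illustration, and the check that a direct link matches a *desired version* is left out.

```python
from html.parser import HTMLParser

# Illustrative subset of "looks like a direct download" suffixes (an assumption).
ARCHIVE_SUFFIXES = (".tgz", ".tar.gz", ".zip", ".egg")


class LinkScanner(HTMLParser):
    """Sort <a href> links into: always follow, download candidate, ignored."""

    def __init__(self):
        super().__init__()
        self.follow = []      # rel="homepage" or rel="download"
        self.candidates = []  # direct-download lookalikes, including #egg= links
        self.ignored = []     # everything else on the page

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        rels = (attrs.get("rel") or "").lower().split()
        path, _, fragment = href.partition("#")
        if "homepage" in rels or "download" in rels:
            self.follow.append(href)
        elif path.lower().endswith(ARCHIVE_SUFFIXES) or fragment.startswith("egg="):
            self.candidates.append(href)
        else:
            self.ignored.append(href)


page = """
<a rel="homepage" href="http://example.com/project/">Home</a>
<a href="http://example.com/dist/demo-1.0.tar.gz">tarball</a>
<a href="http://example.com/svn/trunk#egg=demo-dev">trunk</a>
<a href="http://example.com/blog/">blog post</a>
"""
scanner = LinkScanner()
scanner.feed(page)
print(scanner.follow, scanner.candidates, scanner.ignored)
```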


Re: [Catalog-sig] Flag to tell pip to only install uploaded files

2012-06-22 Thread PJ Eby
On Fri, Jun 22, 2012 at 8:21 PM, Aaron Meurer asmeu...@gmail.com wrote:

 Hi.

 I'm following up on a discussion on the pip mailing list
 (https://groups.google.com/forum/#!topic/python-virtualenv/PZNj9pC6aKA/discussion),
 where I was directed here.

 Would it be possible to add some kind of a flag to PyPI that would let
 package maintainers tell pip to install only the uploaded file (or
 possibly also the file given by a direct link), and no others?

 Currently, pip aggressively tries to find the latest version of a
 package by crawling all links on the PyPI page, even those from older
 versions.  This is a headache to me as a package maintainer because it
 means that pip is quite often installing the wrong thing. Recently,
 pip was trying to install our html docs because we had a file uploaded
 at Google Code named sympy-0.7.1-html-docs,


The simple way to correct this problem is to rename the file to
'sympy-html-docs-0.7.1' - this will fix things for all installers that
follow easy_install's discovery protocol, including pip and zc.buildout.
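
One way to see why the rename helps, sketched under the assumption that an installer considers every hyphen as a possible boundary between project name and version. The helper names below are hypothetical, and the sketch only enumerates the readings; it does not reproduce how any particular installer ranks them.

```python
def candidate_splits(filename):
    """Yield every (project, version) reading obtained by splitting at one hyphen."""
    parts = filename.split("-")
    for i in range(1, len(parts)):
        yield "-".join(parts[:i]), "-".join(parts[i:])


def versions_claimed_for(project, filename):
    """Return the version strings a file would claim for the given project name."""
    return [version
            for name, version in candidate_splits(filename)
            if name == project]


old_name = versions_claimed_for("sympy", "sympy-0.7.1-html-docs")
new_name = versions_claimed_for("sympy", "sympy-html-docs-0.7.1")
print(old_name)
print(new_name)
```

With the old name, the only reading for project 'sympy' is a version string that starts with the release number 0.7.1 and carries extra text after it. With the new name, the release number only appears in the reading whose project is 'sympy-html-docs', so the file no longer looks like a 0.7.1-something release of sympy itself.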


 which it deemed to be a
 newer version than sympy-0.7.1.  There's also the issue that every
 time we put out a release candidate for a new version, pip starts
 installing that, when I would prefer it to only install stable final
 releases.  It's also, as I noted on the other discussion list, a bit
 of a security risk.


zc.buildout includes a flag to prefer stable releases, and I believe some
other installation tools do as well.  You might suggest they add such a
flag to pip and move towards using it by default.


Re: [Catalog-sig] What is the point of pythonpackages.com?

2012-02-07 Thread PJ Eby
On Mon, Feb 6, 2012 at 3:17 PM, Andreas Jung li...@zopyx.com wrote:

 My point about this: if a person does not want
 to host their package on PyPI, then they should stay away from PyPI. Package
 hygiene and a certain level of professionalism in the package repository are
 more important than personal reasons for not hosting packages on PyPI.


Note that PyPI is also used to publish metadata about packages which are in
development and only available in snapshot releases or revision control
systems.  So the "it shouldn't be hosted elsewhere" argument doesn't really
wash.


Re: [Catalog-sig] What is the point of pythonpackages.com?

2012-02-07 Thread PJ Eby
On Tue, Feb 7, 2012 at 11:18 AM, Martijn Faassen faas...@startifact.com wrote:

 On 02/07/2012 07:18 AM, Kai Diefenbach wrote:

 If a listed package is not available (because an external server is
 down) the index is broken.


 That's an interesting observation. I would think 'broken' is strong
 language, but the index can at least be considered incorrect in that
 particular instance.

 If people have tools that rely on the index being correct, then its
 being incorrect can be a problem. You can either say those tools shouldn't
 be used for real development work (you're doing it wrong), or encourage
 people to provide the package on PyPI as well (encouragement as a social
 solution), or consider facilities to provide redundancy (caching,
 mirroring) to help with the experience (a technical solution).


Note, too, that prior to setuptools' development, there wasn't even any
expectation that projects listed on PyPI even have a current *release*, or
even have any *source code written*, let alone packages available for
download from PyPI itself.  (PyPI uploading was developed around the same
time as the first versions of setuptools and EasyInstall.)

Just because the common use-case for PyPI nowadays is to pull down
installation files, doesn't mean the previous use cases which PyPI catered
to are gone or not worth supporting any more.


Re: [Catalog-sig] What is the point of pythonpackages.com?

2012-02-07 Thread PJ Eby
On Tue, Feb 7, 2012 at 12:06 PM, Donald Stufft donald.stu...@gmail.com wrote:

 On Tuesday, February 7, 2012 at 12:02 PM, PJ Eby wrote:

 On Mon, Feb 6, 2012 at 3:17 PM, Andreas Jung li...@zopyx.com wrote:

 My point about this: if a person does not want
 to host their package on PyPI, then they should stay away from PyPI. Package
 hygiene and a certain level of professionalism in the package repository are
 more important than personal reasons for not hosting packages on PyPI.


 Note that PyPI is also used to publish metadata about packages which are
 in development and only available in snapshot releases or revision control
 systems.  So the "it shouldn't be hosted elsewhere" argument doesn't really
 wash.

 This is a matter of opinion really. Personally, I think if your package is
 in development you should publish snapshot releases to PyPI.


Yes, but now we get into the wonderful world of how many releases do you
actually want active vs. hidden vs. deleted, and now there are that many
more files to be possibly frozen and mirrored and archived and whatnot,
which isn't really suitable for such dev releases.

(Also, in the specific case of my snapshot-only packages, I have automated
builds that keep a rotating set of snapshots in a server-local download
directory for public access; I wouldn't want that build process
automatically uploading that stuff to PyPI, as it adds more moving parts
for things to break on my end.)


Re: [Catalog-sig] Distutils sdist formats best practice

2012-02-07 Thread PJ Eby
On Mon, Feb 6, 2012 at 12:19 PM, Alex Clark acl...@aclark.net wrote:

 What do pip/easy_install/etc do when they encounter both a .zip and a
 .tar.gz, for example?


IIRC, easy_install will take the longer filename in preference to the
shorter one, all else being equal; that's its final tiebreaker after what
kind of thing it expects to find at a given URL.
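
As a toy illustration of that last tiebreaker only (the function name and filenames are made up, and none of easy_install's earlier precedence rules are modelled here), picking the longer filename between two otherwise equal candidates looks like this:

```python
def pick_by_filename_length(filenames):
    """Final tiebreaker as recalled above: prefer the longer filename."""
    return max(filenames, key=len)


chosen = pick_by_filename_length(["demo-1.0.zip", "demo-1.0.tar.gz"])
print(chosen)
```

Under that rule, a .tar.gz of the same release would win over the .zip simply because its name is longer.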


Re: [Catalog-sig] Proposal: close the PyPI file-replacement loophole

2012-02-01 Thread PJ Eby
On Wed, Feb 1, 2012 at 6:06 AM, Yuval Greenfield ubershme...@gmail.com wrote:

 Does the setup.py/cfg allow me to require a specific hash on SQLAlchemy
 when automatically resolving dependencies in pip/easy_install?


Yes, at least for easy_install.  You tack on  #md5= to your find_links
URLs, and specify an exact version.  easy_install will refuse to install
them if the MD5 doesn't match.  (This will work better for source packages
than binaries, of course, since you'd only need to include one link and MD5
signature in that case.)
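
For illustration, here is a minimal sketch of producing and checking such a link, assuming a current Python 3. The helper names, the example.com URL, and the stand-in payload are hypothetical, and this is not the easy_install implementation; it only shows the idea of carrying the expected MD5 hex digest in the URL fragment.

```python
import hashlib
from urllib.parse import urldefrag


def with_md5_fragment(url, payload):
    """Return the URL with '#md5=<hexdigest of payload>' appended."""
    return "%s#md5=%s" % (url, hashlib.md5(payload).hexdigest())


def matches_md5_fragment(url, payload):
    """Return True/False if the URL carries an md5 fragment, None if it does not."""
    fragment = urldefrag(url).fragment
    if not fragment.startswith("md5="):
        return None
    return hashlib.md5(payload).hexdigest() == fragment[len("md5="):]


sdist_bytes = b"pretend this is the sdist archive"  # stand-in for the real file
link = with_md5_fragment("http://example.com/dist/demo-1.0.tar.gz", sdist_bytes)

print(link)
print(matches_md5_fragment(link, sdist_bytes))        # untampered download
print(matches_md5_fragment(link, sdist_bytes + b"x"))  # altered download
```

The resulting link is what you would list in find_links, together with an exact version requirement such as demo==1.0.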