Re: [Catalog-sig] Merge catalog-sig and distutils-sig

2013-03-28 Thread holger krekel
On Thu, Mar 28, 2013 at 14:22 -0400, Donald Stufft wrote:
 Is there much point in keeping catalog-sig and distutils-sig separate?
 
 It seems to me that most of the same people are on both lists, and the topics 
 almost always have consequences to both sides of the coin. So much so that 
 it's often hard to pick *which* of the two (or both) lists you post to. 
 Further confused by the fact that distutils is hopefully someday going to go 
 away :)

+1

 Not sure if there's some official process for requesting it or not, but I 
 think we should merge the two lists and just make packaging-sig to umbrella 
 the entire packaging topics.
 
 -
 Donald Stufft
 PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
 



 ___
 Catalog-SIG mailing list
 Catalog-SIG@python.org
 http://mail.python.org/mailman/listinfo/catalog-sig



Re: [Catalog-sig] Merge catalog-sig and distutils-sig

2013-03-28 Thread holger krekel
On Thu, Mar 28, 2013 at 15:42 -0400, Donald Stufft wrote:
 On Mar 28, 2013, at 3:39 PM, PJ Eby p...@telecommunity.com wrote:
 
  On Thu, Mar 28, 2013 at 3:14 PM, Fred Drake f...@fdrake.net wrote:
  On Thu, Mar 28, 2013 at 2:22 PM, Donald Stufft don...@stufft.io wrote:
  Is there much point in keeping catalog-sig and distutils-sig separate?
  
  No.
  
  The last time this was brought up, there were objections, but I don't
  remember what they were.  I'll let people who think there's a point
  worry about that.
  
  Not sure if there's some official process for requesting it or not, but
  I think we should merge the two lists and just make packaging-sig to
  umbrella the entire packaging topics.
  
  There is the meta-sig, but the description is out-dated:
  
 http://mail.python.org/mailman/listinfo/meta-sig
  
  and the last message in the archives is dated 2011, and sparked no
  discussion:
  
 http://mail.python.org/pipermail/meta-sig/2011-June.txt
  
  +1 on merging the lists.
  
  Can we do it by just dropping catalog-sig and keeping distutils-sig?
  I'm afraid we might lose some important distutils-sig population if
  the process involves renaming the list, resubscribing, etc.  I also
  *really* don't want to invalidate archive links to the distutils-sig
  archive.
  
  All in all, +1 on not having two lists, but I'm really worried about
  breaking distutils-sig.  We're still going to be talking about
  distribution utilities, after all.
 
 Don't care how it's done. I don't know Mailman well enough to know what is 
 possible or how easy things are. I thought packaging-sig sounded nice but if 
 you can't rename + redirect or merge or something in mailman I'm down for 
 whatever.

I've moved lists even from external sites to python.org and renamed them
(latest was pytest-dev).  That part works nicely and people can continue
to use the old ML address.  Merging two lists however makes it harder
to get redirects for the old archives.  But why not just keep distutils-sig
and catalog-sig archives, but have all their mail arrive at
a new packaging-sig and begin a new archive for the latter?

holger








Re: [Catalog-sig] Updated PEP 438

2013-03-21 Thread holger krekel
Hi Richard, all,

On Wed, Mar 20, 2013 at 17:30 -0700, Richard Jones wrote:
 I've pushed the latest PEP to the repos. It has all the recent
 clarifications and the API docs. Just need to wait for the website to
 rebuild or something.

It's online now. Current references to PEP438 (also inlined below):

http://www.python.org/dev/peps/pep-0438/

https://bitbucket.org/hpk42/pep-pypi/src/c0cbd3f3508991f5c47eb0fdb036c6e25ef45047/PEP-438.txt?at=default
 
 Unless there's any last-minute problems I'll accept the PEP in this
 form and push the implementation to the production PyPI next week
 after I fly home.

testpypi.python.org keeps 502ing on me - probably makes sense to first have
that stable and reviewed for a few days at least.

best and thanks everybody,

holger


PEP: 438
Title: Transitioning to release-file hosting on PyPI
Version: $Revision$
Last-Modified: $Date$
Author: Holger Krekel hol...@merlinux.eu, Carl Meyer c...@oddbird.net
BDFL-Delegate: Richard Jones rich...@python.org
Discussions-To: catalog-sig@python.org
Status: Draft
Type: Process
Content-Type: text/x-rst
Created: 15-Mar-2013
Post-History:


Abstract
========

This PEP proposes a backward-compatible two-phase transition process
to speed up, simplify and robustify installing from the
pypi.python.org (PyPI) package index.  To ease the transition and
minimize client-side friction, **no changes to distutils or existing
installation tools are required in order to benefit from the first
transition phase, which will result in faster, more reliable installs
for most existing packages**.

The first transition phase implements an easy and explicit means for a
package maintainer to control which release file links are served to
present-day installation tools.  The first phase also includes the
implementation of analysis tools for present-day packages, to support
communication with package maintainers and the automated setting of
default modes for controlling release file links.  The first phase
also will default newly-registered projects on PyPI to only serve
links to release files which were uploaded to PyPI.

The second transition phase concerns end-user installation tools,
which shall default to only install release files that are hosted on
PyPI and tell the user if external release files exist, offering a
choice to automatically use those external files.  External release
files shall in the future be registered together with a checksum
hash so that installation tools can verify the integrity of the
eventual download (PyPI-hosted release files always carry such
a checksum).
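As a rough illustration of the integrity check described above, here is a
minimal Python sketch. It assumes the checksum travels as an
``#md5=<hexdigest>`` URL fragment, the form PyPI's ``simple/`` pages used
for hosted files at the time; the function name ``verify_download`` is
hypothetical and not part of any installer's API.

```python
import hashlib
from urllib.parse import urldefrag


def verify_download(url: str, payload: bytes) -> bool:
    """Hypothetical helper: check downloaded bytes against an
    ``#md5=<hexdigest>`` fragment on the link they came from.

    Returns True on a match, False on a mismatch, and raises ValueError
    if the link carries no md5 fragment (nothing to verify against).
    """
    _, fragment = urldefrag(url)
    algo, sep, expected = fragment.partition("=")
    if algo != "md5" or not sep or not expected:
        raise ValueError(f"no md5 checksum in link: {url}")
    actual = hashlib.md5(payload).hexdigest()
    return actual == expected.lower()
```

An installer following this approach would refuse to install a file for
which ``verify_download`` returns False.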

Alternative PyPI server implementations should implement the new
simple index serving behaviour of transition phase 1 to avoid
installation tools treating their release links as external ones in
phase 2.


Rationale
=========

.. _history:

History and motivations for external hosting
--------------------------------------------

When PyPI went online, it offered release registration but had no
facility to host release files itself.  When hosting was added, no
automated downloading tool existed yet.  When Philip Eby implemented
automated downloading (through setuptools), he made the choice to
allow people to use download hosts of their choice.  The finding of
externally-hosted packages was implemented as follows:

#. The PyPI ``simple/`` index for a package contains all links found
   by scraping them from that package's long_description metadata for
   any release. Links in the Download-URL and Home-page metadata
   fields are given ``rel=download`` and ``rel=homepage`` attributes,
   respectively.

#. Any of these links whose target is a file whose name appears to be
   in the form of an installable source or binary distribution, with
   name in the form packagename-version.ARCHIVEEXT, is considered a
   potential installation candidate by installation tools.

#. Similarly, any links suffixed with an #egg=packagename-version
   fragment are considered an installation candidate.

#. Additionally, the ``rel=homepage`` and ``rel=download`` links are
   crawled by installation tools and, if HTML, are themselves scraped
   for release-file links in the above formats.

See the easy_install documentation for a complete description of this
behavior. [1]_
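Rules 2 and 3 above can be sketched in Python as follows. This is a
simplified illustration, not easy_install's actual implementation: the
archive-extension list and the "name, hyphen, digit-led version" regular
expression are assumptions, and the real tools handle many more cases.

```python
import posixpath
import re
from typing import Optional, Tuple
from urllib.parse import urlparse

# Assumed, simplified list of archive extensions.
ARCHIVE_EXTS = (".tar.gz", ".tar.bz2", ".tgz", ".zip")

# "packagename-version": the name runs up to the hyphen that is followed by
# a digit-led version containing no further hyphens (a simplification).
NAME_VERSION = re.compile(r"^(?P<name>.+?)-(?P<version>\d[^-]*)$")


def install_candidate(url: str) -> Optional[Tuple[str, str]]:
    """Return (name, version) if the link looks installable, else None."""
    parsed = urlparse(url)
    # Rule 3: an explicit #egg=packagename-version fragment.
    if parsed.fragment.startswith("egg="):
        match = NAME_VERSION.match(parsed.fragment[len("egg="):])
        return (match["name"], match["version"]) if match else None
    # Rule 2: a basename of the form packagename-version.ARCHIVEEXT.
    basename = posixpath.basename(parsed.path)
    for ext in ARCHIVE_EXTS:
        if basename.endswith(ext):
            match = NAME_VERSION.match(basename[: -len(ext)])
            return (match["name"], match["version"]) if match else None
    return None
```

For example, a link ending in ``Clay-0.13.tar.gz`` yields
``("Clay", "0.13")``, while a plain ``index.html`` link yields ``None``.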

Today, most packages indexed on PyPI host their release files on
PyPI. Out of 29,117 total projects on PyPI, only 2,581 (less than 10%)
include any links to installable files that are available only
off-PyPI. [2]_

There are many reasons [3]_ why people have chosen external
hosting. To cite just a few:

- release processes and scripts have been developed already and upload
  to external sites

- it takes too long to upload large files from some places in the
  world

- export restrictions e.g. for crypto-related software

- company policies which require offering open source packages through
  own sites

- problems with integrating uploading to PyPI into one's release
  process (because of release policies)

- desiring download

Re: [Catalog-sig] Replacement client for pep381client

2013-03-21 Thread holger krekel
On Wed, Mar 20, 2013 at 19:27 -0700, Christian Theune wrote:
 On 2013-03-20 23:59:21 +, Christian Theune said:
 
 I'm currently re-initializing my own mirror. This basically can be
 run in-place by just removing the existing state data and calling
 my sync script (bsn-mirror) instead of pep381run with the same
 parameters.
 
 This worked nicely for me - I'm running my mirror on bandersnatch now.

So far I got 3 errors like this one::

    2013-03-21 14:23:19,759 bandersnatch.package INFO: Downloading: https://pypi.python.org/packages/source/C/Clay/Clay-0.13.tar.gz
    2013-03-21 14:23:20,384 bandersnatch.package ERROR: Error syncing package: Coopr
    Traceback (most recent call last):
      File "/home/hpk/bandersnatch/src/bandersnatch/package.py", line 50, in sync
        self.sync_release_files()
      File "/home/hpk/bandersnatch/src/bandersnatch/package.py", line 68, in sync_release_files
        self.download_file(release_file['url'], release_file['md5_digest'])
      File "/home/hpk/bandersnatch/src/bandersnatch/package.py", line 144, in download_file
        url, existing_hash, md5sum))
    ValueError: https://pypi.python.org/packages/source/C/Coopr/Coopr-1.1.zip has hash 97cb7ae47656df10d243533c4f0c63c1 instead of 7ed6916702b2afccd254b423450ac4af

and the command terminates.  I can restart fine, though.  I will continue
and see how far I get.  Seems to perform quickly, btw :)

holger

 Christian
 
 
 


[Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI

2013-03-15 Thread holger krekel
Hi all, in particular Philip, Marc-Andre, Donald,

Carl and I decided to simplify the PEP and avoid the somewhat
awkward ``simple/-with-externals`` index for various reasons, among them
Marc-Andre's criticisms.  This also means present-day installation tools
(shipped with Red Hat/Debian/etc.) will continue to work as today for
those packages which remain in a hosting mode that requires crawling and
scraping.  They will still benefit from the fact that most packages will
soon have a hosting mode that avoids it.  Future releases of installation
tools will default to neither crawling nor using (scraped) external
links, and new PyPI projects will default to only serving uploaded files.

The V4 pre-PEP also renames the three PyPI hosting modes to be more
descriptive. Since all three modes allow external links, pypi-ext vs
pypi-only were misleading. The new naming distinguishes the mode that both
scrapes links from metadata and crawls external pages for more links
(pypi-scrape-crawl) from the mode that only scrapes links from metadata
(pypi-scrape) from the mode where all links are explicit (pypi-explicit).
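To make the three modes concrete, here is a hypothetical Python sketch of
which links a ``simple/`` page might serve in each one. The mode names come
from this draft; the function, its parameters, and two details are
assumptions: that uploaded files carry ``rel="internal"`` (as discussed
elsewhere in this thread) and that ``pypi-scrape`` simply omits the
homepage/download links installers would crawl.

```python
from enum import Enum
from typing import List, Optional, Sequence, Tuple

# (href, rel) pairs as they might appear on a simple/ page; rel may be None.
Link = Tuple[str, Optional[str]]


class HostingMode(Enum):
    PYPI_SCRAPE_CRAWL = "pypi-scrape-crawl"  # scrape metadata and crawl pages
    PYPI_SCRAPE = "pypi-scrape"              # scrape metadata, no crawling
    PYPI_EXPLICIT = "pypi-explicit"          # explicit links only


def simple_page_links(
    mode: HostingMode,
    hosted_files: Sequence[str],
    explicit_links: Sequence[str],
    scraped_links: Sequence[str],
    homepage: Optional[str],
    download_url: Optional[str],
) -> List[Link]:
    """Hypothetical: the links a simple/ page would serve in each mode."""
    # Files uploaded to PyPI are served in every mode (assumed rel="internal").
    links: List[Link] = [(url, "internal") for url in hosted_files]
    if mode is HostingMode.PYPI_EXPLICIT:
        # Only links the maintainer registered explicitly; nothing scraped.
        return links + [(url, None) for url in explicit_links]
    # Both scraping modes serve links found in the release metadata.
    links += [(url, None) for url in scraped_links]
    if mode is HostingMode.PYPI_SCRAPE_CRAWL:
        # Only this mode emits the rel attributes that make installers crawl.
        if homepage:
            links.append((homepage, "homepage"))
        if download_url:
            links.append((download_url, "download"))
    return links
```

The design point the sketch tries to capture is that the modes differ only
in which links the server emits, so present-day installers need no changes.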

Without the separate external index, it also turns out that the two transition
phases cleanly separate into PyPI changes (phase one) and installer-tool
updates (phase two); no PyPI changes are necessary in phase two.
As stated in a new open question, it should be possible to make
PEP-related installation tool updates during phase 1; that may still
require a bit of clarification in the PEP's language.

Carl and I are happy with this PEP version now and hope you all are as
well.  Donald is already working on improving the analysis tool so
we hopefully have some updated numbers soon.

cheers,

Holger


PEP: XXX
Title: Transitioning to release-file hosting on PyPI
Version: $Revision$
Last-Modified: $Date$
Author: Holger Krekel hol...@merlinux.eu, Carl Meyer c...@oddbird.net
Discussions-To: catalog-sig@python.org
Status: Draft (PRE-submit V4)
Type: Process
Content-Type: text/x-rst
Created: 10-Mar-2013
Post-History:


Abstract
========

This PEP proposes a backward-compatible two-phase transition process
to speed up, simplify and robustify installing from the
pypi.python.org (PyPI) package index.  To ease the transition and
minimize client-side friction, **no changes to distutils or existing
installation tools are required in order to benefit from the first
transition phase, which will result in faster, more reliable installs
for most existing packages**.

The first transition phase implements an easy and explicit means for a
package maintainer to control which release file links are served to
present-day installation tools.  The first phase also includes the
implementation of analysis tools for present-day packages, to support
communication with package maintainers and the automated setting of
default modes for controlling release file links.  The first phase
will also make new projects on PyPI default to serving only
links to release files which were uploaded to PyPI.

The second transition phase concerns end-user installation tools,
which shall default to only install release files that are hosted on
PyPI and tell the user if external release files exist, offering
a choice to automatically use those external files.


Rationale
=========

.. _history:

History and motivations for external hosting
--------------------------------------------

When PyPI went online, it offered release registration but had no
facility to host release files itself.  When hosting was added, no
automated downloading tool existed yet.  When Philip Eby implemented
automated downloading (through setuptools), he made the choice to
allow people to use download hosts of their choice.  The finding of
externally-hosted packages was implemented as follows:

#. The PyPI ``simple/`` index for a package contains all links found
   by scraping them from that package's long_description metadata for 
   any release. Links in the Download-URL and Home-page metadata
   fields are given ``rel=download`` and ``rel=homepage`` attributes,
   respectively.

#. Any of these links whose target is a file whose name appears to be
   in the form of an installable source or binary distribution, with
   name in the form packagename-version.ARCHIVEEXT, is considered a
   potential installation candidate by installation tools.

#. Similarly, any links suffixed with an #egg=packagename-version
   fragment are considered an installation candidate.

#. Additionally, the ``rel=homepage`` and ``rel=download`` links are
   crawled by installation tools and, if HTML, are themselves scraped
   for release-file links in the above formats.
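Rules 1 and 4 mean that an installer must pick the ``rel=homepage`` and
``rel=download`` links out of a project's ``simple/`` page before crawling
them. A minimal sketch using the standard library's ``html.parser`` follows;
the class and function names are hypothetical, and real installers use
their own parsing code.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# rel values that, under the current scheme, tell installers to crawl.
CRAWLED_RELS = {"homepage", "download"}


class RelLinkCollector(HTMLParser):
    """Collect (rel, href) for <a> tags whose rel marks a page to crawl."""

    def __init__(self) -> None:
        super().__init__()
        self.to_crawl: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        # rel may hold several space-separated tokens (or no value at all).
        for rel in (attributes.get("rel") or "").split():
            if rel in CRAWLED_RELS and href:
                self.to_crawl.append((rel, href))


def pages_to_crawl(simple_page_html: str) -> List[Tuple[str, str]]:
    collector = RelLinkCollector()
    collector.feed(simple_page_html)
    collector.close()
    return collector.to_crawl
```

Every entry returned here is an extra, possibly non-PyPI host that must be
reachable for the install to proceed, which is the cost this PEP targets.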

Today, most packages released on PyPI host their release files on
PyPI, but a small percentage (XXX need updated data) rely on external
hosting.

There are many reasons [2]_ why people have chosen external
hosting. To cite just a few:

- release processes and scripts have been developed already and upload
  to external sites

Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI

2013-03-15 Thread holger krekel
On Fri, Mar 15, 2013 at 11:15 -0400, PJ Eby wrote:
 Do we even need the internal/external rel info?  I was planning to
 just use the URL hostname.
 
 i.e., are there any use cases for designating an externally-hosted
 file internal, or an internally-hosted file external?  If not, it
 seems the rel= is redundant.
 
 It's also more work to implement, vs. just defaulting --allow-hosts to
 be the --index-url host; a strategy ISTM pip could also use, since it
 has the same two options available.
 
 Also, if we're not doing homepage/download crawling any more, I was
 hoping we could just drop the code that 'parses' rel= links in the
 first place, as it's an awkward ugly hack.  ;-)

We wanted to avoid requiring hostname-checking, especially in light of
parallel developments putting PyPI release files on a CDN, i.e. on
domains other than pypi.python.org.  The rel=internal attribute communicates
that the link is under the control of the index server, so the installer
need not worry and users need not know about --allow-hosts etc.  For example,
Donald's https://crate.io is already operating in this manner and has
its files on crate-cdn.com.
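The difference between the two client-side strategies can be sketched in
Python. Both helper names are hypothetical; the host names crate.io and
crate-cdn.com come from the paragraph above, while the file path in the
usage note is made up for illustration.

```python
from typing import Optional
from urllib.parse import urlparse


def is_internal_by_rel(rel: Optional[str]) -> bool:
    """Trust the index server's own marking of the link."""
    return "internal" in (rel or "").split()


def is_internal_by_host(url: str, index_url: str) -> bool:
    """Treat a link as internal only if it lives on the index host."""
    return urlparse(url).hostname == urlparse(index_url).hostname
```

With an index at ``https://crate.io/simple/`` and a file served from
``https://crate-cdn.com/...`` marked ``rel="internal"``, the rel-based check
accepts the file while the hostname-based check would treat it as external.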

best,
holger




Re: [Catalog-sig] V4 Pre-PEP: transition to release-file hosting on PYPI

2013-03-15 Thread holger krekel
On Fri, Mar 15, 2013 at 22:01 -0400, PJ Eby wrote:
 On Fri, Mar 15, 2013 at 7:16 PM, Carl Meyer c...@oddbird.net wrote:
  Ok, pending agreement from Holger I'll make a change in the PEP to
  explicitly allow clients to make decisions based on either the rel
  attributes or based on hostnames. Would that be sufficient to address
  your concerns?
 
 Yes.  I just don't want to be in a situation down the road where
 there's another argument about this on Catalog-SIG when PyPI starts
 using a CDN that, but it says this in the rel and you're supposed to
 use that, and I say, but Carl and Holger said...  and they go,
 doesn't matter, PEP says   ;-)
 
 This way, the PEP will be clear that supporting a split of PyPI's
 hostnames isn't in current scope.

 
 I am also okay with the PEP allowing *.indexhost instead of just
 indexhost as the filtering mechanism, as long as it specifies one
 *now*.  (Again, so this doesn't have to be revisited later.)  If
 somebody who knows something about CDNs, TUF, etc., needs to weigh in
 on it first, that's fine.  I just want to know where things stand.
 
One related question.  The rel=internal links will contain
a (md5 currently) hash so if the referenced resource resolves to
a file matching that hash, we can be sure about its integrity.
What kind of security does host-checking add on top?

holger

  Putting the /simple/ API on a CDN isn't quite that easy because it
  currently involves some server-side redirects to effectively make
  project names case-insensitive.
 
 FWIW, easy_install works fine without this.  If a matching index page
 isn't found, it checks the full package list.  PyPI's redirection just
 reduces bandwidth usage and request overhead in the case where the
 case of the user's request doesn't match the actual package listing.
 But it could be completely static without affecting easy_install and
 tools that use its package-finding code.
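The client-side fallback described above, which lets a fully static index
work without case-normalizing redirects, might look roughly like this
Python sketch. ``find_project_page`` is hypothetical, a dict stands in for
the fetched top-level package list, and only letter case is compared
(easy_install's real name matching normalizes more than that).

```python
from typing import Dict, Optional


def find_project_page(requested: str, pages: Dict[str, str]) -> Optional[str]:
    """Locate a project's simple/ page on a purely static index.

    ``pages`` maps the exact project names from the full package list to
    their page URLs (a stand-in for fetching the top-level listing).
    """
    # Fast path: the requested spelling matches the listing exactly.
    if requested in pages:
        return pages[requested]
    # Fallback: scan the full list for a case-insensitive match.
    wanted = requested.lower()
    for name, url in pages.items():
        if name.lower() == wanted:
            return url
    return None
```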
 


Re: [Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of release files

2013-03-14 Thread holger krekel
On Wed, Mar 13, 2013 at 23:43 -0700, Nick Coghlan wrote:
 On Wed, Mar 13, 2013 at 5:16 PM, Carl Meyer c...@oddbird.net wrote:
  There is no "instead of". There are parallel proposals (see the TUF
  thread) to improve the security of the ecosystem, and those proposals
  are not mutually exclusive with this one. If you search the PEP text,
  note that you don't find the words "secure" or "security" anywhere
  within it, or any claims of security achieved by this proposal alone.
  There is a brief mention of MITM attacks, which is relevant to the PEP
  because avoiding external link-crawling does reduce that attack surface,
  even if other proposals will also help with that (even more).
 
 Right, the changes to provide end-to-end security require more
 extensive changes and need to be given appropriate consideration
 before we proceed to implementation and deployment. This PEP,
 especially with the additional changes you propose here is an
 excellent approach to *near term* improvement, as a parallel effort to
 the more complex proposals.
 
 The /simple/ index will also be around for a long time for backwards
 compatibility reasons, regardless of any other changes that happen in
 the overall distribution ecosystem.

I haven't followed the latest TUF discussions and related docs in
depth yet, but if those developments will regard simple/ as a deprecated
interface, I think this PEP should perhaps not introduce
simple/-with-externals, as it would only make the situation more
complicated for everyone to understand a few months from now.

best,
holger


 Cheers,
 Nick.
 
 -- 
 Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
 


[Catalog-sig] V3 PEP-draft for transitioning to pypi-hosting of release files

2013-03-13 Thread holger krekel
Hi all,

after some more discussions and hours spent by Carl Meyer (who is now
co-authoring the PEP) and me, here is a new V3 pre-submit draft.
It is now more ambitious than the previous draft, as should be obvious
from the modified abstract (and Carl Meyer's and Philip's earlier
interactions on this list).  There are also more details of how
the current link-scraping works, among other improvements and incorporations
of feedback from discussions here.

We intend to submit this draft tonight to the PEP editors.  

Feedback now and later remains welcome.  I am sure there are issues to 
be sorted and clarified, among them the versioning-API suggestion by 
Marc-Andre.

Thanks for everybody's support and feedback so far,
holger


PEP: XXX
Title: Transitioning to release-file hosting on PyPI
Version: $Revision$
Last-Modified: $Date$
Author: Holger Krekel hol...@merlinux.eu, Carl Meyer c...@oddbird.net
Discussions-To: catalog-sig@python.org
Status: Draft (PRE-submit V3)
Type: Process
Content-Type: text/x-rst
Created: 10-Mar-2013
Post-History:


Abstract
========

This PEP proposes a backward-compatible two-phase transition process to speed
up, simplify and robustify installing from the pypi.python.org (PyPI)
package index.  To ease the transition and minimize client-side
friction, **no changes to distutils or existing installation tools are
required in order to benefit from the transition phases, which is to
result in faster, more reliable installs for most existing packages**.

The first transition phase implements an easy and explicit means for
a package maintainer to control which release file links are
served to present-day installation tools.  The first phase also
includes the implementation of analysis tools for present-day packages,
to support communication with package maintainers and the automated
setting of default modes for controlling release file links.

The second transition phase will result in the current PyPI index
serving only PyPI-hosted files by default.  Externally hosted files
will still be automatically discoverable through a second index.
Present-day installation tools will be able to continue working
by specifying this second index.  New versions of installation
tools shall default to only installing packages from PyPI unless
the user explicitly wishes to include non-PyPI sites.



Rationale
=========

.. _history:

History and motivations for external hosting
--------------------------------------------

When PyPI went online, it offered release registration but had no
facility to host release files itself.  When hosting was added, no
automated downloading tool existed yet.  When Philip Eby implemented
automated downloading (through setuptools), he made the choice to
allow people to use download hosts of their choice.  The finding of
externally-hosted packages was implemented as follows:

#. The PyPI ``simple/`` index for a package contains all links found
   anywhere in that package's metadata for any release. Links in the
   Download-URL and Home-page metadata fields are given
   ``rel=download`` and ``rel=homepage`` attributes, respectively.

#. Any of these links whose target is a file whose name appears to be
   in the form of an installable source or binary distribution, with
   basename in the form packagename-version.ARCHIVEEXT, is considered 
   a potential installation candidate.

#. Similarly, any links suffixed with an #egg=packagename-version
   fragment are considered an installation candidate.

#. Additionally, the ``rel=homepage`` and ``rel=download`` links are
   followed and, if HTML, are themselves scraped for release-file links
   in the above formats.

Today, most packages released on PyPI host their release files on
PyPI, but a small percentage (XXX need updated data) rely on external
hosting.

There are many reasons [2]_ why people have chosen external
hosting. To cite just a few:

- release processes and scripts have been developed already and upload
  to external sites

- it takes too long to upload large files from some places in the
  world

- export restrictions e.g. for crypto-related software

- company policies which require offering open source packages
  through own sites

- problems with integrating uploading to PyPI into one's release
  process (because of release policies)

- desiring download statistics different from those maintained by PyPI

- perceived poor reliability of PyPI

- not aware that PyPI offers file-hosting

Irrespective of the present-day validity of these reasons, there
is clearly a history of why people choose to host files externally, and
for some time it was even the only way you could do things.


Problem
-------

**Today, Python package installers (pip, easy_install, buildout, and
others) often need to query many non-PyPI URLs even if there are no
externally hosted files**.  Apart from querying pypi.python.org's
simple index pages, also all homepages and download pages ever
specified with any release of a package are crawled

Re: [Catalog-sig] A 90% Solution

2013-03-12 Thread holger krekel
On Mon, Mar 11, 2013 at 19:04 -0400, PJ Eby wrote:
 Just a thought, but...
 
 If 90% of PyPI projects do not have any external files to download,
 then, wouldn't it make sense to:

Side note: we need to verify and clarify the 90/10 ratio.  It would be
the basis for changing PyPI state, so it needs to be accurate
and double-checked.

 1. Add a project-level option to enable or disable the adding of the
 rel= attribute to /simple links (but not affecting the links in any
 other way)
 2. Default it to disabled for new projects, and
 3. Set it to disabled *now* for the 90% of projects that *don't have
 external files*?

 If the arguments about banning external links are as valid and
 important as some people claim, wouldn't it make sense to do this part
 *now*, without first requiring a commitment to force the switch to a
 disabled state in the future?

Pre-announcing the step to maintainers is good communication style.
There is always the risk of bugs in determining which projects host
externally, or of tools relying on rel attributes without our knowing, etc.

 Immediately, 90% of the problem goes away - no random spidering of
 stuff that doesn't contain a link now, but which could be taken over
 by a malicious party in the future, and 90% fewer sites having to be
 up in order for you to build something from PyPI.
 
 Seems like a serious win to me -- and one that might not even need a PEP.

Yes and no: a PEP-like document is a good place to point people to.

 Next steps after this would be providing tools to help people move
 their files and links, promoting that people switch it off if they no
 longer support the offsite links, educating about security concerns,
 etc.

 I really don't understand why the 90% solution isn't *already* the
 consensus position, since it doesn't preclude follow-on efforts
 towards reducing the 10% towards 0%.

 And if the problem is so important, why must we keep 90% of the
 problems in place, just so we can keep arguing about censoring the
 10%?  That doesn't make sense to me.

The idea of changing only the PyPI server side emerged just last week,
so we are not that slow in moving on here :)

cheers,
holger


 
 To me, if somebody's injured, the first thing you do is clean and
 close the wound, not argue about whether it's a complete solution and
 what might happen days or weeks later.
 
 Just a thought.
 


[Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI

2013-03-12 Thread holger krekel
,
performing a Man-in-the-Middle (MITM) attack between an installation
site and any of the download sites can inject malicious packages on the
installation site.  As many homepages and download locations use plain
HTTP rather than HTTPS, such attacks are not very hard to launch.
Such MITM attacks can happen even for packages which never intended to
host files externally as their homepages are contacted by installers
anyway.

There is currently no way for package maintainers to avoid 3rd party
crawling, other than removing all homepage/download url metadata
for all historic releases.  While a script [3]_ has been written to 
perform this action, it is not a good general solution because it removes
semantic information like the homepage specification from PYPI packages.


Solution
--------

The proposed solution consists of the following implementation and
communication steps:

- determine which packages have release files only on PyPI (group A)
  and which have externally hosted release files (group B).

- Prepare the PyPI implementation to allow a per-project hosting mode,
  effectively enabling or disabling external crawling.  When enabled,
  nothing changes from the current situation of producing ``rel=download``
  and ``rel=homepage`` attributed links on ``simple/`` pages,
  causing installers to crawl those sites.
  When disabled, the rel attributes of these links will change
  to ``rel=newdownload`` and ``rel=newhomepage``, causing installers to
  avoid crawling 3rd party sites.  Retaining the meta-information allows
  tools to still make use of the semantic information.

- send mail to maintainers of A that their project is going to be
  automatically configured to disable crawling in one week,
  and encourage them to set this mode earlier to help all of
  their users.

- send mail to maintainers of B that their package hosting mode
  is crawling enabled, list the sites which currently are crawled,
  and suggest that they re-host their packages directly on PyPI and
  then switch the hosting mode to disable crawling.  Provide instructions
  and ideally tools to help with this re-uploading process.
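The per-project switch described above amounts to a rewrite of the rel
attributes on a project's ``simple/`` links. A minimal, hypothetical Python
sketch (the function name and link representation are assumptions; the
``newdownload``/``newhomepage`` values are the ones proposed in this draft):

```python
from typing import List, Optional, Tuple

Link = Tuple[str, Optional[str]]  # (href, rel attribute or None)

# Renaming applied when crawling is disabled, as proposed in this draft.
NO_CRAWL_RELS = {"download": "newdownload", "homepage": "newhomepage"}


def apply_hosting_mode(links: List[Link], crawling_enabled: bool) -> List[Link]:
    """Rewrite rel attributes of a project's simple/ links (hypothetical)."""
    if crawling_enabled:
        return list(links)  # unchanged: installers keep crawling these pages
    # Keep the URLs and their meaning, but use rel values that current
    # installers do not recognise as "crawl this page".
    return [(href, NO_CRAWL_RELS.get(rel, rel)) for href, rel in links]
```

Because only the rel values change, the homepage and download URLs stay
visible for tools that want the semantic information.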

In addition, maintainers of installation tools are asked to release
two updates.  The first one shall provide clear warnings if external
crawling needs to happen, stating exactly which projects and URLs are
affected and that crawling will be disabled by default in the future.
The next update shall change the default to disallow crawling, allowing
crawling only with an explicit option like ``--crawl-externals``, plus
another option to limit which hosts may be crawled
at all.
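The behaviour of the second installer update could be sketched as follows.
``links_to_crawl`` and its parameters are hypothetical: ``crawl_externals``
stands in for the proposed ``--crawl-externals`` opt-in, and
``allowed_hosts`` for the host-limiting option, assumed here to take
shell-style wildcard patterns (matched with the standard ``fnmatch`` module).

```python
from fnmatch import fnmatch
from typing import Iterable, List, Sequence, Tuple
from urllib.parse import urlparse


def links_to_crawl(
    rel_links: Iterable[Tuple[str, str]],
    crawl_externals: bool = False,
    allowed_hosts: Sequence[str] = ("*",),
) -> List[str]:
    """Decide which rel=homepage/download pages an installer may crawl."""
    if not crawl_externals:
        return []  # the proposed new default: no crawling at all
    selected = []
    for rel, href in rel_links:
        if rel not in ("homepage", "download"):
            continue
        host = urlparse(href).hostname or ""
        # Crawl only hosts matching at least one allowed pattern.
        if any(fnmatch(host, pattern) for pattern in allowed_hosts):
            selected.append(href)
    return selected
```

With ``allowed_hosts=["*.example.org"]``, a homepage on
``www.example.org`` would be crawled while a download page on another
domain would be skipped.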


Hosting-Mode state transitions
------------------------------

1. At the outset, we set hosting-mode to notset for all packages.
   This will not change any link served via the simple index and thus
   no bad effects are expected.  Early adopters and testers may now
   change the mode to either crawl or nocrawl to help shake out
   issues in the PyPI implementation.

2. When maintainers of B packages are mailed, their mode is set
   directly to crawl.

3. When maintainers of A are mailed we leave the mode at notset to allow
   people to change it to nocrawl themselves or to set it to crawl 
   if they think they are wrongly in the A group.  After a week 
   all notset modes are set to nocrawl.

A week after the mailings all packages will be in crawl or nocrawl
hosting mode.  It is then a matter of good tools and reaching out to
maintainers of B packages to increase the A/B ratio.
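The transitions above can be summarized as two small functions.  This is a
sketch, assuming group "A" means packages whose files are all found on PYPI
and group "B" means packages that currently rely on crawled external sites
(inferred from the mailing steps); the names are hypothetical.

```python
from enum import Enum


class HostingMode(Enum):
    NOTSET = "notset"
    CRAWL = "crawl"
    NOCRAWL = "nocrawl"


def mode_after_mailing(group: str) -> HostingMode:
    """Steps 2 and 3: B packages are set to crawl, A packages stay notset."""
    return HostingMode.CRAWL if group == "B" else HostingMode.NOTSET


def mode_after_deadline(current: HostingMode) -> HostingMode:
    """One week after the mailings, every remaining notset becomes nocrawl.

    A mode chosen by the maintainer in the meantime is kept as-is.
    """
    return HostingMode.NOCRAWL if current is HostingMode.NOTSET else current
```

After the deadline, no package can be left in ``notset``, which matches the
statement that all packages end up in crawl or nocrawl mode.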

Open questions
--------------

- Should the support tools for rehosting packages be implemented on the
  server side or on the client side?  Implementing them on the client
  side is probably quicker to get right and less harmful when they fail.

- Double-check whether ``rel=newhomepage`` and ``rel=newdownload`` cause the
  desired behaviour of pip and easy_install (both the distribute- and
  setuptools-based versions) to not crawl those pages.

- Are the support tools for re-hosting outside the scope of this PEP?

- Think some more about pip/easy_install's allow-hosts mode, etc.

References
----------

.. [1] Donald Stufft, ratio of externally hosted versus pypi-hosted,
   http://mail.python.org/pipermail/catalog-sig/2013-March/005549.html

.. [2] Marc-Andre Lemburg, reasons for external hosting,
   http://mail.python.org/pipermail/catalog-sig/2013-March/005626.html

.. [3] Holger Krekel, script to remove homepage/download metadata for
   all releases,
   http://mail.python.org/pipermail/catalog-sig/2013-February/005423.html

Acknowledgments
---------------

Philip Eby for precise information and the basic ideas to
implement the transition via server-side changes only.

Donald Stufft for pushing away from external hosting, writing
the 90/10% statistics script and offering to implement a PR.

Marc-Andre Lemburg, Nick Coghlan and catalog-sig for thinking
through issues regarding getting rid of external hosting.


Copyright
---------

Re: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI

2013-03-12 Thread holger krekel
On Wed, Mar 13, 2013 at 01:19 +1000, Nick Coghlan wrote:
 That looks pretty good to me. My only comment is that qualifiers like "new"
 don't age well in an API. The explicit "nocrawlhomepage" and
 "nocrawldownload" might be a better choice.

Right, we might also consider dropping the rel attribution altogether, given
that you can indeed access release metadata via the xmlrpc or json API.

best,
holger

 Cheers,
 Nick.


Re: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI

2013-03-12 Thread holger krekel
On Tue, Mar 12, 2013 at 11:53 -0400, PJ Eby wrote:
 On Tue, Mar 12, 2013 at 7:38 AM, holger krekel hol...@merlinux.eu wrote:
  In addition, maintainers of installation tools are asked to release
  two updates.  The first one shall provide clear warnings if external
  crawling needs to happen,
 
 A clarification here: "needs to happen" is not well-specified.  An
 installer tasked with finding the latest or best-matching version of a
 package must currently *always* crawl.  So the warning would be
 "always."

Not after the initial automatic PYPI transition.  For the 90% of
packages hosted on PYPI, you wouldn't see the warning then.

 The strategy I originally chose for making this change in easy_install
 is to warn once at the beginning that --allow-hosts has not been set,
 and thus packages might be downloaded from anywhere on the internet.

From a UI perspective I'd like to see a summary of actually consulted but
non-specified websites (including whether they were http or https) at the
very end of an installer's output.  By non-specified I mean sites
that weren't specified as an index server or allow-host.
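A minimal sketch of such an end-of-run summary, assuming the installer keeps
a list of every URL it fetched and knows the index server and allow-host
names the user configured.  The function names are hypothetical; neither pip
nor easy_install is claimed to have them.

```python
from collections import Counter
from urllib.parse import urlsplit


def unspecified_sites_summary(consulted_urls, index_hosts, allow_hosts):
    """Summarize hosts that were consulted but never specified by the user.

    Returns sorted (scheme, host, count) tuples for every consulted host
    that is neither an index server host nor an allow-host.
    """
    specified = {h.lower() for h in list(index_hosts) + list(allow_hosts)}
    counts = Counter()
    for url in consulted_urls:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        if host and host not in specified:
            counts[(parts.scheme, host)] += 1  # keep http and https apart
    return sorted((scheme, host, n) for (scheme, host), n in counts.items())


def print_summary(rows):
    """Print the summary at the very end of the installer output."""
    for scheme, host, n in rows:
        print(f"consulted non-specified site {scheme}://{host} ({n} requests)")
```

Keeping the scheme in the key means a host reached over both http and https
shows up twice, which makes plain-http fetches easy to spot.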

 I've since become uncertain that this change is actually workable in
 the short term, since until most of the packages are actually moved
 onto PyPI, a lot of installs will fail if somebody changes their
 configuration to be more secure.  So I'm thinking the warning needs to
 be deferred until at least the more popular packages have moved to
 PyPI.

I think it's fine to wait until after the initial hosting-mode transition.

  Now, if there is some agreement, i can submit this PEP officially tomorrow,
  and given agreement/refinements from the Pycon folks and the likes of
  Richard, we may be able to get going very shortly after Pycon.
 
 I'd like to suggest that the PEP should be explicit that no other
 changes to the /simple generation algorithm are being made, just the
 removal or alteration of rel= attributes.  i.e., it will still be
 possible -- at least in the near term -- for projects to include
 explicit download links to files made available elsewhere.  Changing
 that situation is more controversial and will require wider community
 participation than has occurred to date.

I kind of agree.  To move forward, we should leave out the
question of further modifying the simple/ pages for the moment.
It makes sense to mention that this means you can still put
"http://PKGNAME-VER.tar.gz" in your PKGNAME long_description or
download_url metadata.  Installers will give warnings for such links,
however, and eventually change their defaults according to the PEP draft.

 It might also be good to suggest that authors of PyPI clones plan
 their own phase-out of rel= attributes.

Most alternative servers I've seen don't use the rel attribution,
but it's good to mention it.

best,
holger



Re: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI

2013-03-12 Thread holger krekel
Hi Marc-Andre, all,

On Tue, Mar 12, 2013 at 17:06 +0100, M.-A. Lemburg wrote:
 On 12.03.2013 12:38, holger krekel wrote:
  Hi all,
  
  below is the new PEP pre-submit version (V2) which incorporates the
  latest suggestions and aims at a rapidly deployable solution.  Thanks in
  particular to Philip, Donald and Marc-Andre.  I also added a few notes
  on how installers should behave with respect to non-PYPI crawling.  
  
  I think a PEP like doc is warranted and that we should not silently
  change things without proper communication to maintainers and pre-planning
  the implementation/change process.  Arguably, the changes are more
  invasive than "oh, let's just do an http-https redirect", which didn't
  work too well either.
  
  Now, if there is some agreement, i can submit this PEP officially tomorrow,
  and given agreement/refinements from the Pycon folks and the likes of
  Richard, we may be able to get going very shortly after Pycon.
  
  cheers,
  holger
  
  
  PEP-draft: transitioning to release-file hosting on PYPI
  
  
  Status
  ------
  
  PRE-SUBMIT-v2
  
  Abstract
  --------
  
  This PEP proposes a backward-compatible transition process to speed up,
  simplify and robustify installing from the pypi.python.org (PYPI)
  package index.  The initial transition will automatically put most packages
  on PYPI into a configuration mode which prevents installers from crawling
  non-PYPI sites.  To ease automatic transition and minimize
  client-side friction, **no changes to distutils or installation tools** are
  required.  Instead, the transition is implemented by modifying PYPI to
  serve links from ``simple/`` pages in a configurable way, preventing or
  allowing crawling of non-PYPI sites for detecting release files.
  Maintainers of all PYPI packages will be notified ahead of those
  changes.
  
  Maintainers of packages which currently are hosted on non-PYPI sites
  shall receive instructions and tools to ease re-hosting of their
  historic and future package release files.  The implementation of such
  tools is NOT required for implementing the initial automatic transition.
  
  Installation tools like pip and easy_install shall warn about crawling
  non-PYPI sites and later default to disallowing it, allowing it only with
  an explicit option.
  
  
  History and motivations for external hosting
  --------------------------------------------
  
  When PYPI went online, it offered release registration but had no
  facility to host release files itself.  When hosting was added, no
  automated downloading tool existed yet.  When Philip Eby implemented
  automated downloading (through setuptools), he chose to let people
  use download hosts of their choice.  This was
  implemented by the PYPI ``simple/`` index containing links of type
  ``rel=homepage`` or ``rel=download`` which are crawled by installation
  tools to discover package links.  As of March 2013, a substantial share
  of packages (an estimated 10%) use this mechanism to host
  files on github, bitbucket, sourceforge or their own hosting sites like
  ``mercurial.selenic.com``, to name just a few.
  
  There are many reasons [2]_ why people choose to use external hosting,
  to cite just a few:
  
  - release processes and scripts have been developed already and
    upload to external sites

  - it takes too long to upload large files from some places in the world

  - export restrictions e.g. for crypto-related software

  - company policies which prescribe offering open source packages through
    own sites

  - problems with integrating uploading to PYPI into one's release process
    (because of release policies)

  - perceived bad reliability of PYPI

  - lack of knowledge that you can upload files
  
  Irrespective of the present-day validity of these reasons, there clearly
  is a history of why people host files externally; for some time it
  was even the only way you could do things.
  
  
  Problem
  -------
  
  **Today, python package installers (pip and easy_install) often need to
  query non-PYPI sites even if there are no externally hosted files**.
  Apart from querying pypi.python.org's simple index pages, also all
  homepages and download pages ever specified with any release of a
  package are crawled by an installer.  The need for installers to
  crawl 3rd party sites slows down installation and makes for a brittle,
  unreliable installation process.  Those sites and packages also don't
  take part in the :pep:`381` mirroring infrastructure, further decreasing
  reliability and speed of automated installation processes around the world. 
  
  Roughly 90% of packages are hosted directly on pypi.python.org [1]_.
  Even for them, installers still need to crawl the homepage(s) of a
  package.  In particular, many package uploaders are not aware that
  specifying the homepage in their release

Re: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI

2013-03-12 Thread holger krekel
Hi Carl,

On Tue, Mar 12, 2013 at 10:48 -0600, Carl Meyer wrote:
 Hi Holger,
 
 I am confused about the discrepancy between the title of this pre-PEP
 (transition to release file hosting on PyPI) and the contents of the
 PEP, which describe a transition to not crawling _HTML pages_ on
 external sites looking for distribution download links. These are not
 the same thing at all.

I agree the title is not quite right at the moment.

 Current installer tools will only crawl external HTML pages if they are
 rel=download or rel=homepage, but they will use any link they find
 in the simple index (regardless of rel attr) if the target of the link
 appears to be a distribution file (as determined by filename
 pattern-matching or #egg fragment).

Right.
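A rough sketch of the check Carl describes, showing why directly linked
files are still picked up even without rel attributes.  This is a simplified
approximation, not setuptools' or pip's actual logic: the suffix list is an
assumption, and the real tools also parse name and version from the filename.

```python
import posixpath
from urllib.parse import unquote, urlsplit

# Simplified list of archive suffixes; the real tools recognize more cases.
DIST_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".tar", ".zip", ".egg")


def looks_like_distribution(url: str) -> bool:
    """Return True if a simple/ link looks like a release file.

    Either the URL carries an #egg= fragment, or its filename ends in a
    known archive suffix.
    """
    parts = urlsplit(url)
    if parts.fragment.startswith("egg="):
        return True
    filename = posixpath.basename(unquote(parts.path)).lower()
    return filename.endswith(DIST_SUFFIXES)
```

A link such as ``http://example.com/dl/foo-1.0.tar.gz`` in a description
would pass this check regardless of any rel attribute, which is the case
Carl points out.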

 At the end of the process you describe, if all packages migrate to
 nocrawl, the rel-link HTML spidering will no longer happen. This is a
 good first step: it will speed up installation somewhat, and reduce the
 frustration of some package owners when installers find files linked
 from their project homepage that they never intended for automated
 installation. But installers will still find and download release
 packages that are not hosted on PyPI, if those package files are linked
 directly in the simple index. This is still surprising behavior to many
 new Python users, and still carries the security and reliability
 concerns that this PEP claims to address.

Yes, and here the installers should start giving clear warnings
and then change their defaults.

 I'm honestly not sure whether the title or the content more accurately
 reflects the intent of this PEP; depending which it is, I suggest one of
 the following:
 
 1) Add to the PEP a description of a further step in the migration
 process, which actually does transition away from automated installation
 of non-PyPI-hosted release files (as the default behavior of
 installation tools); or

This makes sense to me.  Do you feel like opening a pull request on

https://bitbucket.org/hpk42/pep-pypi

to help refine this aspect?  I am also on IRC for coordination (also
about the title), as I intend to create the PEP submission for
python-ideas and maybe already the pep-editors (?!).  In any case, it
wouldn't mean the PEP's discussion is finalized, of course, and I'd
continue to post new versions here and ask for feedback.

cheers,
holger

 2) Change the title of the PEP to something like "Transitioning away
 from non-PyPI HTML crawling" and add a paragraph to the PEP clarifying
 that this PEP does not address the issue of actual release files hosted
 off-PyPI.


 Carl


Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

2013-03-12 Thread holger krekel
On Tue, Mar 12, 2013 at 13:18 -0400, PJ Eby wrote:
 On Tue, Mar 12, 2013 at 12:29 PM, Jacob Kaplan-Moss ja...@jacobian.org 
 wrote:
  On Tue, Mar 12, 2013 at 11:19 AM, M.-A. Lemburg m...@egenix.com wrote:
  So let's do this carefully and find a good solution before
  jumping to conclusions.
 
  Completely agreed; rushing is a bad idea.
 
  But so is not starting. What I'm seeing — as a total outsider, a user
  of these tools, not someone who creates them — is that a bunch of
  people (Holger, Donald, Richard, the pip maintainers, etc.) have the
  beginnings of a solution ready to go *right now*, and I want to
  capture that energy and enthusiasm before it evaporates.
 
  This isn't an academic situation; I've seen companies decline to adopt
  Python over this exact security issue.
 
 Nobody told them about how to configure a restricted, site-wide
 default --allow-hosts setting?   (
 http://peak.telecommunity.com/DevCenter/EasyInstall#restricting-downloads-with-allow-hosts
 and 
 http://docs.python.org/2/install/index.html#location-and-names-of-config-files
 )
 
 (FWIW, --allow-hosts was added in setuptools 0.6a6 -- *years* before
 the distribute fork or the existence of pip, and pip offers the same
 option.)
 
 I've already agreed to change setuptools to default this option to
 only allow downloads from the same host as its index URL, in a future
 release.  (i.e. to default --allow-hosts to the host of the
 --index-url option), and I support the removing of rel= spidering
 from PyPI (which will significantly mitigate the immediate speed and
 security issues).  Heck, I've been the one who's repeatedly proposed
 various ways of cutting back or removing rel= attributes from the
 /simple index.
 
 The result of these two changes will actually have the same net effect
 that people have been asking for here: you'll only be able to download
 stuff hosted on PyPI, unless you explicitly override the --allow-hosts
 to get a wider range of packages.
 
 Already today, when a URL is blocked by --allow-hosts, it's announced
 as part of easy_install's output, so you can see exactly how much
 wider you need to extend your trust for the download to succeed.
 
 The *only* thing I object to is removing the ability for people to
 *choose* their own levels of trust.
 
 And I have not yet seen an argument that justifies removing people's
 ability to *choose* to be more inclusive in their downloads.
 
 And I've put multiple compromise proposals out there to begin
 mitigating the problem *now* (i.e. for non-updated versions of
 setuptools), and every time, the objection is, "no, we need to ban it
 all now, no discussion, no re-evaluation, no personal choice, everyone
 must do as we say, no argument."

FWIW, the PEP draft in V2 doesn't take this approach and I don't
plan to introduce it in subsequent versions.  IOW, I agree that
we should keep things backward-compatible in the sense that users
can choose non-default settings to get the current behaviour
(which might make their installation process less reliable/secure,
but that's their choice).
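To make that opt-out concrete, here is a minimal sketch of the allow-hosts
check Philip describes, with the proposed default of only allowing the host
of the index URL.  It is modelled loosely on easy_install's ``--allow-hosts``,
which takes comma-separated host glob patterns; the function itself is
hypothetical and not setuptools code.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def host_allowed(url, index_url="https://pypi.python.org/simple/",
                 allow_hosts=None):
    """Check a download URL against --allow-hosts style glob patterns.

    With no patterns given, default to the host of the index URL, which is
    the default proposed for a future setuptools release.
    """
    if allow_hosts is None:
        allow_hosts = [urlsplit(index_url).hostname]
    host = (urlsplit(url).hostname or "").lower()
    # Host names are case-insensitive, so compare in lower case.
    return any(fnmatchcase(host, pattern.lower()) for pattern in allow_hosts)
```

A user who wants today's wider behaviour would pass their own patterns, e.g.
``allow_hosts="*.python.org,*.example.com".split(",")``, which keeps the
choice of trust level with the user.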

cheers,
holger
 
 And I don't understand that, at all.




Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

2013-03-12 Thread holger krekel
On Tue, Mar 12, 2013 at 12:18 -0600, Carl Meyer wrote:
 It seems to me that there's a remarkable level of consensus developing
 here (though it may not look like it), and a small set of remaining open
 questions.
 
 The consensus (as I see it):
 
 - Migrate away from scraping external HTML pages, with package owners in
 control of the migration but a deadline for a forced switch, as outlined
 in Holger's PEP (with all appropriate caution and testing).
 
 - In some way, migrate to a situation where the popular installer tools
 install only release files from PyPI by default, but are capable of
 installing from other locations if the user provides an option.
 
 The open question is basically how to implement the latter portion. I
 see two options proposed:
 
 A) Leave external links in the PyPI simple index, but migrate the major
 tools to not use external links by default (i.e. Philip's plan to make
 allow-hosts=pypi the default in a future setuptools), with an option to
 turn them back on.
 
 or
 
 B) Do a second PyPI migration, again with a per-package toggle and
 package owners in control, to a "no external links in simple index" setting.
 
 Consider for a moment how similar the end state here is with either A or
 B. In either case, by default users install only from PyPI, but by
 providing a special option they can install from some external source.
 (In B, that special option would be something like --find-links with a
 URL). In either case, we can continue to allow packages to register
 themselves on PyPI, be found in searches, etc, without uploading release
 files to PyPI if they prefer not to; they'll just have to provide
 special installation instructions to their users in that case.
 
 Here are some differences:
 
 1) With B, we can provide a gentler migration for package owners, where
 they are in control of when the switch happens. With A, regardless of
 how it's done at some point some package owners are likely to start
 getting "hey, i can't install your stuff anymore" reports from users,
 and they can't control when that starts happening.
 
 2) With B, all end users benefit from the new defaults, not only end
 users who update to the latest and greatest tools.
 
 3) With B (and probably some forms of A as well), end users clearly
 state which external sources they would like to trust and install from,
 rather than having a global "trust everything!" flag, which is less
 secure and less sensible.
 
 It seems to me that option B (a controlled, per-package, PyPI migration
 to no-external-links in simple index) is a better migration path than A
 (leaving it up to external tools), and the end result either way is very
 similar.

Thanks for outlining this so well.  I agree with the B approach and
suggest introducing three per-package hosting states:

- pypi-only: only pypi-hosted files and all #egg links are served via simple/
  (#egg links are necessary and a special case for installing
  development snapshots; I think we should not leave them out)

- pypi-nocrawl: all links as of now but without rel-attribution (i.e.
  all description links are served and also the homepage/download ones but
  without rel-attribution)

- pypi-crawl: all links as of now
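A minimal sketch of how a ``simple/`` page generator might filter links for
these three states.  The ``Link`` type and function names are hypothetical,
and relative links (as PYPI uses for its own files) are assumed to point into
PYPI.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlsplit

PYPI_HOST = "pypi.python.org"  # assumption: the host serving PYPI files


@dataclass
class Link:
    url: str
    rel: Optional[str] = None  # "homepage", "download" or None


def _hosted_on_pypi(url: str) -> bool:
    host = urlsplit(url).hostname
    return host is None or host == PYPI_HOST  # relative links point into PYPI


def _is_egg_link(url: str) -> bool:
    return urlsplit(url).fragment.startswith("egg=")


def links_for_mode(links: List[Link], mode: str) -> List[Link]:
    """Return the links a simple/ page would serve in the given hosting state."""
    if mode == "pypi-crawl":
        return list(links)  # all links as of now, rel attributes kept
    if mode == "pypi-nocrawl":
        return [Link(link.url) for link in links]  # same links, no rel
    if mode == "pypi-only":
        # PYPI-hosted files plus #egg links; homepage/download links dropped.
        return [
            Link(link.url)
            for link in links
            if _is_egg_link(link.url)
            or (link.rel is None and _hosted_on_pypi(link.url))
        ]
    raise ValueError(f"unknown hosting state: {mode!r}")
```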

The automatic transition of the hosting-mode for most packages (with
pre-announcements) specified in the PEP will need to differentiate
between switching to pypi-only and pypi-nocrawl.  

And as discussed elsewhere, the implementation of the underlying
analysis script and the PYPI changes certainly needs to be ready 
before the PEP can be finally accepted.

I am open to a corresponding PR against the PEP draft :)

holger


 
 Carl


Re: [Catalog-sig] V2 pre-PEP: transitioning to release file hosting on PYPI

2013-03-12 Thread holger krekel
On Tue, Mar 12, 2013 at 19:07 +0100, M.-A. Lemburg wrote:
 Just a quick note (more later, if time permits)...
 
 On 12.03.2013 18:05, holger krekel wrote:
  Hi Marc-Andre, all,
  
  - Prepare PYPI implementation to allow a per-project hosting mode,
    effectively enabling or disabling external crawling.  When enabled
    nothing changes from the current situation of producing
    ``rel=download`` and ``rel=homepage`` attributed links on
    ``simple/`` pages, causing installers to crawl those sites.
    When disabled, the attributions of links will change
    to ``rel=newdownload`` and ``rel=newhomepage`` causing installers to
    avoid crawling 3rd party sites.  Retaining the meta-information allows
    tools to still make use of the semantic information.
 
  Please start using versioned APIs for these things. The
  old style index should still be available under some
  URL, e.g. /simple-v1/ or /v1/simple/ or /1/simple/
  
  Not sure it is necessary in this case.  I would think it makes
  the implementation harder and it would probably break PEP381 (mirroring
  infrastructure) as well.
 
 Here's what I meant:
 
 We publish the current implementation of the /simple/ index API
 under a new URL /simple-v1/, so that people that want to use
 the old API can continue to do so.
 
 Then we setup a new /simple-v2/ index API with your proposed
 change, perhaps even dropping the rel attribute altogether.
 
 During testing, we'd then have:
 
 /simple/    - same as /simple-v1/
 /simple-v1/ - old API with rel attributes always set
 /simple-v2/ - new API with your changes (rel attributes only
               set in some cases)
 
 After a month or so of testing, we then switch this to:
 
 /simple/    - same as /simple-v2/
 /simple-v1/ - old API with rel attributes always set
 /simple-v2/ - new API with your changes (rel attributes only
               set in some cases)

I understand, but am not sure how easy this is to manage at the moment.
I'd like to list this under open questions and (eventually) have Richard
comment on it before developing it further.
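For discussion, Marc-Andre's versioned-index idea can be sketched as a tiny
path resolver in which ``/simple/`` is an alias for whichever version is
current.  The routing table and function are hypothetical and say nothing
about how PYPI's code is actually organized.

```python
# Hypothetical routing table for the versioned simple index proposal.
SIMPLE_VERSIONS = {"v1": "/simple-v1/", "v2": "/simple-v2/"}


def resolve_simple_path(path: str, current: str) -> str:
    """Map a request path to the versioned index that serves it.

    current is the version /simple/ aliases: "v1" during testing and
    "v2" after the switch.  Explicitly versioned paths pass through.
    """
    if path.startswith("/simple/"):
        return SIMPLE_VERSIONS[current] + path[len("/simple/"):]
    return path
```

Switching ``current`` from ``"v1"`` to ``"v2"`` is then the whole cut-over,
while clients that need the old behaviour can keep using ``/simple-v1/``.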

best,
holger

 -- 
 Marc-Andre Lemburg
 eGenix.com
 


Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

2013-03-12 Thread holger krekel
On Tue, Mar 12, 2013 at 14:36 -0500, Jacob Kaplan-Moss wrote:
 On Tue, Mar 12, 2013 at 2:21 PM, PJ Eby p...@telecommunity.com wrote:
  The *only* thing I object to is the part where some people want to ban
  external links from /simple, always and forever, regardless of the
  package authors' choice in the matter.
 
 Here's the thing though, there are already a bunch of other ways users
 can install packages from external repositories. I can think of at
 least two:
 
 * I can pip/easy_install a given URL (e.g. easy_install
 https://www.djangoproject.com/download/1.5/tarball/)
 * I can use a custom index server (pip install -i http://localserver/ django)
 
 The important part is that in each of those cases I can see clearly
 where I'm getting things from.
 
 OTOH, if I do pip install Django I — the person making the install —
 have no control over where that package comes from. It really violates
 people's expectations that this reaches out to somewhere that's
 not-pypi. More importantly it prevents me from making a security
 choice -- I literally don't know until the download starts where the
 file might be coming from.
 
 From where I stand the absolutely non-negotiable part is that
 `pip/easy_install/whatever package` should NEVER access an external
 host (after some suitable transition period). This needs to include
 older installer software, and it needs to make it hard for new tools
 to do the wrong thing. How this is achieved really doesn't matter to
 me -- if there's a pip install --insecure Django that's fine too --
 but to me it's non-negotiable that the out-of-the-box configuration
 not allow external hosts.
 
 Yes, this means taking some options away from the package creator. It
 means that when I'm wearing my author-of-Django hat I can't choose to
 list Django on PyPI but provide the download elsewhere. That's not
 perfect, but given a creator choice vs out of the box security
 choice the latter has to win. [And as a package creator I still have
 options: I can run my own package server, fairly easy to do these
 days.]
 
 Again, the *how* isn't a big deal to me, but the result is really
 important: the tooling has to be secure-by-default, and that means
 (among other things) `pip install package` can never hit something
 that's not PyPI without me explicitly asking for it.

Let's be clear, however, that we are at most reducing attack vectors;
substantial attack vectors remain.  Nobody should be led to
think that PYPI is a trusted or reviewed source of software, even
if we got rid of external hosting completely.

holger

 Jacob


Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

2013-03-12 Thread holger krekel
On Tue, Mar 12, 2013 at 15:21 -0400, PJ Eby wrote:
 On Tue, Mar 12, 2013 at 2:18 PM, Carl Meyer c...@oddbird.net wrote:
  It seems to me that there's a remarkable level of consensus developing
  here (though it may not look like it), and a small set of remaining open
  questions.
 
  The consensus (as I see it):
 
  - Migrate away from scraping external HTML pages, with package owners in
  control of the migration but a deadline for a forced switch, as outlined
  in Holger's PEP (with all appropriate caution and testing).
 
  - In some way, migrate to a situation where the popular installer tools
  install only release files from PyPI by default, but are capable of
  installing from other locations if the user provides an option.
 
 Perhaps I'm confused, but ISTM that every time I've said this, Donald
 and Lennart argue that it should not be possible to provide such an
 option -- or to be more specific, that PyPI should not publish the
 information that makes that option possible.
 
 If that's *not* the position they're taking, it'd be good to know,
 because we could totally stop arguing about it in that case.

I don't know.  At least the pre-PEP doesn't take the position
of unconditionally banning external links.  Maybe Lennart or Donald can say
whether they oppose the moves outlined in the PEP.  I'd hope
that the perceived perfect doesn't become the enemy
of the good here :)

  A) Leave external links in the PyPI simple index, but migrate the major
  tools to not use external links by default (i.e. Philip's plan to make
  allow-hosts=pypi the default in a future setuptools), with an option to
  turn them back on.
 
 I don't know who has proposed this option, but it's not me.  You seem
 to be confusing external links and HTML-scraped links (rel=
 attributed links in /simple).

The suggested behaviour of installers is not fully formulated yet in
the PEP.  We should improve that.

 I was the first person to propose disabling HTML-scraped links from
 PyPI *ASAP*.

Yes, and thanks for pushing us in this direction. 

 I still want them gone.  That won't require tool
 changes, it just requires a rollout plan.  Holger has one, let's work
 on that.

 The second thing I proposed is that new tools be developed to *assist*
 package authors in moving their files onto PyPI, so that future tool
 changes wouldn't result in widespread instances of people needing to
 set their tools to insecure settings just to get anything done.  We
 need to get people's files moving onto PyPI *first*, in order to make
 changing the tool defaults practical.

Indeed, it's a good idea to require the re-hosting or transfer tool to be
ready before installers change their defaults.

 The *only* thing I object to is the part where some people want to ban
 external links from /simple, always and forever, regardless of the
 package authors' choice in the matter.

I agree the package author should have a choice about the serving of links
for their package.  And installers should eventually change their defaults so
that install-users also have a choice about whether they accept crawling or
the use of external links.
 
  B) Do a second PyPI migration, again with a per-package toggle and
  package owners in control, to a "no external links in simple index" setting.
 
  Consider for a moment how similar the end state here is with either A or
  B. In either case, by default users install only from PyPI, but by
  providing a special option they can install from some external source.
  (In B, that special option would be something like --find-links with a
  URL). In either case, we can continue to allow packages to register
  themselves on PyPI, be found in searches, etc, without uploading release
  files to PyPI if they prefer not to; they'll just have to provide
  special installation instructions to their users in that case.
 
 Not true: approach B means that you won't know what values to pass to
 the option.

Yes and no: in one case you need to specify --crawl or
--use-external-links, and in the other --find-links https://...
The latter requires reading a package's homepage or long_description
for the correct URL, so it is less obvious to install-users.

 It's also confused about an important point.  All the links that
 appear in /simple are *already* completely under the package author's
 control.  No new switches are required to remove external links - you
 can simply remove them from your releases' descriptions.  This process
 could be made more transparent or easy, sure -- but it's a mistake to
 say that this is granting the package owners control that they don't
 already have.

Right.  I think allowing a package maintainer to say "actually, please don't
serve external links for my package" (hosting mode pypi-only) is an
easy, expressive way to exert this control.

 What they lack control over is the rel= attributes, short of
 removing those links entirely.  That's why I've proposed having a
 switch for that , as reflected in Holger's 

Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

2013-03-11 Thread holger krekel
Hi Philip,

thanks for your helpful review, almost all makes sense to me ...
some more inlined comments below.  Up front, i am open to you 
co-authoring the PEP if you like and share the goal to find a minimum
viable approach to speed up and simplify the interactions for installers.

On Sun, Mar 10, 2013 at 15:41 -0400, PJ Eby wrote:
 On Sun, Mar 10, 2013 at 11:07 AM, holger krekel hol...@merlinux.eu wrote:
  Philip, Marc-Andre, Richard (Jones), Nick and catalog-sig/distutils-sig:
  scrutiny and feedback welcome.
 
 Hi Holger.  I'm having some difficulty interpreting your proposal
 because it is leaving out some things, and in other places
 contradicting what I know of how the tools work.  It is also a bit at
 odds with itself in some places.

Certainly; it was a quick draft to get the process going and to gather
useful feedback, which has worked so far :)

 For instance, at the beginning, the PEP states its proposed solution
 is to host all release files on PyPI, but then the problem section
 describes the problems that arise from crawling external pages:
 problems that can be solved without actually hosting the files on
 PyPI.

 To me, it needs a clearer explanation of why the actual hosting part
 also needs to be on PyPI, not just the links.  In the threads to date,
 people have argued about uptime, security, etc., and these points are
 not covered by the PEP or even really touched on for the most part.

Makes sense to clarify this more.

 (Actually, thinking about that makes me wonder  Donald: did your
 analysis collect any stats on *where* those externally hosted files
 were hosted?  My intuition says that the bulk of the files (by *file
 count*) will come from a handful of highly-available domains, i.e.
 sourceforge, github, that sort of thing, with actual self-hosting
 being relatively rare *for the files themselves*, vs. a much wider
 range of domains for the homepage/download URLs (especially because
 those change from one release to the next.)  If that's true, then most
 complaints about availability are being caused by crawling multiple
 not-highly-available HTML pages, *not* by the downloading of the
 actual files.  If my intuition about the distribution is wrong, OTOH,
 it would provide a stronger argument for moving the files themselves
 to PyPI as well.)
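A quick way to test that intuition, sketched in Python: given the externally hosted release-file URLs an analysis script has collected, count files per host. The sample list below is made up, not Donald's data, and files_per_host is a hypothetical helper. If a handful of hosts dominate the counts, availability complaints are more likely about crawling pages than about downloading files.

```python
from collections import Counter
from urllib.parse import urlsplit


def files_per_host(urls):
    """Count release-file URLs per host; .hostname is lowercased and has no port."""
    counts = Counter()
    for url in urls:
        host = urlsplit(url).hostname
        if host:  # skip relative or malformed URLs that carry no host
            counts[host] += 1
    return counts


# Hypothetical sample data, not real PyPI statistics.
sample = [
    "http://downloads.sourceforge.net/proj/proj-1.0.tar.gz",
    "http://downloads.sourceforge.net/proj/proj-1.1.tar.gz",
    "https://bitbucket.org/someone/pkg/get/pkg-0.3.tar.gz",
    "http://www.example.org/~me/tool-2.0.zip",
]
for host, n in files_per_host(sample).most_common():
    print(f"{n:5d}  {host}")
```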
 
 Digression aside, this is one of things that needs to be clearer so
 that there's a better explanation for package authors as to why
 they're being asked to change.  And although the base argument is good
 (specifying the homepage will slow down the installation process),
 it could be amplified further with an example of some project that has
 had multiple homepages over its lifetime, listing all the URLs that
 currently must be crawled before an installer can be sure it has found
 all available versions, platforms, and formats of that project.

Right, an example makes sense.

 Okay, on to the Solution section.  Again, your stated problem is to
 fix crawling, but the solution is all about file hosting.  Regardless
 of which of these three hosting modes is selected, it remains an
 option for the developer to host files elsewhere, and provide the
 links in their description...  unless of course you intended to rule
 that out and forgot to mention it.  (Or, I suppose, if you did *not*
 intend to rule it out and intentionally omitted mention of that so the
 rabid anti-externalists would think you were on their side and not
 create further controversy...  in which case I've now spoiled things.
 Darn.  ;-) )

To be honest, while drafting I forgot that the
long_description can contain package links as well.

 Some technical details are also either incorrect or confusing.  For
 example, you state that "The original homepage/download links are
 added as links without a ``rel`` attribute if they have the ``#egg``
 format."  But if they are added without a rel attribute, it doesn't
 *matter* whether they have an #egg marker or not.  It is quite
 possible for a PyPI package to have a download_url of, say,
 "http://sourceforge.net/download/someproject-1.2.tgz".

Right.  I just wanted to clarify that the distutils metadata 
download_url can contain an #egg format link and that this link
should still be served (without a rel).

 Thus, I would suggest simply stating that changing hosting mode does
 not actually remove any links from the /simple index, it just removes
 the rel= attributes from the Home page and Download links, thus
 preventing them from being crawled in search of additional file links.

That's certainly a better description of what effectively happens 
and avoids the special mentioning of #egg.

 With that out of the way, that brings me to the larger scope issue
 with the modes as presented.  Notice now that with this clarification,
 there is no real difference in *state* between the pypi-cache and
 pypi-only modes.  There is only a *functional* difference...  and
 that function is underspecified in the PEP.

Agreed.

 What I mean

Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

2013-03-11 Thread holger krekel
Hi again,

A correction on one point of my last mail to you,

On Mon, Mar 11, 2013 at 10:02 +, holger krekel wrote:
  My suggestion would be to do two things:
  
  First, make the state a boolean: "crawl external links", with the
  current state "yes" and the future state "no", with "no" simply meaning
  that the rel= attribute is removed from the links that currently
  have it.
  
  Second, propose to offer tools in the PyPI interface (and command
  line) to assist authors in making the transition, rather than
  proposing a completely unspecified caching mechanism.  Better to have
  some vaguely specified tools than a completely unspecified caching
  mechanism, and better still to spell out very precisely what those
  tools do.
 
 This structure makes sense to me except that I see the need to start off with
 pypi-ext, i.e. a third state which encodes the current behaviour.

Wait, your suggestion of a boolean "crawl external" set to "yes"
would encode the current behaviour, so my "except" is invalid.

 The thing is that pypi.python.org doesn't have an extensive test
 suite, so we will need to rely on a few early adopters
 using the tools/state-changes before starting phase 2 (mass mailings etc.).
 Also, in case of problems we can always switch packages back to the safe
 pypi-ext mode.  IOW, the motivation for this third state comes from
 the actual implementation process.

This can also be done with your two-state suggestion (switching back 
to crawl=yes).  So no disagreement on that either.

best,
holger
___
Catalog-SIG mailing list
Catalog-SIG@python.org
http://mail.python.org/mailman/listinfo/catalog-sig


[Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

2013-03-10 Thread holger krekel
Hi Donald, Richard, Nick, Philip, Marc-Andre, all,

After some more thinking I wrote a simplified PEP draft for
transitioning hosting of release files to pypi.python.org.  A PEP is
warranted IMO because the corresponding changes will affect all Python
package maintainers and the Python packaging ecosystem in general.  See
the current draft (pre-submit-v1) further below in this mail.
I also created a Bitbucket repository; see PEP-PYPI-DRAFT.txt at

https://bitbucket.org/hpk42/pep-pypi/src

Donald, I'd be happy if you joined as a co-author and contributed
your statistics script and possibly more implementation work (PRs
to the PyPI software etc.).

Philip, Marc-Andre, Richard (Jones), Nick and catalog-sig/distutils-sig:
scrutiny and feedback welcome.

Nick: if you could collect feedback on the PEP (draft) around the 
packaging and distribution mini-summit at PyCon US (15th March), that'd
be very useful.  

Richard: I may ask you to become BDFL-delegate for this PEP especially
since you will need to integrate any resulting changes :)

I'd like to formally submit this PEP soon, but not before I get some
feedback.

I am not subscribed to distutils-sig and I don't think distutils is much
affected, but it would probably still help if someone cross-posted there
(please put me in CC).

cheers,
holger


PEP-draft: transition to release file hosting at pypi.python.org
================================================================

Status
------

PRE-SUBMIT-v1

Abstract
--------

This PEP proposes to move hosting of all release files to
pypi.python.org itself.  To ease transition and minimize client-side
friction, **no changes to distutils or installers** are required.
Rather, the transition is implemented through changes to the pypi.python.org 
implementation and by interactions with package maintainers.

Problem
-------

Today, Python package installers (pip and easy_install) need to
query multiple sites to discover release files.  Besides querying
pypi.python.org's simple index pages, an installer also needs to crawl
all homepages and download pages ever specified with any release of a
package.  The need for installers to crawl 3rd-party sites slows down
installation and makes for a brittle, unreliable installation process.

As of March 2013, about 10% of packages have release files which
are not hosted directly from pypi.python.org but rather from places
referenced by download/homepage sites.  

Conversely, roughly 90% of packages are hosted directly on
pypi.python.org [1]_.  Even for them, installers still need to crawl the
homepage(s) of a package.  In particular, many package uploaders are not
aware that specifying the homepage will slow down the installation
process.
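To make the crawling cost concrete, here is a small Python sketch of how an installer might sort the links on a /simple/PKGNAME/ page. It assumes homepage and download links are marked rel="homepage" and rel="download", the attribute the hosting modes below turn on or off; those are the pages an installer fetches and scans for further file links. The sample page and the SimpleIndexLinks class are invented for illustration, and real installers such as pip and easy_install apply more rules than this.

```python
from html.parser import HTMLParser


class SimpleIndexLinks(HTMLParser):
    """Split the <a href> links of a simple index page into two groups."""

    CRAWL_RELS = {"homepage", "download"}

    def __init__(self):
        super().__init__()
        self.crawl = []  # pages an installer would fetch and scan for more links
        self.links = []  # every other link; these are not crawled

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)  # HTMLParser delivers attribute names lowercased
        href = attrs.get("href")
        if not href:
            return
        rels = set((attrs.get("rel") or "").split())
        if rels & self.CRAWL_RELS:
            self.crawl.append(href)
        else:
            self.links.append(href)


page = """
<a href="../../packages/source/f/foo/foo-1.0.tar.gz#md5=0123abcd">foo-1.0.tar.gz</a>
<a href="http://foo.example.org/" rel="homepage">1.0 home_page</a>
<a href="http://foo.example.org/downloads/" rel="download">1.0 download_url</a>
"""
parser = SimpleIndexLinks()
parser.feed(page)
print("to crawl:", parser.crawl)
print("file links:", parser.links)
```

Dropping the rel= attribute from a link moves it from the first list to the second, so the installer no longer fetches that page.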


Solution
--------

Each package is going to get a hosting mode field which affects
all historic and future releases of a package and its release files.
The field has these values and meanings:

- pypi-ext (transitional) encodes exactly the current mode of operation:
  homepage/download urls are presented in simple/ pages and client-side
  tools need to crawl them themselves to find release file links. 

- pypi-cache: Release files located on remote sites will be downloaded 
  and cached by pypi.python.org by crawling homepage/download metadata sites.
  The resulting simple index contains links to release files hosted by
  pypi.python.org.  The original homepage/download links are added as
  links without a ``rel`` attribute if they have the ``#egg`` format.

- pypi-only: homepage/download links are served on simple indexes
  but without a ``rel`` attribute.  Installation tools will thus not
  crawl those pages anymore.  Use this option if you commit to always
  uploading your release files to pypi.python.org.
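A minimal Python sketch of how the three modes might change what a simple index page emits. Everything here (HostingMode, render_simple_links, the argument names) is hypothetical, not PyPI code. It also assumes, following PJ Eby's reading in his review, that homepage/download links are never removed from the page; outside pypi-ext they merely lose their rel= attribute.

```python
from enum import Enum
from html import escape


class HostingMode(Enum):
    PYPI_EXT = "pypi-ext"
    PYPI_CACHE = "pypi-cache"
    PYPI_ONLY = "pypi-only"


def _anchor(url, rel=None):
    rel_attr = f' rel="{rel}"' if rel else ""
    return f'<a href="{escape(url)}"{rel_attr}>{escape(url)}</a>'


def render_simple_links(mode, pypi_files, external_links, cached_files=()):
    """Return the <a> lines for a hypothetical /simple/<project>/ page.

    pypi_files:     URLs of files uploaded to pypi.python.org
    external_links: (rel, url) pairs from home_page/download_url metadata
    cached_files:   URLs of files pypi.python.org re-hosted in pypi-cache mode
    """
    lines = [_anchor(url) for url in pypi_files]
    if mode is HostingMode.PYPI_CACHE:
        lines += [_anchor(url) for url in cached_files]
    for rel, url in external_links:
        # Only pypi-ext keeps rel=, i.e. only pypi-ext asks installers to crawl.
        lines.append(_anchor(url, rel if mode is HostingMode.PYPI_EXT else None))
    return lines


meta = [("homepage", "http://foo.example.org/")]
for mode in HostingMode:
    print(mode.value, render_simple_links(mode, ["foo-1.0.tar.gz"], meta))
```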


Phases of transition
--------------------

1. At the outset, we set hosting-mode to pypi-ext for all packages.
   This will not change any link served via the simple index and thus
   no bad effects are expected.  Early adopters and testers may now
   change the mode to either pypi-only or pypi-cache to help with
   streamlining issues.  After implementation and UI issues are
   streamlined, the next phase can start.

2. We perform automatic analysis for each package to determine if it is
   a package with externally hosted release files.  Packages which only 
   have release files on pypi.python.org are put in group A;
   those which have at least some release files hosted elsewhere are put in group B.

   We then send a mail to all maintainers of packages in A
   that their hosting-mode is going to be switched automatically to
   pypi-only after N weeks, unless they visit their package
   administration page earlier and set it to either pypi-cache or
   pypi-only.

   We then send a mail to all maintainers of packages in B
   that their hosting-mode is going to be switched automatically to
   pypi-cache after N weeks, unless they visit their package
   administration
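The phase 2 analysis might look roughly like this Python sketch. It assumes the input is already a mapping from project name to the release-file URLs found for it (for example, produced by a statistics script; none is shown here), and that any file served from pypi.python.org counts as hosted on PyPI. The function name split_groups and the sample data are illustrative only.

```python
from urllib.parse import urlsplit

PYPI_HOST = "pypi.python.org"  # assumption: only this host counts as "on PyPI"


def split_groups(projects):
    """Return (group_a, group_b) as sorted lists of project names.

    group A: every known release file is hosted on pypi.python.org
    group B: at least one release file is hosted elsewhere
    A project with no known files lands in group A, since all() of an
    empty sequence is True.
    """
    group_a, group_b = [], []
    for name in sorted(projects):
        on_pypi = all(urlsplit(url).hostname == PYPI_HOST for url in projects[name])
        (group_a if on_pypi else group_b).append(name)
    return group_a, group_b


projects = {
    "onlypypi": ["https://pypi.python.org/packages/source/o/onlypypi/onlypypi-1.0.tar.gz"],
    "mixed": [
        "https://pypi.python.org/packages/source/m/mixed/mixed-0.2.tar.gz",
        "http://code.example.com/files/mixed-0.9.tar.gz",
    ],
}
print(split_groups(projects))
```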

Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

2013-03-10 Thread holger krekel
On Sun, Mar 10, 2013 at 13:35 -0400, Donald Stufft wrote:
 On Mar 10, 2013, at 11:07 AM, holger krekel hol...@merlinux.eu wrote:
  [...]
  Transitioning to pypi-cache mode
  --------------------------------
  
  When transitioning from the currently implicit pypi-ext mode to
  pypi-cache for a given package, a package maintainer should 
  be able to retrieve/verify the historic release files which will 
  be cached from pypi.python.org.  The UI should present this list
  and have the maintainer accept it for completing the transition
  to the pypi-cache mode.  Upon future release registration actions,
  pypi.python.org will perform crawling for the homepage/download sites
  and cache release files *before* returning a success return code for
  the release registration.
   [...]
 
 Some concerns:
 
 1. We cannot automatically switch people to pypi-cache. We _have_ to get 
 explicit permission from them.

Could you detail how you arrive at this conclusion?
(I've seen the claim before but not the underlying reasoning; maybe
I just missed it.)

There would be prior notifications to the package maintainers.  If they
don't want to have their packages cached at pypi.python.org, they can set
the mode to pypi-only and leave manual instructions.  I suspect very few
people, if any, will object to pypi-cache mode.  If that is false, we
might need to prolong pypi-ext mode for them and eventually switch them
to pypi-only when we decide to get rid of external hosting.

 2. The cache mechanism is going to be fragile, and in the long term leaves a 
 window open for security issues.

Fragility: not sure it's too bad.  Once the mode is activated, release
registration (submit POST action on the /pypi HTTP endpoint) will only
succeed if the corresponding release files can be found through
homepage/download.  Changing the mode to pypi-cache in the presence of
historic release files hosted elsewhere needs a good pypi.python.org UI
interaction and may take several tries if the necessary sites cannot be
reached.  Nevertheless, this step is potentially fragile [X].
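The all-or-nothing registration described above could be sketched in Python as follows. The fetch callable is an assumption, injected so the example needs no network (on a real server it would be an HTTP download), and cache_release_files and CacheError are hypothetical names. If any site is unreachable the registration is refused and the maintainer retries later, which is exactly where the fragility comes from.

```python
import hashlib


class CacheError(Exception):
    """Raised when at least one external release file could not be fetched."""


def cache_release_files(urls, fetch):
    """Fetch every external release file before a registration is accepted.

    fetch(url) must return the file's bytes or raise OSError.
    Returns {url: sha256 hex digest} only if every download succeeded.
    """
    digests, failed = {}, []
    for url in urls:
        try:
            data = fetch(url)
        except OSError:
            failed.append(url)
            continue
        digests[url] = hashlib.sha256(data).hexdigest()
    if failed:
        # Refuse the whole registration; partial digests are discarded.
        raise CacheError("unreachable, retry later: " + ", ".join(failed))
    return digests


# Stand-in for the network: one reachable file, everything else fails.
_remote = {"http://files.example.org/foo-1.0.tar.gz": b"release bytes"}


def fake_fetch(url):
    if url not in _remote:
        raise OSError(f"cannot reach {url}")
    return _remote[url]


print(cache_release_files(list(_remote), fake_fetch))
```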

Security: the PEP does not try to prevent package tampering. MITM attacks
between pypi.python.org and the download sites may occur as much as they
can happen today between installers and the download sites.  
I think we should consider protection against package tampering 
in a separate discussion/PEP.

 If we're going to do a phased in per project solution like this I think it 
 would work much better to have 2 modes.
 
 1. Legacy - Current behavior, new external links are accepted, existing ones 
 are displayed

 2. PyPI Only - New behavior, no new external links are accepted, existing 
 ones are removed
 
 Present the project owners with 2 one way buttons:
- Switch to PyPI Only and re-host external files [1]

Doesn't this have the same fragility problem as [X] above?

- Switch to PyPI Only and do NOT re-host external files

Are there any problems with doing this automatically (with a prior
notification to maintainers) for all the projects where we don't 
find externally hosted packages?  I'd expect very few false negatives
and they can be quickly switched back.

Back to pypi-cache: it is there to make it super-easy for package
maintainers.  There are all kinds of release habits and scripts pushing out
things to google/bitbucket/github/other sites.  With pypi-cache they
don't need to change any of that.  They just need to be fine with
pypi.python.org pulling in the packages for caching.

We might consider phasing out pypi-cache after a longer time
frame so that eventually we only have pypi-only and things are
simpler and saner.

best,
holger



 These buttons would be one time and quit. Once your project has been switched 
 to PyPI Only you cannot go back to Legacy mode. All new projects would be 
 already switched to PyPI Only. After some amount of time switch all Projects 
 to PyPI Only but _do not_ re-host their packages as we cannot legally do so 
 without their permission.
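Donald's two-mode alternative boils down to a one-way state change. A Python sketch with hypothetical names (Mode, Project, switch_to_pypi_only); the rehost callable stands in for whatever would copy files to PyPI and is only called when the maintainer presses the re-hosting button, i.e. gives explicit permission.

```python
from enum import Enum


class Mode(Enum):
    LEGACY = "legacy"        # new external links accepted, existing ones shown
    PYPI_ONLY = "pypi-only"  # no new external links, existing ones removed


class Project:
    def __init__(self, name, mode=Mode.LEGACY):
        # New projects would be created with mode=Mode.PYPI_ONLY.
        self.name = name
        self.mode = mode
        self.external_links = []

    def add_external_link(self, url):
        if self.mode is Mode.PYPI_ONLY:
            raise ValueError(f"{self.name}: PyPI Only projects accept no external links")
        self.external_links.append(url)

    def switch_to_pypi_only(self, rehost=None):
        """One-way switch. Pass rehost (a callable taking the link list) only
        if the maintainer chose 'Switch to PyPI Only and re-host'."""
        if self.mode is Mode.PYPI_ONLY:
            raise RuntimeError(f"{self.name}: already PyPI Only, no way back to Legacy")
        if rehost is not None:
            rehost(list(self.external_links))
        self.external_links = []
        self.mode = Mode.PYPI_ONLY


p = Project("oldpkg")
p.add_external_link("http://files.example.org/oldpkg-0.1.tar.gz")
rehosted = []
p.switch_to_pypi_only(rehost=rehosted.extend)
print(p.mode, p.external_links, rehosted)
```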
 
 The above is simpler, still provides people an easy migration path, moves us 
 to remove external hosting, and doesn't entangle us with legal issues.
 
 [1] There is still a small window here where someone could MITM PyPI fetching 
 these files, however since it would be a one time and down deal this risk is 
 minimal and is worth it to move to a pypi only solution.
 
 -
 Donald Stufft
 PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
 




Re: [Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

2013-03-10 Thread holger krekel
On Sun, Mar 10, 2013 at 14:29 -0400, Donald Stufft wrote:
 
 On Mar 10, 2013, at 2:18 PM, holger krekel hol...@merlinux.eu wrote:
 
  On Sun, Mar 10, 2013 at 13:35 -0400, Donald Stufft wrote:
  On Mar 10, 2013, at 11:07 AM, holger krekel hol...@merlinux.eu wrote:
  [...]
  Transitioning to pypi-cache mode
  --------------------------------
  
  When transitioning from the currently implicit pypi-ext mode to
  pypi-cache for a given package, a package maintainer should 
  be able to retrieve/verify the historic release files which will 
  be cached from pypi.python.org.  The UI should present this list
  and have the maintainer accept it for completing the transition
  to the pypi-cache mode.  Upon future release registration actions,
  pypi.python.org will perform crawling for the homepage/download sites
  and cache release files *before* returning a success return code for
  the release registration.
  [...]
  
  Some concerns:
  
  1. We cannot automatically switch people to pypi-cache. We _have_ to get 
  explicit permission from them.
  
  Could you detail how you arrive at this conclusion?
  (I've seen the claim before but not the underlying reasoning; maybe
  I just missed it.)
  
  There would be prior notifications to the package maintainers.  If they
  don't want to have their packages cached at pypi.python.org, they can set
  the mode to pypi-only and leave manual instructions.  I suspect very few
  people, if any, will object to pypi-cache mode.  If that is false, we
  might need to prolong pypi-ext mode for them and eventually switch them
  to pypi-only when we decide to get rid of external hosting.
 
 I asked VanL. His statement on re-hosting packages was:
 
 We could do it if we had permission. The tricky part would be getting 
 permission for already-existing packages.
 
 I'm pretty sure that emailing someone and assuming we have permission if they 
 don't opt-out doesn't count as permission.

Hmm, I saw Jesse Noller saying a few days ago "let them opt out".
But I guess VanL can trump that :)  If that is true, we could change the
notification to maintainers of B packages to say that their hosting mode
is going to change to pypi-only, which means losing their externally
hosted release files unless they opt in to pypi-cache.  As long as that
is a no-brainer for them, we are not asking for much and can count on
most people's good will to not make other people's installation life harder.

Besides, admins could still set the pypi-ext mode if a maintainer can
explain why it's a problem for them to agree to pypi-cache or
pypi-only.  I'd really like to not have too many packages lingering
around in pypi-ext mode if it can be avoided.

  
  2. The cache mechanism is going to be fragile, and in the long term leaves 
  a window open for security issues.
  
  Fragility: not sure it's too bad.  Once the mode is activated, release
  registration (submit POST action on the /pypi HTTP endpoint) will only
  succeed if the corresponding release files can be found through
  homepage/download.  Changing the mode to pypi-cache in the presence of
  historic release files hosted elsewhere needs a good pypi.python.org UI
  interaction and may take several tries if the necessary sites cannot be
  reached.  Nevertheless, this step is potentially fragile [X].
 
 I see, so pypi-cache would only be triggered once during release creation. 
 Cache makes it sound like we'd continuously monitor the given external urls 
 instead of it actually being a pull based method of getting files.

Right, we need to avoid cache invalidation problems by only allowing
updates at user-chosen points in time (there might also be an explicit
"update cache" button in case a maintainer pushes an egg/wheel later).
It's still technically a cache, I think, but the term "rehost" would
work as well.

 [...]
  Back to pypi-cache: it is there to make it super-easy for package
  maintainers.  There are all kinds of release habits and scripts
  pushing out things to google/bitbucket/github/other sites.  With
  pypi-cache they don't need to change any of that.  They just need
  to be fine with pypi.python.org pulling in the packages for caching.
 
 Yes I understand the goal here. The problem is that there's not really
 a good way to secure this without requiring changes to their workflow. 
 At best they'll have to push information about every file so that PyPI
 is able to verify the files it is downloading, and if we are requiring
 them to push data about those files we might as well require them to
 push the files themselves. 

Is this about protection against package tampering?  If so, I think a
proper solution involves maintainers signing their release files but
this is outside the intended scope of the PEP.
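For reference, the kind of check Donald describes, where PyPI verifies a downloaded file against information the maintainer pushed, could be as small as this Python sketch. It assumes the maintainer supplies the digest as a URL fragment in the style of the #md5=... fragments on PyPI's simple pages; verify_download and the accepted algorithm set are hypothetical. It only helps if the hash itself reaches PyPI over a trusted channel; signing release files remains the broader answer.

```python
import hashlib
from urllib.parse import urlsplit

SUPPORTED = {"md5", "sha1", "sha256"}  # assumption: algorithms we accept


def verify_download(url, data):
    """Compare downloaded bytes with a '#<algo>=<hexdigest>' URL fragment.

    Returns True on a match and False on a mismatch; raises ValueError when
    the URL carries no supported hash, because then nothing can be verified.
    """
    algo, sep, expected = urlsplit(url).fragment.partition("=")
    if not sep or algo not in SUPPORTED:
        raise ValueError(f"no supported hash fragment in {url!r}")
    return hashlib.new(algo, data).hexdigest() == expected.lower()


digest = hashlib.sha256(b"release bytes").hexdigest()
url = f"http://files.example.org/foo-1.0.tar.gz#sha256={digest}"
print(verify_download(url, b"release bytes"))   # True
print(verify_download(url, b"tampered bytes"))  # False
```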

Otherwise, the re-hosting process for pypi-cache mode is at least as
secure as the current situation, where all hosts issuing pip/easy_install
commands visit external sites and can thus be MITM-attacked.  For pypi-only
served packages it's safer because

Re: [Catalog-sig] hash tags

2013-03-08 Thread holger krekel
Hi Philip, all,

On Fri, Mar 08, 2013 at 14:16 -0500, PJ Eby wrote:
 The key to making this transition isn't creating elaborate new
 standards for the tools, it's *creating new tools for the standards*.

If we can find a way to improve PyPI and not require the world to
change first, that's a big plus in my book as well.

 Point is, this entire thing can be done correctly at the PyPI end and
 work with the existing API of the download tools.

I think so as well.  Will suggest a transition model in a 
new top-level thread, trying to follow this idea.

best,
holger


Re: [Catalog-sig] Fw: Deprecate External Links

2013-03-05 Thread holger krekel
On Tue, Mar 05, 2013 at 04:19 -0500, Donald Stufft wrote:
 Forwarding this since I assume it was accidentally sent to only me,
 and it's important to note that there is some sort of miscounting bug
 going on.
 
 
 Forwarded message:
 
  From: Donald Stufft donald.stu...@gmail.com
  To: M.-A. Lemburg m...@egenix.com
  Date: Tuesday, March 5, 2013 4:16:53 AM
  Subject: Re: [Catalog-sig] Deprecate External Links
  
  On Tuesday, March 5, 2013 at 4:12 AM, M.-A. Lemburg wrote:
   Perhaps I'm misunderstanding, but if the list contains packages that:
   
   * are installable via pip
   
   * are not hosted on PyPI
   
   then why isn't e.g. egenix-mx-base included in that list ?
  Unsure, must be a bug in the script. I saw some BadStatusLine errors
  during the processing but I just assumed they were issues with the server
  pip was trying to fetch from. I'll see if I can't sort out the reason
  that egenix-mx-base doesn't show up.

FYI lockfile is also not in your list, and it only had lockfile-0.2 on
PyPI; the rest, up to 0.9.1, is all at code.google (latest is
lockfile-0.9.1.tar.gz).

best,
holger

 
 



Re: [Catalog-sig] Deprecate External Links

2013-03-01 Thread holger krekel
On Fri, Mar 01, 2013 at 10:02 +0100, Reinout van Rees wrote:
 On 28-02-13 21:08, holger krekel wrote:
 I have seen that position in this discussion (I have to upload 120
 files per release, so I won't do that, for instance).
 
 haven't seen that.
 
 Marc-Andre Lemburg said this, which I took to mean 120 uploads per release:
 
 
 However, taking our egenix-mx-base package as example, we have
 120 distribution files for every single release. Uploading those
 to PyPI would not only take long, but also ...
 

Ah ok, thanks.  I didn't interpret Marc-Andre's post as claiming that
downloads/homepage crawling is a good idea, though.  Just that there
have been reasons not to upload things which need to be addressed,
especially the need for enough storage space.

best,
holger

 
 
 Reinout
 
 -- 
 Reinout van Reeshttp://reinout.vanrees.org/
 rein...@vanrees.org http://www.nelen-schuurmans.nl/
 If you're not sure what to do, make something. -- Paul Graham
 


Re: [Catalog-sig] Deprecate External Links

2013-03-01 Thread holger krekel
On Fri, Mar 01, 2013 at 10:24 +0100, M.-A. Lemburg wrote:
 On 01.03.2013 10:02, Reinout van Rees wrote:
  On 28-02-13 21:08, holger krekel wrote:
  I have seen that position in this discussion (I have to upload 120
  files per release, so I won't do that, for instance).
  
  haven't seen that.
  
  Marc-Andre Lemburg said this, which I took to mean 120 uploads per release:
  
  
  However, taking our egenix-mx-base package as example, we have
  120 distribution files for every single release. Uploading those
  to PyPI would not only take long, but also ...
  
 
 Correct, with a total of over 100MB per release. However, the above
 quote is slightly incorrect: I did not say I won't do that, just
 that there are issues with doing this:
 
 * It currently takes too long uploading that many files to
   PyPI. This causes a problem, since in order to start the upload,
   we have to register the release on PyPI, which tools will then
   immediately find. However, during the upload time, they won't
   necessarily find the right files to download and then fail.

You can actually skip the register and directly upload, it will
create release metadata on the fly.  Not sure if it's complete
but you can then do a register to update it if needed.

best,
holger

   The proposed pull mechanism (see
   http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)
   would work around this problem: tools would simply go to
   our servers in case they can't find the files on PyPI.
 
 * PyPI doesn't allow us to upload two egg files with the same
   name: we have to provide egg files for UCS2 Python builds and
   UCS4 Python builds, since easy_install/setuptools/pip don't
   differentiate between the two variants. This is the main
   reason why we're hosting our own PyPI-style indexes, one for
   UCS2 and the other for UCS4 builds:
   https://downloads.egenix.com/python/index/ucs2/
   https://downloads.egenix.com/python/index/ucs4/
 
 * I'm not sure whether we want to import our crypto packages
   to the US, so for a subset of the files, we'd probably
   continue to use our servers in Germany.
 
   Again, with the above proposal, this shouldn't be a problem.
 
 * The PyPI terms are a bummer for us, but this can be fixed,
   I guess.
 
 If we can resolve the issues, we'd have no problem having the
 files mirrored on PyPI.
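The UCS2/UCS4 clash follows from the egg naming scheme: an egg's file name records project, version, Python version and platform, but nothing about the interpreter's Unicode width, so both builds produce the same name and PyPI sees a duplicate upload. A simplified Python sketch; egg_filename is hypothetical, it mimics the setuptools name-version-pyX.Y-platform.egg pattern but skips setuptools' full name and version normalisation, and the version number is made up.

```python
def egg_filename(name, version, py_version, platform, unicode_width):
    """Simplified egg file name: <name>-<version>-py<X.Y>-<platform>.egg.

    unicode_width (2 for a UCS2 build, 4 for UCS4) is accepted on purpose
    and then ignored, because the naming scheme has no field for it.
    """
    safe_name = name.replace("-", "_")
    return f"{safe_name}-{version}-py{py_version}-{platform}.egg"


ucs2 = egg_filename("egenix-mx-base", "1.0", "2.7", "linux-x86_64", unicode_width=2)
ucs4 = egg_filename("egenix-mx-base", "1.0", "2.7", "linux-x86_64", unicode_width=4)
print(ucs2)
print(ucs2 == ucs4)  # True: two different builds, one file name
```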
 
 -- 
 Marc-Andre Lemburg
 eGenix.com
 
 Professional Python Services directly from the Source  (#1, Mar 01 2013)
  Python Projects, Consulting and Support ...   http://www.egenix.com/
  mxODBC.Zope/Plone.Database.Adapter ...   http://zope.egenix.com/
  mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/
 
 
 : Try our mxODBC.Connect Python Database Interface for free ! ::
 
eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/


[Catalog-sig] homepage/download metadata cleaning

2013-03-01 Thread holger krekel
Hi Richard, all,

Somewhere deep in the threads I mentioned that I wrote a little cleanpypi.py
script which takes a project name as an argument, goes to
pypi.python.org and removes all homepage/download metadata entries for
this project.  This sanitizes/speeds up installation because
pip/easy_install don't need to crawl them anymore.  I just did this for
three of my projects (pytest, tox and py) and it seems to work fine.

Now before I release this as a tool, I wonder: is it a good idea to remove
download/homepage entries?  Is there any current machine use (other than
the dreaded crawling) for the homepage/download_url per-release metadata
fields?

For humans, the homepage link is nicely discoverable if the long-description
doesn't mention it prominently.  But I think there is also a project url
or bugtrack url for a project, so maybe those could be used to reference
these important pages?  (I am a bit confused about the exact meaning of those
urls, btw.)

Should we maybe stop advertising homepage and download_url
and instead extend project-url/bugtrackurl so they are used
and shown nicely?  The latter are independent of releases, which I think
makes sense: what use are old, probably unreachable or broken homepages
anyway?  And it's also not too bad having to go to pypi.python.org once
to set it; it seldom changes.

best,
holger
___
Catalog-SIG mailing list
Catalog-SIG@python.org
http://mail.python.org/mailman/listinfo/catalog-sig


Re: [Catalog-sig] homepage/download metadata cleaning

2013-03-01 Thread holger krekel
On Fri, Mar 01, 2013 at 06:09 -0500, Donald Stufft wrote:
 On Friday, March 1, 2013 at 6:04 AM, M.-A. Lemburg wrote:
  On 01.03.2013 11:19, holger krekel wrote:
   Hi Richard, all,
   
   Somewhere deep in the threads I mentioned that I wrote a little cleanpypi.py
   script which takes a project name as an argument, goes to
   pypi.python.org and removes all homepage/download metadata entries for
   this project. This sanitizes/speeds up installation because
   pip/easy_install don't need to crawl them anymore. I just did this for
   three of my projects (pytest, tox and py) and it seems to work fine.
   
  
  
  Does it also cleanup the links that PyPI adds to the /simple/ by
  parsing the project description for links ?
  
  I think those are far nastier than the homepage and download links,
  which can be put to some good use to limit the external lookups
  (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)
  
  See e.g. https://pypi.python.org/simple/zc.buildout/
  for a good example of the mess this generates... even mailto links
  get listed and file:/// links open up the installers for all
  kinds of nasty things (unless they explicitly protect against
  following these).
  
  
 
 pip at least, and I assume the other tools don't spider those links, but
 they do consider them for download (e.g. if the link looks installable
 it will be a candidate for installing, but it won't fetch it and look for
 more links like it will for download_url/home_page).
 
 I believe that's the way it's structured atm.

That's right.  Even though the long-description extracted links
look ugly on a simple/PKGNAME page, neither pip nor easy_install does anything
with them except if the href ends in #egg=PKGNAME-, in which case they are
taken as pointing to a development tarball (e.g. at github or bitbucket).
AFAIK a link like PKGNAME-VER.tar.gz will not be treated as
an installation candidate, just the #egg=PKGNAME one.
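The rule described here can be sketched in a few lines of Python. It mirrors only this description (Donald's note above suggests pip may also pick up other installable-looking links), not the actual pip or easy_install code, and egg_fragment is a hypothetical name.

```python
from urllib.parse import urlsplit


def egg_fragment(href):
    """Return the 'PKGNAME-...' part of a '#egg=' fragment, or None.

    Per the description above, a link taken from the long_description is an
    install candidate only when it carries such a fragment.
    """
    fragment = urlsplit(href).fragment
    if fragment.startswith("egg="):
        return fragment[len("egg="):] or None
    return None


for href in (
    "https://bitbucket.org/someone/proj/get/tip.tar.gz#egg=proj-dev",
    "http://www.example.org/proj-1.0.tar.gz",
    "mailto:someone@example.org",
):
    print(href, "->", egg_fragment(href))
```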

best,
holger


  
   Now before I release this as a tool, I wonder: is it a good idea to remove
   download/homepage entries? Is there any current machine use (other than
   the dreaded crawling) for the homepage/download_url per-release metadata
   fields?
   
   For humans, the homepage link is nicely discoverable if the
   long-description doesn't mention it prominently. But I think there is
   also a project url or bugtrack url for a project, so maybe those could
   be used to reference these important pages? (I am a bit confused about
   the exact meaning of those urls, btw.)
   
   Should we maybe stop advertising homepage and download_url
   and instead extend project-url/bugtrackurl so they are used
   and shown nicely? The latter are independent of releases, which I think
   makes sense: what use are old, probably unreachable or broken homepages
   anyway? And it's also not too bad having to go to pypi.python.org once
   to set it; it seldom changes.
   
  
  
  I think it would be better to differentiate between showing the
  fields on the project pages, where they provide useful resources
  for people, and their use on the /simple/ index pages which are
  meant for programs to parse.
  
  IMO, the homepage and download links on the project pages are
  indeed very useful for people. On the /simple/ index a homepage
  link is probably not all that useful (provided a download link
  is set). The download links serve the purpose of directing
  tools to the right location, so those do belong on the /simple/
  index listings. I'd completely remove the links parsed from
  the descriptions, since those don't really provide a good
  basis for crawling (the description is meant for humans to
  parse, not programs).
  
  -- 
  Marc-Andre Lemburg
  eGenix.com (http://eGenix.com)
  
  Professional Python Services directly from the Source (#1, Mar 01 2013)
 Python Projects, Consulting and Support ... http://www.egenix.com/
 mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
 

   
  
  
  
  : Try our mxODBC.Connect Python Database Interface for free ! ::
  
  eGenix.com (http://eGenix.com) Software, Skills and Services GmbH 
  Pastor-Loeh-Str.48
  D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
  Registered at Amtsgericht Duesseldorf: HRB 46611
  http://www.egenix.com/company/contact/


Re: [Catalog-sig] PyPI terms

2013-03-01 Thread holger krekel
On Fri, Mar 01, 2013 at 15:11 +0100, M.-A. Lemburg wrote:
 On 01.03.2013 15:02, Jesse Noller wrote:
  Okie doke. So we can move on to putting up the CDN and deprecating external
  links for now?
 
 I don't think anyone is against putting up a CDN. It should meet
 the same security requirements we have for the pypi server itself,
 ie. HTTPS all the way, proper certificates, operated by the PSF,
 perhaps run on a different domain, and whatever other goodies
 Donald can come up with ;-)
 
 For the external links we need a migration path... that's in the works.
 
 See http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal for
 a proposal that allows migrating away from relying on external
 hosts in a backwards compatible and secure way.

The page doesn't describe the current scraping situation accurately.
As mentioned in my last post, pip/easy_install do _not_ visit
all links found in simple/PKGNAME, only the ones with rel=homepage or
rel=download.  So the proposal effectively says to not visit
homepage links by default and use a special format for download ones.
The special format i am not sure about - i guess the SHA256 hash there
is to make sure the target content is the correct one, right?
What about abusing download_url some more and using a multi-line format like
this:

HASH1 URL-TO-RELEASE-FILE1
HASH2 URL-TO-RELEASE-FILE2

This way we can avoid any additional http-requests on the pip/easy_install
client side _and_ allow multiple release files.  The simple/PKGNAME metadata 
would contain all information that is needed (and we could probably introduce
a special syntax for #egg github/bitbucket-style tarballs). Those URLs would 
only be retrieved if the client-side installer determines it needs them because
of the version the user requested.  You wouldn't need to create a special
-download.html file, and it's easy to create this format without much tool
support.
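A rough sketch of how an installer might consume the multi-line download_url format proposed above. It assumes (as guessed for the wiki proposal) that HASH is a hex-encoded SHA-256 digest; the function names are hypothetical and no such format exists in pip or easy_install.

```python
import hashlib


def parse_download_lines(text):
    """Parse 'HASH URL' lines into a list of (sha256_hex, url) tuples."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'HASH URL', got {line!r}")
        digest, url = parts[0].lower(), parts[1]
        if len(digest) != 64 or not all(c in "0123456789abcdef" for c in digest):
            raise ValueError(f"line {lineno}: not a hex SHA-256 digest: {digest!r}")
        entries.append((digest, url))
    return entries


def matches(content, expected_hex):
    """Check downloaded bytes against the digest listed next to their URL."""
    return hashlib.sha256(content).hexdigest() == expected_hex


# The installer would download only the URL for the version it needs,
# then refuse the file if its digest differs from the listed one.
body = b"pretend this is pkg-1.0.tar.gz"
listing = f"{hashlib.sha256(body).hexdigest()} https://example.org/pkg-1.0.tar.gz\n"
for digest, url in parse_download_lines(listing):
    print(url, matches(body, digest))
```

Because the digests travel with the simple/PKGNAME metadata itself, the external host only has to serve bytes; it no longer has to be trusted to serve the right ones.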

Can't incorporate this into the wiki right now myself and i'd probably 
like to structure the page differently.  The issue here really is the
(future) behaviour of easy_install and pip, not so much distutils or the
pypi server (apart from the worthwhile-to-consider idea of
pulling/caching things).

On a side note i'd rather prefer this to be a github/bitbucket project
where i can submit a pull request :)

best,
holger


 -- 
 Marc-Andre Lemburg
 eGenix.com
 


Re: [Catalog-sig] homepage/download metadata cleaning

2013-03-01 Thread holger krekel
On Fri, Mar 01, 2013 at 23:50 +0100, Lennart Regebro wrote:
 On Fri, Mar 1, 2013 at 8:31 PM, M.-A. Lemburg m...@egenix.com wrote:
  Hmm, then why not remove links that don't match the above from
  the /simple/ index pages ?
 
 I think we can do that, but if we *start* with that, we will just
 suddenly, with no warning, break everything.
 It's better if the installation tools can first warn, then remove
 their support for this, and *then* we remove these links from
 /simple/.

I think Marc-Andre was just referring to the superfluous links
from the long-description, namely all links which don't match
the #egg format and don't have a rel of download/homepage.

Phillip clarified that pypi served all long-description links at the
time to leave it to the tools to interpret them.  The interpretation is
now pretty clear and so pypi doesn't need to provide them.  Removing those
unused long-description links shouldn't break either pip or easy_install.
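A minimal sketch of the pruning described above. It assumes that PyPI marks home_page/download_url links with rel="homepage"/rel="download" and that relative hrefs point at files hosted on PyPI itself (both are assumptions of this example). Every other link, i.e. those extracted from the long description, is dropped unless it carries an #egg= fragment.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumed rel values for the home_page and download_url links.
KEEP_RELS = {"homepage", "download"}


class AnchorCollector(HTMLParser):
    """Collect (href, rel-tokens) pairs for every <a href=...> on a page."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href:
            rels = set((attr_map.get("rel") or "").lower().split())
            self.anchors.append((href, rels))


def keep_link(href, rels):
    parts = urlsplit(href)
    hosted = not parts.scheme and not parts.netloc  # assumption: relative = hosted
    return hosted or bool(rels & KEEP_RELS) or parts.fragment.startswith("egg=")


def prune_simple_page(html):
    """Return the hrefs an installer actually uses, in page order."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return [href for href, rels in collector.anchors if keep_link(href, rels)]
```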

 That way we break things gradually, with warnings so that package
 managers can react and adapt.

I generally agree with this strategy but would add that we should
also consider the life of system admins and others installing packages
who may not be able to get maintainers to make new releases.
For me this mainly means aiming to change the defaults in pip and
easy_install, but not removing the crawling abilities completely for
the time being.

best,
holger



Re: [Catalog-sig] Deprecate External Links

2013-02-28 Thread holger krekel
On Wed, Feb 27, 2013 at 22:04 +0100, Lennart Regebro wrote:
 On Wed, Feb 27, 2013 at 8:49 PM, Monty Taylor mord...@inaugust.com wrote:
  But wouldn't this only be a change in pip/easy_install, not PyPI
  itself? I suppose you could explicitly break the external links by
  having them point to nothing if you are worried about the security or
  if it's some performance issue (that would indeed be a bad
  compatibility break, in case people are using those for other
  purposes).  Otherwise, if it's a problem, then just use the old
  version of pip.
 
  If we don't remove the feature from pypi itself
 
 It isn't a feature of PyPI. PyPI doesn't require you to upload the
 files to PyPI. For that reason, easy_install and PIP will scrape
 external sites to be able to download the files.
 
 What we should do is agree that this should stop, add a deprecation
 warning to pip and easy_install, and after some pre-determined time
 remove the feature from easy_install and pip.

I suggest *changing defaults* rather than removing the feature for
the foreseeable future.  Changing defaults is a powerful way to communicate
and one that doesn't leave people totally stranded who are far removed from
discussions and rationales here.

  folks for whom it's a problem, because there will be no incentive for the
  folks hosting their software that way to actually upload their stuff to
  PyPI
 
 Yes there will be: Everyone mailing them to tell them their software
 is broken and can't be installed with easy_install and pip. That's
 going to be very annoying very fast. ;-)

In the last half year I've mailed several maintainers of 1K-download
projects to inquire about their status, and received no replies.  I wanted
to base work on their projects and of course i refrained from doing that
because of the lack of replies.  To me that means you can have users
mailing maintainers or screaming at maintainers or saying bad words
about maintainers or projects all you want but that doesn't mean it's
going to be fixed.   

To summarize, having pip/easy_install report red warnings and requiring
users to pass a --htmlscrape=PROJ1,PROJ2 option or similar is a good way to
communicate; removing the ability is not, at this point.
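The opt-in idea above could look roughly like this inside an installer. `--htmlscrape` is the hypothetical option name from this mail, not an existing pip or easy_install flag, and the warning text is illustrative.

```python
import argparse
import sys


def build_parser():
    parser = argparse.ArgumentParser(prog="installer")
    # Hypothetical flag: comma-separated projects allowed to use external links.
    parser.add_argument("--htmlscrape", default="",
                        help="comma-separated projects whose external links may be scraped")
    parser.add_argument("projects", nargs="*")
    return parser


def scrape_allowlist(value):
    """Turn 'PROJ1, proj2' into {'proj1', 'proj2'} (case-insensitive names)."""
    return {name.strip().lower() for name in value.split(",") if name.strip()}


def external_link_policy(project, allowlist):
    """Follow external links only for allow-listed projects; warn otherwise."""
    if project.lower() in allowlist:
        return "follow"
    print(f"WARNING: {project} relies on external links; "
          f"rerun with --htmlscrape={project} to allow them.", file=sys.stderr)
    return "skip"


args = build_parser().parse_args(["--htmlscrape=lockfile", "lockfile", "foo"])
allowed = scrape_allowlist(args.htmlscrape)
for name in args.projects:
    print(name, external_link_policy(name, allowed))
```

The point of the design is that nothing is removed: a user who hits the warning gets an exact command line that restores the old behaviour for that one project.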

best,
holger


Re: [Catalog-sig] Deprecate External Links

2013-02-28 Thread holger krekel
On Thu, Feb 28, 2013 at 09:48 +1100, Richard Jones wrote:
 On 28 February 2013 08:31, PJ Eby p...@telecommunity.com wrote:
  OTOH, I currently make development snapshots of setuptools and other
  projects available by dumping them in a directory that's used as an
  external download URL.  Replacing that would be a PITA because PyPI
  only lets you upload and register new releases from distutils' command
  line.  Basically, I'd need to use a download link that pointed to a
  latest URL that redirected to the final download.
 
 Yup, and the down-side of distutils as the tool for talking to PyPI
 is, of course, the horrendous turn-around time trying to add features
 or fix bugs.
 
 I've advocated us having the upload/register/whatever functionality in
 a separate tool for a while, but that doesn't seem to have gained any
 traction. Of course issues around the complexity introduced by
 setup.py make it much harder.

FWIW three days ago i presented at Pycon Russia a unifying cmdline 
workflow meta tool which configures and invokes setup.py
[...]/pip/easy_install commands.  I intend to publish it soon and 
will also send a link once the video becomes available.

IOW, i fully agree we need to move away from putting things into
setup.py/distutils and start going for PEP 426 etc. -- but WITHOUT breaking
things for all the packaging upload/installation processes out there.
Hence the meta tool approach, to make it easier for people to
gradually move away from current practices.

cheers,
holger

 In the mean time I think Donald's suggestion for supporting
 development pre-releases is reasonable:
  instead of (please replace with easy_install lingo here)
  `pip install setuptools==setuptools-dev` please `pip install -e
  http://svn.python.org/projects/sandbox/trunk/setuptools/#egg=setuptools-dev`
   ?
 
 
 
 Richard


Re: [Catalog-sig] Deprecate External Links

2013-02-28 Thread holger krekel
On Thu, Feb 28, 2013 at 06:38 +0100, Andreas Jung wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 +1 for the proposal
 
 The complete discussion on this topic is once again absurd and bizarre.
 We are discussing the issue with externally hosted packages every year
 and the situation has not improved. Especially people using buildout
 very regularly encounter issues with external sites being down - with the
 result that we can not install or update our installation.
 
 I give a shit at the arguments pulled out every time by package
 maintainers using PyPI only for listing their packages. I am both
 annoyed and bothered by these people.

I didn't see such positions from package maintainers here.  In fact
i haven't seen anyone stepping up saying listing packages externally
is a great idea.  Could you point to those posts?

However, I have seen concerns about breaking many people's and
companies' processes, and thus thoughts on how to do a good transition.
I guess you don't want to communicate to package-users the way 
you do above to package maintainers.

best,
holger


Re: [Catalog-sig] Deprecate External Links

2013-02-28 Thread holger krekel
On Thu, Feb 28, 2013 at 16:30 +0100, Lennart Regebro wrote:
 On Thu, Feb 28, 2013 at 10:43 AM, Lennart Regebro rege...@gmail.com wrote:
  On Thu, Feb 28, 2013 at 9:28 AM, Nick Coghlan ncogh...@gmail.com wrote:
  Pissing off the maintainers of packages that currently rely on
  external hosting by telling them they have to change their release
  processes if they want to keep releasing software on PyPI and have
  their users actually be able to download it is *not* a good idea,
  especially when we're about to ask them to upgrade their build chains
  for other reasons (including both security and reliability).
 
  Who are these people by the way?
 
 I can answer that question now. I have a list of 2651 emails of people
 listed as maintainers or authors of software that doesn't have
 releases on PyPI.
 This is a very inclusive list, so it lists *all* maintainers and
 authors of *all* versions of a package, if that package has no files
 on PyPI.
 And there are duplicate people, of course, although the emails are unique.

There are also packages which have some (older) release files on pypi
and newer ones outside (e.g. lockfile with 78256 downloads from 
code.google.com).  You didn't include such in your 2651 emails, or did you?

holger


Re: [Catalog-sig] Deprecate External Links

2013-02-28 Thread holger krekel
On Thu, Feb 28, 2013 at 13:56 +0100, Reinout van Rees wrote:
 On 28-02-13 10:43, holger krekel wrote:
 On Thu, Feb 28, 2013 at 06:38 +0100, Andreas Jung wrote:
 
 I give a shit at the arguments pulled out every time by package
 maintainers using PyPI only for listing their packages. I am both
 annoyed and bothered by these people.
 
 I didn't see such positions from package maintainers here.  In fact
 i haven't seen anyone stepping up saying listing packages externally
 is a great idea.  Could you point to those posts?
 
 The position Andreas probably means is projects that *do* advertise
 themselves on pypi, but don't put their files there.

It has been an accepted practice for 10 years.

 I have seen that position in this discussion (I have to upload 120
 files per release, so I won't do that, for instance).

haven't seen that.

 Some arguments might be valid, but these projects *are*, taken as
 one group, actively breaking pip and buildout regularly.

yes, and it's annoying, fully agreed.

 So I agree with Andreas. I don't really care about the arguments
 pulled out every time. Effectively actively breaking pip and
 buildout is bad, period.

I consider it a valid concern that taking homepage/download urls away
from pypi's server index is likely to break things for users.  I don't
see the point of doing that if we can have a better migration path by
working on the installers (as is currently ongoing).  Let's please
not have a black-and-white discussion here, and let's try to improve the overall
situation, not just a particular aspect in a particular way.

holger


Re: [Catalog-sig] Deprecate External Links

2013-02-27 Thread holger krekel
On Wed, Feb 27, 2013 at 14:49 -0500, Monty Taylor wrote:
 On 02/27/2013 02:47 PM, Aaron Meurer wrote:
  On Wed, Feb 27, 2013 at 11:37 AM, holger krekel hol...@merlinux.eu wrote:
  On Wed, Feb 27, 2013 at 19:34 +0100, Lennart Regebro wrote:
  On Wed, Feb 27, 2013 at 5:34 PM, M.-A. Lemburg m...@egenix.com wrote:
  I'm not saying that it's not a good idea to host packages on PyPI,
  but forcing the community into doing this is not a good idea.
 
  I still don't understand why not. The only reasons I've seen are
  Because they don't want to or because they don't trust PyPI. And
  in the latter case I'm assuming they wouldn't use PyPI at all.
 
  And of course, nobody is forcing anyone, just like nobody is forcing
  you to use PyPI. :-)
 
  I understood there is the idea to disable external links within a couple
  of months.  That does break backward compatibility in a considerable way.
 
  holger
  
  But wouldn't this only be a change in pip/easy_install, not PyPI
  itself? I suppose you could explicitly break the external links by
  having them point to nothing if you are worried about the security or
  if it's some performance issue (that would indeed be a bad
  compatibility break, in case people are using those for other
  purposes).  Otherwise, if it's a problem, then just use the old
  version of pip.
 
 If we don't remove the feature from pypi itself, then it won't help the
 folks for whom it's a problem, because there will be no incentive for the
 folks hosting their software that way to actually upload their stuff to
 PyPI - which means that client-side disabling of external_links is
 fairly likely to never be usable.

I can see it's tempting to just try to force everyone to upload
their stuff to pypi.python.org.  I am very skeptical about this approach.

There is already a lot of frustration with the packaging ecosystem
in Python.  When we remove external links on the server side, installs
for many people and companies are going to break, no matter what.  And
they would no longer have a client-side switch to make things work.
Requiring older setuptools/pip versions would not help because
the server information is gone.  That'd just increase frustration.

So at the very least, using external links needs to be a client-side
installer choice for a long while, and the server needs to offer
the corresponding information.

I'd generally prefer to think hard about ways to improve the situation
without breaking things.  Putting simple/ and packaging serving on a CDN
is one such step and a good idea i think.  Establishing a
signing/verification mechanism is another.  Refining py2/py3 dependency
discovery yet another good one.

best,
holger


Re: [Catalog-sig] HTTPS now promoted on PyPI

2013-02-19 Thread holger krekel
On Tue, Feb 19, 2013 at 14:23 +0100, Giovanni Bajo wrote:
 On 19/Feb/2013, at 06:13, Richard Jones r1chardj0...@gmail.com wrote:
 
  Hi all,
  
  I've just altered the nginx configuration to promote (ie. redirect to)
  HTTPS for all GET/HEAD requests. This includes HSTS, but I've set the
  lifetime to 1 day just in case there's some HTTPS compatibility
  issues. Once it's bedded down I'll bump it to a year.
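For reference, the HSTS lifetime mentioned here is the max-age directive (in seconds) of the Strict-Transport-Security response header, so the one-day trial and the planned one-year setting work out as below. The parser is a simplified sketch that only extracts max-age and ignores other directives such as includeSubDomains; the helper names are illustrative.

```python
ONE_DAY = 24 * 60 * 60      # 86400 seconds: the initial, cautious lifetime
ONE_YEAR = 365 * ONE_DAY    # 31536000 seconds: the lifetime once things settle


def hsts_header(max_age):
    """Build a Strict-Transport-Security header value."""
    return f"max-age={max_age}"


def hsts_max_age(value):
    """Return the max-age (seconds) from an STS header value, or None."""
    for directive in value.split(";"):
        name, _, arg = directive.strip().partition("=")
        if name.strip().lower() == "max-age":
            return int(arg.strip().strip('"'))
    return None


print(hsts_header(ONE_DAY))   # max-age=86400
print(hsts_header(ONE_YEAR))  # max-age=31536000
```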
 
 What are the benefits of redirects? I think they just hide potential problems,
 and they still can be exploited by MITM through ssl-stripping. Plus, they 
 cause breakage and/or UX problems in existing tools. 
 
 Given that they give basically no security, I would suggest their removal 
 until we fix all important issues in all third-party tools. For browsers, 
 since you can still serve HSTS headers even without redirects, we can get it 
 included in Chrome and Firefox builtin HSTS list.
 
  2. incorporate some monkey-patching into distribute and setuptools and
  promote those,
 
 I think this is our best bet for an immediate and global solution for 
 outdated versions of Python as well. I will work to prepare a distutils patch 
 that is compatible with 2.6 (which includes SSL), and then adapt it for 2.7 
 and 3.x. 
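What such a patch has to achieve can be sketched with today's Python 3 standard library, where `ssl.create_default_context()` provides certificate and hostname verification. This is not the distutils patch discussed here (which targets 2.6/2.7), only an illustration of the verified-HTTPS behaviour it aims for; the `fetch` helper is hypothetical.

```python
import ssl
import urllib.request


def verified_context():
    """SSL context that requires a valid certificate matching the hostname."""
    ctx = ssl.create_default_context()  # loads the system's trusted CAs
    # Already the defaults for create_default_context(); spelled out for clarity.
    ctx.verify_mode = ssl.CERT_REQUIRED
    ctx.check_hostname = True
    return ctx


def fetch(url, timeout=30):
    """Download url over verified HTTPS; refuse plain HTTP outright."""
    if not url.startswith("https://"):
        raise ValueError(f"refusing non-HTTPS URL: {url}")
    with urllib.request.urlopen(url, timeout=timeout,
                                context=verified_context()) as response:
        return response.read()
```

With this context, a bad or mismatched certificate makes `urlopen` raise `urllib.error.URLError` (wrapping the SSL verification error) instead of silently continuing.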
 
 Do we have numbers of how many 2.5-compatible packages have been updated in 
 the last 6 months?

FYI i did a number of py25-compatible releases of projects in the last 6
months - but i generally upload the dist files from higher python
versions, so no patch for 2.5 is needed (or for 2.6, for that matter).

best,
holger


Re: [Catalog-sig] RubyGems Threat Model and Requirements

2013-02-12 Thread holger krekel
On Tue, Feb 12, 2013 at 12:44 -0500, Daniel Holth wrote:
 On Tue, Feb 12, 2013 at 11:27 AM, Giovanni Bajo ra...@develer.com wrote:
  
   Your Task #6/#7 (related to PyPI generating the trust file, and pip
   verifying it) are the ones where I think the input of the TUF team
   will be most valuable, as well as potentially the folks responding to
   the rubygems.org attack.
 
  My understanding is that #6/#7 are not currently covered by TUF. So yes, I
  would surely value their input to review my design, evolve it together or
  scratch it and come up with something new.
 
  Sorry for the repetition, but I also volunteer for implementation. I don't
  mind if someone else does it (or a subset of it, or we split, etc.), but I
  think it is important to say that this is not a theoretical proposal that
  someone else will have to tackle, but I'm happy to submit patches (all of
  them, in the worst case) to the respective maintainers and rework them
  until they are acceptable.
 
   The rubygems.org will also be looking at server side incident response
   - I suspect a lot of that side of things will end up running through
   the PSF infrastructure team moreso than catalog-sig (although it may
   end up here if it involves PyPI code changes.
 
 
  While I do have some ideas, I don't think I'm fully qualified for that
  side of things. Primarily, my proposal helps by not forcing PyPI to handle
  an online master signing key with all the required efforts (migration,
  rotation, mirroring, threat responses, mitigations, etc.). If you read it,
  you would have seen that PyPI is only required to validate signatures (like pip),
  not sign anything.
 
 
 The alternative is to just use a system implemented by several PhD
 [candidates?] in 2010 based on years of update system experience, before
 pypi security was cool. A doc from last week is a hard sell.

For one, not all PhDs follow clean implementation and automated testing
principles.  Secondly, I appreciate Giovanni's input, work, and his offer
to help with implementation.  Let's not be too quick to dismiss it.
On a funny side note, he is the only one with a successfully openssl-verified
email in these security related email threads, just saying :)

best,
holger


Re: [Catalog-sig] Use user-specific site-packages by default?

2013-02-05 Thread Holger Krekel
On Tue, Feb 5, 2013 at 1:51 PM, Donald Stufft donald.stu...@gmail.com wrote:

  On Tuesday, February 5, 2013 at 5:16 AM, Lennart Regebro wrote:

  1. Packages should only be installed from the given package indexes.
 No scraping of websites as at least easy_install/buildout does, no
 downloading from external download links. A deprecation period for
 this of a couple of months, to give package authors the chance to
 upload their packages is probably necessary.

  PyPI will need to change for this to happen realistically, if I recall.
 There is a hard limit on how large a distribution can be uploaded to PyPI,
 and there are, if I recall, valid distributions which are larger than that.

 Personally I want the installers to only install from PyPI, so my suggestion,
 if this is something that (the proverbial) we want to do, is that PyPI should
 gain some notion of a soft limit for distribution upload (to protect against
 DoS), with the ability to increase that size limit for specific projects who
 can file a ticket w/ PyPI to have their limit increased.
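A minimal sketch of the suggested soft limit with per-project overrides. The 50 MiB default, the override table and the function names are placeholders for illustration, not PyPI's actual values or code.

```python
MiB = 1024 * 1024
DEFAULT_LIMIT = 50 * MiB  # placeholder soft limit guarding against DoS-sized uploads

# Placeholder overrides, e.g. granted after a project files a ticket.
PROJECT_LIMITS = {"bigproject": 200 * MiB}


def upload_limit(project):
    """Return the maximum accepted upload size (bytes) for a project."""
    return PROJECT_LIMITS.get(project.lower(), DEFAULT_LIMIT)


def check_upload(project, size):
    """Raise ValueError if an upload of `size` bytes exceeds the project's limit."""
    limit = upload_limit(project)
    if size > limit:
        raise ValueError(
            f"{project}: upload of {size} bytes exceeds limit of {limit} bytes; "
            "file a ticket to request a higher limit")
```

Keeping the override in a lookup table means raising a limit is a data change on the server, not a code change.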


Dropping the crawling of external pages needs _much_ more than just a few
months of deprecation warnings; rather years.   There are many packages out
there, and it would break people's installations.  As a random example,
look at http://pypi.python.org/simple/lockfile/ - it has its last release
in 2010 and 74K downloads from the 0.9 download url (going to
code.google.com).

I certainly agree, though, that the current client-side crawling is a
nuisance and makes installation procedures unreliable.  I think
we should move the crawling to the server side and cache packages.   I am
currently working on a prototype which does this (and a few other
niceties).  It keeps all installers and packages working nicely,
serving all packages from one central place (cached on demand currently, but
that is a policy issue).

best,
holger




Re: [Catalog-sig] Use user-specific site-packages by default?

2013-02-05 Thread Holger Krekel
On Tue, Feb 5, 2013 at 2:05 PM, Jesse Noller jnol...@gmail.com wrote:



 On Feb 5, 2013, at 8:02 AM, Holger Krekel holger.kre...@gmail.com wrote:

 On Tue, Feb 5, 2013 at 1:51 PM, Donald Stufft donald.stu...@gmail.com wrote:

  On Tuesday, February 5, 2013 at 5:16 AM, Lennart Regebro wrote:

  1. Packages should only be installed from the given package indexes.
 No scraping of websites as at least easy_install/buildout does, no
 downloading from external download links. A deprecation period for
 this of a couple of months, to give package authors the chance to
 upload their packages is probably necessary.

  PyPI will need to change for this to happen realistically if I recall.
 There is a hard limit on how large of a distribution can be uploaded to PyPI
 and there are, if I recall, valid distributions which are larger than that.

 Personally I want the installers to only install from PyPI so my suggestion
 if this is something that (the proverbial) we want to do, PyPI should gain
 some notion of a soft limit for distribution upload (to prevent against
 DoS) with the ability to increase that size limit for specific projects who
 can file a ticket w/ PyPI to have their limit increased.


 Dropping the crawling over external pages needs _much_ more than just a
 few months deprecation warnings, rather years.   There are many packages
 out there, and it would break people's installations.  As a random example,
 look at http://pypi.python.org/simple/lockfile/ - it has its last release
 in 2010 and 74K downloads from the 0.9 download url (going to
 code.google.com).

 I certainly agree, though, that the current client-side crawling is a
 nuisance and makes for unreliability of installation procedures.  I think
 we should move the crawling to the server side and cache packages.   I am
 currently working on a prototype which does this (and a few other
 niceties).  It allows to keep all installers and packages working nicely,
 serving all packages from one central place (cached on demand currently but
 that is a policy issue).

 best,
 holger


 Derived from the current pypi code base?


No.  It uses the pypi code base only as a reference and is rewritten with
a TDD approach, can't help it :)

holger







Re: [Catalog-sig] Use user-specific site-packages by default?

2013-02-05 Thread Holger Krekel
On Tue, Feb 5, 2013 at 2:13 PM, Lennart Regebro rege...@gmail.com wrote:

 On Tue, Feb 5, 2013 at 2:02 PM, Holger Krekel holger.kre...@gmail.com
 wrote:
  Dropping the crawling over external pages needs _much_ more than just a
  few months deprecation warnings, rather years.   There are many packages out
  there, and it would break people's installations.

 No it won't. Nothing gets uninstalled. What stops working is
 installing those packages with pip/easy_install. And that will start
 again as soon as the maintainer uploads the last version to PyPI,
 which she/he is likely to do quite quickly after people start
 complaining.


I wouldn't assume that maintainers are easily reachable.  I've contacted
the maintainers of at least three different (1K downloads) packages, and none
of them responded.

And of course, i didn't mean to imply that already installed packages would
suddenly break. Rather, installation instructions like "use pip install
X" will just fail with some dependency Y not getting installed, or with Y
getting installed in some random lower version which might contain evil
bugs (including security bugs).   For example, the referenced lockfile
project has a 0.2 release on pypi, but is currently at 0.9.


  I certainly agree, though, that the current client-side crawling is a
  nuisance and makes for unreliability of installation procedures.  I think we
  should move the crawling to the server side and cache packages.

 That will mean that a man in the middle-attack might poison PyPI's
 cache. I don't think that's a feasible path forward.


Like i said (you snipped that part of the mail), it's a matter of policy.
Externally available packages could be downloaded at once, and not on
demand.   Such a download and checksumming could be repeated over a period
of time and from different machines.  Of course a remotely stored package
could already be compromised - but such a possibility always exists (even
if an author signs a package with PGP - his machine might be infiltrated,
or the Jenkins build systems performing automated releases etc.).
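The "download and checksum repeatedly, from different machines" policy could be sketched like this: a cached copy is accepted only if enough independent fetches produced byte-identical content. The threshold of two fetches and the function name are assumptions for illustration. As noted above, this cannot detect a file that was already compromised at its source.

```python
import hashlib


def agreed_digest(fetches, min_fetches=2):
    """Return the SHA-256 hex digest all fetches agree on, or None.

    `fetches` holds the raw bytes obtained for the same external file at
    different times and/or from different machines.
    """
    if len(fetches) < min_fetches:
        return None  # not enough independent observations yet
    digests = {hashlib.sha256(body).hexdigest() for body in fetches}
    if len(digests) != 1:
        return None  # content differed between fetches: do not cache it
    return digests.pop()
```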

 Packages do not need to be cached, as they are not supposed to
 change. If you change the package you should really release a new
 version. (Unless you made a mistake and discovered it before anyone
 actually downloaded it). So what you are proposing is really that PyPI
 downloads the package from an untrusted source, if the maintainer
 doesn't upload it. I prefer that we demand that the maintainer upload
 it.


I actually think it might make sense to forbid referencing external files
for _future_ pypi uploads (except #egg= references probably).   A
maintainer trying to do that then gets a clear error and instructions on how
to proceed.   She is just trying to get something out, so we have her
attention.

Changing pip/distribute-easy_install defaults to require an option for
installing packages coming from link rel-types of download or homepage
might make sense as well.

In the end, however, none of this prevents MITM attacks between a
downloader and pypi.python.org, or between the uploader and
pypi.python.org (often using basic auth over http).  Signing methods
like https://wiki.archlinux.org/index.php/Pacman-key are key.  If a signature is
available (also at a download_url site), then we can rule out undetected
tampering.  And there might not be a need to break currently working
package releases.

It certainly makes sense to fortify python packaging and installation
procedures, but i'd like a bit more of a systematic view on it, including
reviews from security-focused people and a somewhat incremental verified
approach to turn it real and used.

best,
holger



//Lennart


Re: [Catalog-sig] Use user-specific site-packages by default?

2013-02-05 Thread holger krekel
On Tue, Feb 05, 2013 at 15:46 +0100, Giovanni Bajo wrote:
 On 05/Feb/2013, at 15:06, Holger Krekel holger.kre...@gmail.com wrote:
 
  In the end, however, none of this prevents MITM attacks between a 
  downloader and pypi.python.org.  Or between the uploader and 
  pypi.python.org (using basic auth over http often).  Signing methods like 
  https://wiki.archlinux.org/index.php/Pacman-key are key.  If a signature is 
  available (also at a download_url site), then we can exclude undetected 
  tampering.  And there might not be a need to break currently working 
  package releases. 
 
 A signature is not enough; if you don't have a secure channel,
 signatures can be replayed. E.g., if you install through an insecure
 channel and you just verify GPG signatures on the package, I can MITM
 you and serve you an older, vulnerable package version (with its
 correct signature), and then go exploit that vulnerability.

Point taken.  I guess unless someone sits down and writes a PEP-ish path for
fortification, it's gonna be hard to assess its viability and resilience
against the various attack vectors, which would first need to be sorted and
prioritized.

Or is somebody on that already?  (there were hints of some background 
discussions - not sure that's helping much as most attack vectors against
the python packaging ecosystem are kind of well known or easy to guess after
a bit of research and experimentation).
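Giovanni's replay point can be illustrated with a small sketch: besides checking authenticity, a client also rejects metadata that is expired or older than what it has already seen. Assumptions: a hypothetical signed index document with `serial` and `expires` fields, and HMAC again standing in for a real public-key signature.

```python
import hashlib
import hmac
import json
import time

KEY = b"hypothetical-signing-key"  # stand-in for a real signing key


def sign_metadata(meta: dict, key: bytes = KEY) -> str:
    """Sign a canonical JSON encoding of the index metadata."""
    payload = json.dumps(meta, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def accept(meta: dict, sig: str, last_seen: int, now: float,
           key: bytes = KEY) -> bool:
    """Accept signed metadata only if it is authentic, unexpired and
    not older than the newest metadata this client has already seen."""
    if not hmac.compare_digest(sign_metadata(meta, key), sig):
        return False  # forged or altered
    if meta["expires"] < now:
        return False  # stale: an attacker may be freezing us on old data
    if meta["serial"] < last_seen:
        return False  # rollback to an older (possibly vulnerable) release
    return True


now = time.time()
old = {"serial": 41, "latest": "1.0", "expires": now + 3600}
new = {"serial": 42, "latest": "1.1", "expires": now + 3600}
old_sig, new_sig = sign_metadata(old), sign_metadata(new)

print(accept(new, new_sig, last_seen=41, now=now))  # True
print(accept(old, old_sig, last_seen=42, now=now))  # False: replayed old metadata
```

The old metadata still carries a perfectly valid signature; only the freshness checks catch the replay.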

best,
holger


 -- 
 Giovanni Bajo   ::  ra...@develer.com
 Develer S.r.l.  ::  http://www.develer.com
 
 My Blog: http://giovanni.bajo.it
 
 
 
 
 





Re: [Catalog-sig] Use user-specific site-packages by default?

2013-02-05 Thread holger krekel
On Tue, Feb 05, 2013 at 16:07 +0100, Lennart Regebro wrote:
 On Tue, Feb 5, 2013 at 3:06 PM, Holger Krekel holger.kre...@gmail.com wrote:
  I wouldn't assume that maintainers are easily reachable.  I've contacted at
  least three people of different (1K downloads) packages which never
  responded.
 
 We really can't do very much about abandoned packages.
 
  And of course, i didn't mean to imply that already installed packages would
  suddenly break, rather that installation instructions like "use pip install
  X" will just fail with some dependency Y not getting installed, or with Y
  getting installed in some random lower version which might contain evil bugs
  (including security bugs).   For example, the referenced lockfile project
  has a 0.2 release on pypi, but is currently at 0.9.
 
 There is no way around that problem, except other people than the
 maintainers uploading the software to PyPI. That's certainly an
 option, and I have no good argument against it, but I don't like it.
 (Obviously it can only be done for software marked with relevant licenses).
 
  In the end, however, none of this prevents MITM attacks between a downloader
  and pypi.python.org.
 
 Sure, and that's another problem, and the low-hanging fruit there is
 using https.

Copying almost all externally hosted packages so that they are served
locally by pypi is also kind of a low-hanging fruit, although probably
slightly higher-hanging than SSL :)   The point is that we have some control
over those packages once we have them, so we can delete them if they are
reported to be malicious, regardless of whether the maintainer is reachable.

  If a signature is available (also at a download_url site), then we can
  exclude undetected tampering.
 
 If they can change the file at the download_url site, then they surely
 can change the signature?

No, because a signature can only be created by the original author for
a particular file (his upload), not from the download site or a
MITM-attacker for a different file.

best,
holger


 //Lennart


Re: [Catalog-sig] Use user-specific site-packages by default?

2013-02-05 Thread holger krekel
On Tue, Feb 05, 2013 at 10:18 -0500, Donald Stufft wrote:
 On Tuesday, February 5, 2013 at 10:14 AM, holger krekel wrote:
  Transporting almost all externally reachable packages to be locally pypi
  served is also kind of a low hanging fruit, although probably slightly
  higher hanging than SSL :) The point is that we can have some control over
  those packages once we have them - so we can delete them if they are
  reported to be malicious independently of maintainer reachability.
 
 We have no way to validate the package we are downloading is the accurate one,
 we should not infer trust/validation that doesn't exist. 

MITM-attacking any of the many worldwide pypi/easy_install downloads
from external sites is much easier than tampering with a few one-time
downloads (verified against each other) made for pypi.python.org's
serving purposes.  By contrast, changing client-side tools and
defaults is going to take much longer and will not reach everybody.

IOW, i believe that improving the serving side is good low-hanging
fruit.
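A sketch of the "verified against each other" step, assuming the same external file has already been fetched several times from different network locations. The fetching itself is omitted and the copies are passed in as bytes; the function name is hypothetical. Agreement between copies only raises the bar for a MITM attacker. It cannot detect a file that was already malicious at its source.

```python
import hashlib
from typing import List


def agreed_digest(copies: List[bytes]) -> str:
    """Return the SHA-256 hex digest shared by all copies of a file.

    Raises ValueError if there are no copies or if any copy differs,
    which would suggest tampering on at least one download path.
    """
    if not copies:
        raise ValueError("need at least one copy")
    digests = {hashlib.sha256(copy).hexdigest() for copy in copies}
    if len(digests) != 1:
        raise ValueError(f"copies disagree: {len(digests)} distinct digests")
    return digests.pop()


print(agreed_digest([b"example-1.0 bytes"] * 3))
try:
    agreed_digest([b"example-1.0 bytes", b"example-1.0 bytes", b"evil bytes"])
except ValueError as exc:
    print(exc)  # copies disagree: 2 distinct digests
```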

  No, because a signature can only be created by the original author for
  a particular file (his upload), not from the download site or a
  MITM-attacker for a different file.
  
  
 
 This assumes we know what the correct key is. If we don't then we
 have no way to validate that the signature was created by the author
 and not by someone else. Trust is hard. 

Sure, you need signature-validation infrastructure for this.

And signature validation is much higher-hanging fruit than using
https on pypi.python.org.
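One partial answer to "which key is correct?" is trust-on-first-use (TOFU) pinning, the model SSH uses for host keys. The first key seen for a project is remembered, and any later change is flagged. Below is a minimal sketch with a hypothetical in-memory pin store. It does not help with the very first download, which is why proper signature-validation infrastructure remains the harder problem.

```python
import hashlib
from typing import Dict


class KeyPins:
    """Trust-on-first-use pinning of signing-key fingerprints per project.

    Hypothetical client-side store: the first key seen for a project is
    recorded, and later releases must be signed with the same key.
    """

    def __init__(self) -> None:
        self._pins: Dict[str, str] = {}

    @staticmethod
    def fingerprint(public_key: bytes) -> str:
        return hashlib.sha256(public_key).hexdigest()

    def check(self, project: str, public_key: bytes) -> bool:
        fp = self.fingerprint(public_key)
        pinned = self._pins.setdefault(project, fp)  # pin on first use
        return pinned == fp


pins = KeyPins()
print(pins.check("example", b"author-key"))    # True (first use, now pinned)
print(pins.check("example", b"author-key"))    # True
print(pins.check("example", b"attacker-key"))  # False: key changed
```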

best,
holger




  


Re: [Catalog-sig] Use user-specific site-packages by default?

2013-02-05 Thread holger krekel
On Tue, Feb 05, 2013 at 15:54 -0500, Terry Reedy wrote:
 On 2/5/2013 11:35 AM, Lennart Regebro wrote:
 On Tue, Feb 5, 2013 at 5:03 PM, Donald Stufft donald.stu...@gmail.com 
 wrote:
 Besides the issues with validating that the package we are mirroring
 is the authentic one there's also a legal issue. We don't know for sure
 that we have the legal rights to redistribute those files. When you upload
 a file to PyPI you grant the PSF a license to do that, no upload from the
 author = no license. IANAL but i think i'm correct on that.
 
 Absolutely, but if the package is marked with a license that allows
 redistribution in the metadata, then we can.
 
 The last I read (and I cannot find the seemingly hidden page) the
 author (or rights-holder) of code must grant PSF something more than
 just redistribution rights before uploading it. The same must also
 certify some mumbo-jumbo about compliance with national laws and
 cryptography. No 3rd party can do that.

Not sure i understand.  Are you referring to a procedure that is in place
already or that should be in place? 

I consider caching 3rd-party packages that are offered through PyPI's
metadata, and which can be downloaded freely from everywhere, to be similar
to what web caches like squid do.  A quick scan turned up this sentence from
http://en.wikipedia.org/wiki/Web_cache :

"In 1998, the DMCA added rules to the United States Code (17 U.S.C.
§: 512) that relinquishes system operators from copyright liability
for the purposes of caching."

best,
holger


Re: [Catalog-sig] test pypi server?

2013-01-26 Thread Holger Krekel
Hey Chris,

according to http://pypi.python.org there should be a test pypi server at
http://testpypi.python.org/pypi but at the moment it gives 502 Bad Gateway.

cheers,
holger

On Sat, Jan 26, 2013 at 10:33 AM, Chris Withers ch...@simplistix.co.uk wrote:

 Hi All,

 I remember mention of a test PyPI server that had been set up.
 Where can I find it?

 I'm doing some automated release testing...

 Chris

 --
 Simplistix - Content Management, Batch Processing & Python Consulting
 - http://www.simplistix.co.uk


[Catalog-sig] fresh pep381run pypi-mirroring fails since 1 week

2013-01-07 Thread Holger Krekel
Hi all,

During the last 7 days i tried running pep381run with a fresh directory
on two different hosts.   They both failed while trying to copy
azb_nester-1.2.0.tar.gz, see here for the traceback:
http://bpaste.net/show/SoMoyjdJEIGvm99dH6gG/  It seems that azb_nester does
not have any files on pypi.python.org anymore; they probably got deleted.

Is that a bug in the pep381run software or in the pep381 mirroring protocol
or ...?
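Whatever the root cause, a mirroring client could report a file that is listed but no longer downloadable instead of treating it as a fatal error. Below is a minimal sketch, assuming a hypothetical `fetch` callable that raises `FileNotFoundError` for deleted files (for example by translating an HTTP 404). This is not pep381run's actual code.

```python
from typing import Callable, Dict, Iterable, List


def mirror(filenames: Iterable[str],
           fetch: Callable[[str], bytes],
           store: Callable[[str, bytes], None]) -> List[str]:
    """Copy every listed file; skip and report files that have vanished
    upstream instead of aborting the whole mirroring run."""
    missing = []
    for name in filenames:
        try:
            data = fetch(name)
        except FileNotFoundError:
            # e.g. the file was deleted from PyPI after it was listed
            missing.append(name)
            continue
        store(name, data)
    return missing


# Hypothetical in-memory upstream and local store, for illustration only.
upstream: Dict[str, bytes] = {"example-1.0.tar.gz": b"..."}


def fetch(name: str) -> bytes:
    try:
        return upstream[name]
    except KeyError:
        raise FileNotFoundError(name) from None


local: Dict[str, bytes] = {}
missing = mirror(["example-1.0.tar.gz", "azb_nester-1.2.0.tar.gz"],
                 fetch, local.__setitem__)
print(missing)  # ['azb_nester-1.2.0.tar.gz']
```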

best,
holger


[Catalog-sig] disabling the serving of links from description_html?

2012-12-18 Thread Holger Krekel
Hi Richard, hi all,

While reading the pypi main and other sources, i wondered how we could
switch off serving links from description_html, at least on a per-project
basis.  It's really annoying that once you add some links to a
long_description, installation of your package slows down around the
world, even if you remove the links again in the next release.

How could we arrange for a maintainer to communicate to the pypi-server
that a particular project should not ever serve links from description_html
(and maybe not even from the homepage while we are at it)?

Preferably it should be something that can be done from existing setup.py
files, like adding a special trove-classifier or keyword.  But a little
custom tool or a web page form would be ok as well.

If maintainers could easily switch off these extra links, this would mean
less stress for the pypi server and a considerable global speedup when
installing python packages, since pip/easy_install often spend most of
their time checking these URLs.

best,
holger


Re: [Catalog-sig] disabling the serving of links from description_html?

2012-12-18 Thread Holger Krekel
On Tue, Dec 18, 2012 at 5:46 PM, M.-A. Lemburg m...@egenix.com wrote:

 On 18.12.2012 15:54, Holger Krekel wrote:
  Hi Richard, hi all,
 
  While reading the pypi main and other sources i wondered how we could
  switch off serving links from description_html, at least on a per-project
  basis.  It's really annoying that when you start to add some links to a
  long_description that installation of your package will thus slow down
  around the world.  Even if you remove the links from the next release.
 
  How could we arrange for a maintainer to communicate to the pypi-server
  that a particular project should not ever serve links from
 description_html
  (and maybe not even from the homepage while we are at it)?
 
  Preferably it should be something that can be done from existing setup.py
  files, like adding a special trove-classifier or keyword.  But a little
  custom tool or a web page form would be ok as well.
 
  If maintainers could easily switch off these extra links, then this means
  less stress for the pypi server and a global considerable speedup of
  installing python packages as often most of the pip/easy_install time is
  spent with checking out these URLs.

 Are you sure about this?

 AFAIK, setuptools/distribute only looks at links with rel=homepage
 or rel=download attributes, not all links on the PyPI project page.
 The links from the description don't receive such attributes.

 See e.g. http://pypi.python.org/simple/pytest/


You are right, Marc.  Only the download and home page links (from all
versions ever published) are considered by pip/easy_install.  I just
examined it more closely via urlsnarf.  Some projects had so many of them,
mixed in with the other links, that i didn't see it clearly before
(although i did notice the rel classification).

So to avoid the overhead, one could retroactively remove all download links,
and maybe also all homepage links except the one for the latest version or
so.  But i guess that can be done without changes to pypi itself.
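Marc's point can be checked with a few lines of parsing. Only anchors carrying rel="homepage" or rel="download" are candidates for crawling, so the overhead grows with the number of such links across all published versions. The page below is a hypothetical /simple/ page modeled on the format Marc describes, not real PyPI output.

```python
from html.parser import HTMLParser


class RelLinkParser(HTMLParser):
    """Collect (rel, href) for <a> tags whose rel is homepage or download."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        rel = attrs.get("rel") or ""
        if rel in ("homepage", "download") and attrs.get("href"):
            self.links.append((rel, attrs["href"]))


page = """<html><body>
<a href="../../packages/source/e/example/example-1.0.tar.gz">example-1.0.tar.gz</a><br/>
<a href="http://example.org/" rel="homepage">1.0 home_page</a><br/>
<a href="http://example.org/dist/" rel="download">1.0 download_url</a><br/>
<a href="http://example.org/blog">a link from the long_description</a>
</body></html>"""

parser = RelLinkParser()
parser.feed(page)
print(parser.links)
# [('homepage', 'http://example.org/'), ('download', 'http://example.org/dist/')]
```

The plain file link and the description link are ignored; only the two rel-tagged links would be fetched by the installer.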

best & thanks for the clarification,
holger


 --
 Marc-Andre Lemburg
 eGenix.com

 Professional Python Services directly from the Source  (#1, Dec 18 2012)
  Python Projects, Consulting and Support ...   http://www.egenix.com/
  mxODBC.Zope/Plone.Database.Adapter ...   http://zope.egenix.com/
  mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/
 
 2012-12-14: Released mxODBC.Connect 2.0.2 ... http://egenix.com/go38
 2012-12-05: Released eGenix pyOpenSSL 0.13 ...http://egenix.com/go37
 2013-01-22: Python Meeting Duesseldorf ... 35 days to go

eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/



[Catalog-sig] current repo of pypi

2012-11-30 Thread Holger Krekel
Hello,

The http://wiki.python.org/moin/CheeseShopDev page mentioned that the repo
is undergoing migration.  Is there some (even intermediate) url which i
could pull today?

thanks,
holger


Re: [Catalog-sig] Perhaps PyPI will do

2005-04-07 Thread holger krekel
Hi David, 

On Thu, Apr 07, 2005 at 09:32 -0700, David Ascher wrote:
 I find the discussion depressing in many ways.

Did i miss some of the discussion?  At least on catalog-sig
and in the blogs it was going quite ok in my opinion. 
But maybe we had different expectations :-) 

holger