On Fri, Mar 01, 2013 at 06:09 -0500, Donald Stufft wrote:
> On Friday, March 1, 2013 at 6:04 AM, M.-A. Lemburg wrote:
> > On 01.03.2013 11:19, holger krekel wrote:
> > > Hi Richard, all,
> > > 
> > > somewhere deep in the threads i mentioned i wrote a little "cleanpypi.py"
> > > script which takes a project name as an argument and then goes to
> > > pypi.python.org and removes all homepage/download metadata entries for
> > > this project. This sanitizes/speeds up installation because
> > > pip/easy_install don't need to crawl them anymore. I just did this for
> > > three of my projects (pytest, tox and py) and it seems to work fine.
> > > 
> > 
> > 
> > Does it also clean up the links that PyPI adds to the /simple/ page by
> > parsing the project description for links?
> > 
> > I think those are far nastier than the homepage and download links,
> > which can be put to some good use to limit the external lookups
> > (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)
> > 
> > See e.g. https://pypi.python.org/simple/zc.buildout/
> > for a good example of the mess this generates... even mailto links
> > get listed and "file:///" links open up the installers for all
> > kinds of nasty things (unless they explicitly protect against
> > following these).
> > 
> > 
> 
> pip at least (and, I assume, the other tools) doesn't spider those links, but
> it does consider them for download: if a link looks installable it becomes
> a candidate for installing, but pip won't fetch it and look for more links
> the way it does for download_url/home_page.
> 
> I believe that's the way it's structured atm.

That's right. Even though the links extracted from the long description
look ugly on a simple/PKGNAME page, neither pip nor easy_install does anything
with them unless the "href" ends in "#egg=PKGNAME-", in which case it is
taken as pointing to a development tarball (e.g. at github or bitbucket).
AFAIK a link like "PKGNAME-VER.tar.gz" will not be treated as
an installation candidate, just the "#egg=PKGNAME" one.
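
To illustrate the rule described above, here is a rough sketch in Python.
It is not the actual pip or easy_install code. The names HrefCollector,
is_dev_egg_link and dev_candidates, and the sample URLs, are made up for
this example, and it assumes the rule means "the URL fragment starts with
egg=PKGNAME-":

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag on a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def is_dev_egg_link(href, project):
    """True if href carries an '#egg=PROJECT-...' fragment.

    Assumption: the match is case-insensitive and only looks at the
    fragment, never at the file name in the URL path.
    """
    fragment = urlsplit(href).fragment
    prefix = "egg=" + project.lower() + "-"
    return fragment.lower().startswith(prefix)


def dev_candidates(html, project):
    """Return the links on a page that would count as dev tarballs."""
    collector = HrefCollector()
    collector.feed(html)
    return [h for h in collector.hrefs if is_dev_egg_link(h, project)]


# Invented stand-in for a simple/PKGNAME page with description links.
SAMPLE = """
<a href="https://github.com/example/pkgname/tarball/master#egg=pkgname-dev">dev</a>
<a href="http://example.com/pkgname-1.0.tar.gz">tarball</a>
<a href="mailto:someone@example.com">mail</a>
<a href="file:///etc/passwd">file</a>
"""

if __name__ == "__main__":
    for link in dev_candidates(SAMPLE, "pkgname"):
        print(link)
```

With this sample only the github link is kept; the plain tarball, mailto
and file:/// links are all skipped because they carry no egg fragment.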

best,
holger


> > 
> > > Now before i release this as a tool, i wonder: Is it a good idea to remove
> > > download/homepage entries? Is there any current machine use (other than
> > > the dreaded crawling) for the homepage/download_url per-release metadata 
> > > fields?
> > > 
> > > For humans the homepage link is nicely discoverable if the
> > > long-description doesn't mention it prominently. But i think there
> > > also is a "project url" or "bugtrack url" for a project so maybe those
> > > could be used to reference these important pages? (i am a bit confused
> > > on the exact meaning of those urls, btw).
> > > 
> > > Should we maybe stop advertising "homepage" and "download_url"
> > > and instead extend project-url/bugtrackurl so they are used
> > > and shown nicely? The latter are independent of releases, which i think
> > > makes sense - what use are old, probably unreachable/borked homepages
> > > anyway? And it's also not too bad having to go to pypi.python.org once
> > > to set it; it seldom changes.
> > > 
> > 
> > 
> > I think it would be better to differentiate between showing the
> > fields on the project pages, where they provide useful resources
> > for people, and their use on the /simple/ index pages which are
> > meant for programs to parse.
> > 
> > IMO, the homepage and download links on the project pages are
> > indeed very useful for people. On the /simple/ index a homepage
> > link is probably not all that useful (provided a download link
> > is set). The download links serve the purpose of directing
> > tools to the right location, so those do belong on the /simple/
> > index listings. I'd completely remove the links parsed from
> > the descriptions, since those don't really provide a good
> > basis for crawling (the description is meant for humans to
> > parse, not programs).
> > 
> > -- 
> > Marc-Andre Lemburg
> > eGenix.com (http://eGenix.com)
> > 
> > Professional Python Services directly from the Source (#1, Mar 01 2013)
> > >>> Python Projects, Consulting and Support ... http://www.egenix.com/
> > >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/
> > >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
> > ________________________________________________________________________
> > 
> > ::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::
> > 
> > eGenix.com (http://eGenix.com) Software, Skills and Services GmbH 
> > Pastor-Loeh-Str.48
> > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
> > Registered at Amtsgericht Duesseldorf: HRB 46611
> > http://www.egenix.com/company/contact/
> > _______________________________________________
> > Catalog-SIG mailing list
> > Catalog-SIG@python.org (mailto:Catalog-SIG@python.org)
> > http://mail.python.org/mailman/listinfo/catalog-sig
> > 
> > 
> 
> 
