On Wed, Feb 27, 2013 at 16:16 -0700, Aaron Meurer wrote:
> And by the way, this hasn't been mentioned, but I really mean *all*
> mentions of Google Code on PyPI.  pip crawls Google Code not just
> because Google Code is listed as an official site for my package or
> because the latest release is there, but because a single old release
> points there.  So to get pip to not crawl there, I would have to go
> through and remove all old mentions of Google Code, even from releases
> that were made in 2006.  So you can see why the expired domain
> scenario is a very real issue. And combined with the fact that
> everyone uses pip with sudo, which was discussed on this list a while
> back, you have a hacker's dream for installing malicious code on
> everyone's computers.

I wrote a little command line tool "cleanpypi.py" for removing _all_
download/homepage metadata from all releases of a project.  See it
attached - WARNING: it's a hack, but it worked for me.  It uses the
XML-RPC interface to fetch the release data for each version and then
the submit POST hook to re-register the "cleaned up" metadata.  If you
want to play with it, you might comment out the final "req.post" request
so that no actual changes take place and you can see what it would do.

Apart from preventing the hijacking of expired download/homepage domains,
it has the nice side effect of speeding up installation of your package,
because pip no longer needs to crawl any third-party sites.  Given some
streamlining, a tool like this could be advertised on pypi.python.org or
offered directly as an action in the server UI for package authors.
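
If you want to check the effect, one rough way (just a sketch, not part
of cleanpypi.py, and assuming the current /simple/ index layout) is to
fetch the simple page for your package and list any links pointing off
pypi.python.org:

    import re, sys, requests

    html = requests.get("https://pypi.python.org/simple/%s/" % sys.argv[1]).text
    # crude href extraction; anything not hosted on pypi.python.org is a
    # candidate for third-party crawling by installers
    for url in re.findall(r'href="([^"]+)"', html):
        if url.startswith("http") and "pypi.python.org" not in url:
            print url

After cleaning, that list should ideally be empty.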

best,

holger
import py
import sys
import getpass
import xmlrpclib
import requests

def ver_and_metadata(pkg):
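    # ask PyPI's XML-RPC API for all releases of the package
    # (show_hidden=True) and yield (version, metadata dict) pairs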
    proxy = xmlrpclib.Server("https://pypi.python.org/pypi")
    for version in proxy.package_releases(pkg, True):
        metadata = proxy.release_data(pkg, version)
        yield version, metadata

def cleanup_metadata(metadata):
    # needs a devpi checkout for metadata_keys (the metadata-1.3 field names)
    from devpi.server.db import metadata_keys
    #for name in list(metadata):
    #    if name not in metadata_keys:
    #        del metadata[name]
    #        print "removing", name
    metadata["metadata_version"] = "1.3"
    for name in metadata_keys:
        assert name in metadata, name

if __name__ == "__main__":
    pkg = sys.argv[1]
    #homepage = download = sys.argv[2]

    # prompt for the PyPI credentials used by the submit requests below
    USER = raw_input("pypi user: ")
    PASS = getpass.getpass("pypi password: ")

    req = requests.session()

    # walk all releases, blank out home_page/download_url and re-submit
    for version, metadata in ver_and_metadata(pkg):
        oldhomepage = metadata.get("home_page", None)
        olddownload = metadata.get("download_url", None)
        print version, oldhomepage, olddownload
        changed = False
        if oldhomepage and oldhomepage != "UNKNOWN":
            metadata["home_page"] = "UNKNOWN"
            print "clearing homepage (found: %r)" % oldhomepage
            changed = True
        if olddownload and olddownload != "UNKNOWN":
            metadata["download_url"] = "UNKNOWN"
            print "clearing download_url (found: %r)" % olddownload
            changed = True

        if not changed:
            print pkg, version, "ok - no changes needed"
            continue
        cleanup_metadata(metadata)
        metadata[":action"] = "submit"
        #py.std.pprint.pprint(metadata)
        #response = req.post("https://pypi.python.org/";, data=metadata,
        #response = req.post("http://requestb.in/msu5mkms";,
        response = req.post("https://pypi.python.org/pypi";,
                            data=metadata,
                            auth=(USER, PASS))
        print pkg, version, "metadata cleaned", response.status_code
