On Wed, Feb 27, 2013 at 16:16 -0700, Aaron Meurer wrote:
> And by the way, this hasn't been mentioned, but I really mean *all*
> mentions of Google Code on PyPI. pip crawls Google Code not just
> because Google Code is listed as an official site for my package or
> because the latest release is there, but because a single old release
> points there. So to get pip to not crawl there, I would have to go
> through and remove all old mentions of Google Code, even from releases
> that were made in 2006. So you can see why the expired domain
> scenario is a very real issue. And combined with the fact that
> everyone uses pip with sudo, which was discussed on this list a while
> back, you have a hacker's dream for installing malicious code on
> everyone's computers.
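To see why even ancient releases matter, here is a small read-only sketch (using the same PyPI XML-RPC calls as the attached tool, not any official pip feature) that lists every release whose metadata still carries an external home_page or download_url that pip would follow:

    import sys
    import xmlrpclib

    # read-only audit: show which releases of a package still point
    # to external home_page / download_url locations
    proxy = xmlrpclib.ServerProxy("https://pypi.python.org/pypi")
    pkg = sys.argv[1]
    for version in proxy.package_releases(pkg, True):  # True: include old/hidden releases
        metadata = proxy.release_data(pkg, version)
        for key in ("home_page", "download_url"):
            url = metadata.get(key)
            if url and url != "UNKNOWN":
                print "%s %s: %s = %s" % (pkg, version, key, url)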
I wrote a little command line tool "cleanpypi.py" for the purpose of removing _all_ download/homepage metadata from all releases of a project. See it attached - WARNING: it's a hack, but it worked for me. It uses the XML-RPC interface to get the release data and then the SUBMIT POST hook to register the new "cleaned up" metadata. If you want to play with it, you might comment out the final "req.post" request so that no actual changes take place and you can see what it would do.

Apart from preventing hijacking of old domains referenced in download/homepage metadata, it has the nice side effect of speeding up the installation of your package, because no third-party crawling needs to take place.

Given some streamlining, a tool like this could be advertised on pypi.python.org or offered directly as an action in the server UI for package authors.

best,
holger
import py
import sys
import xmlrpclib
import requests

# PyPI credentials; fill in your own username/password before running
USER = "your-pypi-username"
PASS = "your-pypi-password"


def ver_and_metadata(pkg):
    # yield (version, metadata) for every release of the package,
    # including old/hidden ones
    proxy = xmlrpclib.Server("https://pypi.python.org/pypi")
    for version in proxy.package_releases(pkg, True):
        metadata = proxy.release_data(pkg, version)
        yield version, metadata


def cleanup_metadata(metadata):
    # metadata field names expected by PyPI, taken from the devpi project
    from devpi.server.db import metadata_keys
    #for name in list(metadata):
    #    if name not in metadata_keys13:
    #        del metadata[name]
    #        print "removing", name
    metadata["metadata_version"] = "1.3"
    for name in metadata_keys:
        assert name in metadata, name


if __name__ == "__main__":
    pkg = sys.argv[1]
    #homepage = download = sys.argv[2]
    req = requests.session()
    for version, metadata in ver_and_metadata(pkg):
        oldhomepage = metadata.get("home_page", None)
        olddownload = metadata.get("download_url", None)
        print version, oldhomepage, olddownload
        changed = False
        if oldhomepage and oldhomepage != "UNKNOWN":
            metadata["home_page"] = "UNKNOWN"
            print "clearing homepage (found: %r)" % oldhomepage
            changed = True
        if olddownload and olddownload != "UNKNOWN":
            metadata["download_url"] = "UNKNOWN"
            print "clearing download_url (found: %r)" % olddownload
            changed = True
        if not changed:
            print pkg, version, "ok - no changes needed"
        cleanup_metadata(metadata)
        # re-register the cleaned metadata via the SUBMIT POST hook
        metadata[":action"] = "submit"
        #py.std.pprint.pprint(metadata)
        #response = req.post("https://pypi.python.org/", data=metadata,
        #response = req.post("http://requestb.in/msu5mkms",
        response = req.post("https://pypi.python.org/pypi", data=metadata,
                            auth=(USER, PASS))
        print pkg, version, "metadata cleaned", response.status_code
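For reference, assuming the script is saved as cleanpypi.py and the USER/PASS placeholders at the top are replaced with your PyPI credentials, it takes the package name as its only argument, e.g. "python cleanpypi.py mypackage". As noted above, with the final req.post call commented out it only prints which home_page/download_url entries it would clear, without changing anything on PyPI.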