On Thu, May 16, 2013 at 3:46 PM, David Wilson <[email protected]> wrote:
> Would something like http://pypi.h1.botanicus.net/static/dump.txt.gz be
> useful to you? (warning: 57 MB expanding to 540 MB). Each line is a
> JSON-encoded dict containing a single package release.
>
> for line in gzip.open('dump.txt.gz'):
>     dct = json.loads(line)
>     ....
>
> etc
>
> The code for it is very simple; I would be willing to clean it up and turn
> it into a cron job if people found it useful.
>
> Note that the dump above is outdated; I only made it as a test.

Seems like a useful format.
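
For anyone who wants to poke at a dump in that format, here is a minimal
sketch of reading it, assuming only what is described above: a gzip file
with one JSON-encoded dict per line. The helper names (iter_releases,
count_by_key) and the "name" and "version" keys in the sample data are
hypothetical; I have not checked which keys the real dump uses.

```python
import gzip
import io
import json
from collections import Counter


def iter_releases(source):
    """Yield one dict per non-empty line of a gzip-compressed JSON-lines file.

    `source` may be a filename or a binary file object.
    """
    with gzip.open(source, "rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_by_key(releases, key):
    """Count how often each value of `key` occurs across the releases."""
    counts = Counter()
    for release in releases:
        counts[release.get(key)] += 1
    return counts


# Tiny in-memory stand-in for dump.txt.gz (hypothetical keys, made-up data).
sample = io.BytesIO(gzip.compress(
    b'{"name": "a", "version": "1.0"}\n'
    b'{"name": "a", "version": "2.0"}\n'
    b'{"name": "b", "version": "1.0"}\n'
))
counts = count_by_key(iter_releases(sample), "name")
```

Streaming line by line keeps memory use flat, so the 540 MB expanded
size should not matter; pass the path to the real file instead of the
in-memory sample.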

https://bitbucket.org/dholth/pypi_stats is a prototype that parses
requires.txt and other metadata out of all the sdists in a folder,
putting them into a sqlite3 database. It may be interesting for
experimentation. For example, I can easily tell you how many different
version numbers there are and which are the most popular, or I can
tell you which metadata keys and version numbers have been used. The
database winds up being 1.6 GB, or about 200 MB if you delete the
unparsed files.
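
To give a rough idea of the approach, here is a minimal sketch of
pulling requires.txt out of every .tar.gz sdist in a folder and loading
it into sqlite3. It is not the pypi_stats code: the function names, the
single "requires" table and its columns are assumptions made up for
illustration, and it ignores zip sdists and the other metadata files.

```python
import io
import sqlite3
import tarfile
import tempfile
from pathlib import Path


def read_requires(sdist_path):
    """Return the text of the first *.egg-info/requires.txt member, or None."""
    with tarfile.open(sdist_path) as archive:
        for member in archive:
            if member.isfile() and member.name.endswith(".egg-info/requires.txt"):
                return archive.extractfile(member).read().decode("utf-8", "replace")
    return None


def parse_requirements(text):
    """Yield (extra, requirement) pairs; a [section] header names an extra."""
    extra = ""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            extra = line[1:-1]
        else:
            yield extra, line


def index_folder(folder, conn):
    """Store one row per requirement found in the folder's .tar.gz sdists."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS requires"
        " (sdist TEXT, extra TEXT, requirement TEXT)"
    )
    for path in sorted(Path(folder).glob("*.tar.gz")):
        text = read_requires(path)
        if text is None:
            continue  # sdist ships no requires.txt
        rows = [(path.name, extra, req) for extra, req in parse_requirements(text)]
        conn.executemany("INSERT INTO requires VALUES (?, ?, ?)", rows)
    conn.commit()


# Demo on a tiny synthetic sdist, so nothing has to be downloaded.
with tempfile.TemporaryDirectory() as tmp:
    data = b"six\n\n[test]\npytest>=2.3\n"
    info = tarfile.TarInfo("demo-1.0/demo.egg-info/requires.txt")
    info.size = len(data)
    with tarfile.open(Path(tmp) / "demo-1.0.tar.gz", "w:gz") as archive:
        archive.addfile(info, io.BytesIO(data))
    conn = sqlite3.connect(":memory:")
    index_folder(tmp, conn)
    rows = conn.execute(
        "SELECT sdist, extra, requirement FROM requires ORDER BY rowid"
    ).fetchall()
```

With the rows in a table, questions like "which requirements are most
common" become a single GROUP BY query.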
_______________________________________________
Distutils-SIG maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/distutils-sig