Hi, I sent the following emails to the GitHub list, but they never
seem to have gotten through. If they did, I apologize for resending them.

Essentially, I offer a workaround for retrieving the list of authors of a
repository using Google App Engine.

Ondrej


On Sat, Jul 18, 2009 at 12:43 PM, Ondrej Certik<[email protected]> wrote:
> On Sat, Jul 18, 2009 at 2:10 AM, Ondrej Certik<[email protected]> wrote:
>> Hi,
>>
>> is there some way to obtain a list of authors, like in this command:
>>
>> $ git shortlog -ns
>>   923  Ondrej Certik
>>   374  Kirill Smelkov
>>   257  Mateusz Paprocki
>>   109  Fredrik Johansson
>>   102  Fabian Pedregosa
>>    55  Jason Gedge
>> [...]
>>
>>
>> using the GitHub API? The only way I figured so far is to use the
>> Network API
>>
>> http://develop.github.com/p/network.html
>>
>> to retrieve *all* commits (i.e., first use "dates" to get the range
>> and then "network_data_chunk" to fetch all commits), then extract the
>> author information from them. It's pretty wasteful and quite slow, too.
>>
>> Is there some other way?
>
> Here is a python script that does that:
>
> --------
> try:
>     import json  # in the standard library since Python 2.6
> except ImportError:
>     # Python 2.5 has no json module; fall back to Django's bundled copy
>     from django.utils import simplejson as json
> import urllib2
>
> base = "http://github.com/certik/sympy"
> # network_meta returns the commit "dates" and the "nethash" of the repo
> s = urllib2.urlopen(base + "/network_meta").read()
> data = json.loads(s)
> dates = data["dates"]
> nethash = data["nethash"]
> print len(dates)
> print nethash
> # Fetch every commit in one chunk, from index 0 to the last date
> url = "%s/network_data_chunk?nethash=%s&start=0&end=%d" % (base, nethash,
>        len(dates) - 1)
> print "downloading..."
> s = urllib2.urlopen(url).read()
> print "   done."
> data = json.loads(s, encoding="latin-1")
> commits = data["commits"]
> # Unique author names, sorted alphabetically
> authors = sorted(set(x["author"] for x in commits))
> print authors
> print len(authors)
> ----------
>
> this prints:
>
>
> $ python a.py
> 2819
> c55c72cf04eda4b54e26ef4bc30881a97de59e3e
> downloading...
>   done.
> [u'Aaron Meurer', u'Abderrahim Kitouni', u'Akshay Srinivasan', u'Alan
> Bromborsky', u'Ali Raza Syed', u'Andrej "qwp0" Tokar\u010d\xc3\xadk',
> u'Andrew Docherty', u'Andrew Straw', u'Andy R. Terrel', u'Barry
> Wardell', u'Bastian Weber', u'Ben Goodrich', u'Bernhard R. Link',
> u'Boris Timokhin', u'Brian E. Granger', u'Chris Smith', u'Chris.Wu',
> u'David Lawrence', u'David Marek', u'David Roberts', u'David Roberts
> (dvdr18 [at] gmail [dot] com)', u'Elrond der Elbenfuerst', u'Fabian
> Pedregosa', u'Fabian Seoane', u'Felix Kaiser', u'Florian Mickler',
> u'Freddie Witherden', u'Fredrik', u'Fredrik Johansson', u'Friedrich
> Hagedorn', u'Goutham', u'Henrik Johansson', u'Hubert Tsang', u'James
> Abbatiello', u'James Aspnes', u'Jaroslaw Tworek', u'Jochen Voss',
> u'Johann Cohen-Tanugi', u'Jurjen N.E. Bos', u'Kaifeng Zhu', u'Kirill
> Smelkov', u'Konrad Meyer', u'Luke Peterson', u'Mateusz Paprocki',
> u'Nicolas Pourcelot', u'Nimish Telang', u'Ondrej Certik', u'Or Dvory',
> u'Pan Peng', u'Pauli Virtanen', u'Priit Laes', u'Riccardo Gori',
> u'RizgarMella [email protected]', u'Robert', u'Robert Cimrman',
> u'Robert Kern', u'Roberto Nobrega', u'Ronan Lamy', u'Ryan Krauss',
> u'Saroj', u'Saroj Adhikari', u'Sebastian Krause', u'Sebastian Kreft',
> u'Sebastian Kr\xc3\xa4mer', u'Stefano Maggiolo', u'Stepan Roucka',
> u'Ted Horst', u'Thomas Sidoti', u'Tomasz Buchert', u'Toon
> Verstraelen', u'Vinay Kumar', u'Vinzent Steinberg', u'basti.kr',
> u'brian.jorgensen', u'certik', u'convert-repo', u'fabian',
> u'fredrik.johansson', u'inferno1386', u'kirill.smelkov', u'lethargo',
> u'mattpap', u'ondrej.certik', u'pearu.peterson']
> 84
>
>
>
> However, if I also wanted to get email addresses, I think I'd have to
> go over the authors individually, probably using a commit ID to look up
> the author of each commit via the GitHub API.

Any ideas on this? I have implemented the above approach here:

http://repos.sympy.org/

and it seems to be working, e.g.:

http://repos.sympy.org/hooks/repos/agZzeW1weTJyEQsSClJlcG9zaXRvcnkYwRIM/

I am using App Engine's task queue, and I restrict GitHub API calls to
55 per minute to make sure I stay under the limit of 60 requests per
minute.
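Throttling to 55 calls per minute can also be done inside a single process. The following is a minimal sketch of a sliding-window limiter, not the code running at repos.sympy.org (which relies on the task queue); RateLimiter and its parameters are hypothetical names, and it is written for Python 3 rather than the Python 2 of the script above.

```python
import time
from collections import deque


class RateLimiter:
    """Hypothetical helper: allow at most max_calls calls in any
    sliding window of `period` seconds.

    clock and sleep are injectable so the logic can be tested
    without waiting in real time.
    """

    def __init__(self, max_calls=55, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps that have fallen out of the window.
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Sleep until the oldest call in the window expires, then re-check.
            self.sleep(self.period - (now - self.calls[0]))
```

Calling limiter.wait() right before each urlopen() would keep one process under the limit. Because the timestamps live in that process's memory, this would not coordinate several App Engine instances the way a shared task-queue rate does.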

But obviously, if the same thing could be achieved with a single API
call (I don't know whether it can), it would be much less wasteful.
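The script above only collects unique names. To get per-author commit counts like "git shortlog -ns" from the same commits list, a Counter is enough. This sketch assumes, as the script does, that each commit dict has an "author" key; shortlog and fix_mojibake are hypothetical helpers. fix_mojibake is a best-effort repair for names such as u'Sebastian Kr\xc3\xa4mer' in the output, which look like UTF-8 bytes decoded as latin-1. Names it cannot round-trip are left unchanged. Written for Python 3.

```python
from collections import Counter


def fix_mojibake(name):
    """Best-effort repair of UTF-8 bytes that were decoded as latin-1.

    Returns the name unchanged if it cannot be round-tripped.
    """
    try:
        return name.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name


def shortlog(commits):
    """Return (count, author) pairs, highest count first.

    Ties are broken alphabetically by name (an arbitrary choice here).
    """
    counts = Counter(fix_mojibake(c["author"]) for c in commits)
    return sorted(((n, a) for a, n in counts.items()),
                  key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    sample = [
        {"author": "Ondrej Certik"},
        {"author": "Ondrej Certik"},
        {"author": "Sebastian Kr\xc3\xa4mer"},
    ]
    for n, author in shortlog(sample):
        print("%6d  %s" % (n, author))
```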

Ondrej

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"GitHub" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/github?hl=en
-~----------~----~----~----~------~----~------~--~---
