2013/7/25 Andre Klapper <[email protected]>

> Mozilla have recommendations for researchers at
> https://bugzilla.mozilla.org/page.cgi?id=researchers.html
> and offer a sanitized MySQL dump (without attachments and secret
> tickets) at http://people.mozilla.com/~mhoye/bugzilla/ .
> Would it be worthwhile if I asked Mozilla for the steps to create such a
> dump?


Sure, we can ask them for a way to build a dump without secret tickets or
private data. Will you go ahead and do that?


> For the time being, while researchers crawl GNOME Bugzilla and we
> don't have a dump:
> What would be acceptable latency values to *not* get IP addresses
> blocked, and UTC times of the day where there's less traffic anyway?
> (Actually I'm asking this on behalf of a university professor.)
>

Currently, any client that exceeds 1500 hits per hour gets banned (a hit
means an entry in the relevant Apache log in the format "IP date GET
PATH"). We had a few cases of people not keeping a cache of the static
HTML / CSS files, which resulted in a ban after a few minutes because
their browser requested the same static files on each request.
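
As a rough sketch of what that limit means for a crawler (not something we
run on our side): pausing between requests keeps the hourly hit count under
the threshold. The 1500 figure is the one above; the safety factor of 0.5
and the names min_delay_seconds and Throttle are assumptions made up for
illustration.

```python
import time

HITS_PER_HOUR_LIMIT = 1500  # ban threshold mentioned above
SAFETY_FACTOR = 0.5         # assumption: aim for half of the limit


def min_delay_seconds(limit_per_hour=HITS_PER_HOUR_LIMIT, safety=SAFETY_FACTOR):
    """Pause between requests that keeps hits/hour at limit * safety."""
    return 3600.0 / (limit_per_hour * safety)


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, delay, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self.clock = clock   # injectable so the class can be tested
        self.sleep = sleep
        self.last = None     # time of the previous request, if any

    def wait(self):
        """Block until at least `delay` seconds have passed since the last call."""
        now = self.clock()
        if self.last is not None:
            remaining = self.delay - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now
```

A crawler would call Throttle(min_delay_seconds()).wait() before every
single request. Since each log entry counts as a hit, static files fetched
along with a page count too, so caching them (or not fetching them at all)
matters as much as the delay itself.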

What we can do now is add a few exceptions to the htaccess file that
gets populated by our banning script. That said, most of the GNOME
developers are either from the EU (mainly GMT+1) or from the East Coast of
the US (GMT-5), so I would say any time between 1-2 AM and 7-8 AM.
We should probably ask these researchers not to crawl the website at the
same time if they plan to do so in the future, maybe limiting them to one
per night.
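
A crawler could check the clock before starting, along these lines. This is
only a sketch: it assumes the window is read as UTC (which is what the
question asked for, but is not stated above) and takes the narrower
02:00-07:00 range; the name in_quiet_window is made up for illustration.

```python
from datetime import datetime, time as dtime, timezone

# Assumptions: window taken as 02:00-07:00 and interpreted as UTC.
QUIET_START = dtime(2, 0)
QUIET_END = dtime(7, 0)


def in_quiet_window(now=None):
    """Return True if `now` (timezone-aware datetime) falls in the quiet window."""
    if now is None:
        now = datetime.now(timezone.utc)
    current = now.astimezone(timezone.utc).time()
    return QUIET_START <= current < QUIET_END
```

The crawl would simply not start, or would pause, whenever
in_quiet_window() returns False.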

Someone else might have a better idea, though ;)

-- 
Cheers,

Andrea

Debian Developer,
Fedora / EPEL packager,
GNOME Sysadmin,
GNOME Foundation Membership & Elections Committee Chairman

Homepage: http://www.gnome.org/~av
_______________________________________________
gnome-infrastructure mailing list
[email protected]
https://mail.gnome.org/mailman/listinfo/gnome-infrastructure