Hi all,

I was running weblinkchecker.py over the whole cswiki (the job was submitted to the grid at Sun, 20 Nov 2016 16:54:24 GMT) because I wanted a list of dead links. This may correspond with the UA, because I used the script named weblinkchecker.py. I trusted this script not to do anything wrong, because it was and still is part of the standard core package. I also use the 3.0-dev version of Pywikibot and Python 2.7.6.
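The request in this thread was to throttle the GET requests. As far as I can tell, the only knob weblinkchecker.py exposes for that is the number of links it checks simultaneously; if I read the script's documentation correctly, that can be lowered in user-config.py, roughly like this (only a sketch based on my understanding of the documented max_external_links setting; note it limits overall concurrency, not the per-site request rate, so it may not be a complete fix):

    # ~/.pywikibot/user-config.py (sketch, assuming the weblinkchecker
    # setting documented in the script's docstring; the name may differ
    # between Pywikibot versions)
    # Load only a handful of external links at the same time instead of
    # the default several dozen, so one run cannot flood a single site.
    max_external_links = 5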
But the job has already completed, so if those GET requests didn't stop, I'm not the cause. Or maybe I just lost access to the job: qstat under all my tools (urbanecmbot, missingpages) and under my personal account (urbanecm) is empty or shows only the webserver. If I was the cause, I'm very sorry. As I said, I didn't know the script doesn't throttle GET requests enough. Also, minorplanetcenter.net is used in only 22 articles (per https://cs.wikipedia.org/w/index.php?search=insource%3Aminorplanetcenter.net&title=Speci%C3%A1ln%C3%AD:Hled%C3%A1n%C3%AD&go=J%C3%ADt+na&searchToken=507gqzzqk3eyplk5s6gsii2bv), so the traffic shouldn't be as massive as reported.

My .bash_history says the following (I guess 1479660864 is a Unix timestamp; in human-readable form that is Sun, 20 Nov 2016 16:54:24 GMT):

#1479660864
jsub -l release=trusty python ~/pwb/scripts/weblinkchecker.py -start:!

My user-config.py is at http://pastebin.com/cUAwQuWt, without OAuth. The complete user-config.py is at /home/urbanecm/.pywikibot/user-config.py and only roots can see it.

Again, if I was the cause, I'm sorry. I only used standard scripts and I trusted them to work correctly.

Martin Urbanec alias Urbanecm
https://cs.wikipedia.org/wiki/Wikipedista:Martin_Urbanec
https://meta.wikimedia.org/wiki/User:Martin_Urbanec
https://wikitech.wikimedia.org/wiki/User:Urbanecm

On Sun, 4 Dec 2016 at 18:03, Maximilian Doerr <[email protected]> wrote:

> https://phabricator.wikimedia.org/F4978348 Done.
>
> Cyberpower678
> English Wikipedia Account Creation Team
> ACC Mailing List Moderator
> Global User Renamer
>
> On Dec 4, 2016, at 11:49, Merlijn van Deen (valhallasw) <[email protected]> wrote:
>
> Hi Maximilian,
>
> https://phabricator.wikimedia.org/file/upload/ allows you to specify 'Visible to'. You can select 'Custom policy' and select the relevant users, i.e.
> <image.png>
>
> In the meanwhile, I'll try to figure out if I can get some information from netstat.
>
> Cheers,
> Merlijn
>
> On 4 December 2016 at 17:36, Maximilian Doerr <[email protected]> wrote:
>
> > Sure, how would I be able to restrict its visibility? Harvard is kind enough to unblock, if the culprit is stopped.
> >
> > As for exact URLs, it's the entire domains owned by Harvard. But the access log can provide specifics. The Python script is attempting to get all 140,000 pieces of data about minor planets from www.minorplanetcenter.net according to IT, who also claims that such an action, the way it is being done now, would severely tie up their servers for quite a while, which they cannot afford.
> >
> > Cyberpower678
> > English Wikipedia Account Creation Team
> > Mailing List Moderator
> > Global User Renamer
> >
> > *From:* Merlijn van Deen (valhallasw) [mailto:[email protected]]
> > *Sent:* Sunday, December 4, 2016 10:59
> > *To:* [email protected]
> > *Subject:* Re: [Labs-l] Some using a Python framework is relentlessly hammering Harvard sites, resulting an IP range ban.
> >
> > Hi Maximilian,
> >
> > On 4 December 2016 at 05:51, Maximilian Doerr <[email protected]> wrote:
> >
> > > Would the user who is querying the Harvard sites for planet data, that is carrying the UA "weblinkchecker Pywikibot/3.0-dev (g7171) requests/2.2.1 Python/2.7.6.final.0", please stop, or severely throttle the GET requests. It's making 168 requests to that site a minute, and consequently they banned labs from accessing it, according to the IT department there, who kindly shared with me the access log.
> >
> > Would you be able to share the access log with the Tools admins (say, via Phabricator, only shared to Yuvi, Bryan Davis, Andrew Bogott, Chase, scfc and me)? From the combination of external IP and timestamp we may be able to pinpoint which tool was causing this.
> >
> > Can you also clarify which exact URLs we are talking about?
> >
> > Cheers,
> > Merlijn
> >
> > _______________________________________________
> > Labs-l mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/labs-l
_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l
