I doubt it was that, then, if you only scanned 22 articles.  According to their 
IT department, this user was attempting to fetch all 140,000 pieces of data 
about minor planets and was making 160 requests to the site per minute, which 
was severely bogging down their servers when combined with the load they 
already carry.  I think the ban was put into effect on Nov. 2.

Maybe it would be wise to have Labs simply throttle consecutive outgoing 
connections from Tool Labs, if possible.  That is, connections made from 
scripts to external sites, while maintaining the status quo for the 
webservices.  This has to have some impact on network I/O bandwidth usage for 
both host and client servers.
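A minimal sketch of the kind of client-side throttle being suggested here. This is hypothetical: Tool Labs has no such built-in limiter, and the class and parameter names below are made up for illustration.

```python
import time


class RateLimiter:
    """Space outgoing requests so at most `max_per_minute` are made.

    A script would call wait() before each external connection; wait()
    sleeps just long enough to keep the request rate under the cap.
    """

    def __init__(self, max_per_minute):
        self.interval = 60.0 / max_per_minute  # minimum seconds between requests
        self.last = 0.0

    def wait(self):
        now = time.monotonic()
        delay = self.last + self.interval - now
        if delay > 0:
            time.sleep(delay)
        self.last = time.monotonic()
```

For example, a link checker built on this sketch would create `RateLimiter(30)` and call `wait()` before each GET, capping itself at 30 external requests per minute regardless of how fast the target responds.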

Cyberpower678
English Wikipedia Account Creation Team
ACC Mailing List Moderator
Global User Renamer

> On Dec 4, 2016, at 12:29, Martin Urbanec <[email protected]> wrote:
> 
> Hi all, 
> I was running weblinkchecker.py for the whole of cswiki (the job was 
> submitted to the grid at Sun, 20 Nov 2016 16:54:24 GMT) because I wanted a 
> list of dead links. This may correspond with the UA (because I used the 
> script named weblinkchecker.py). I trusted that this script wouldn't do 
> anything wrong, because it was, and still is, in the standard core package. I 
> also use the 3.0-dev version of Pywikibot and Python 2.7.6. 
> 
> But this job has already completed, so if those GET requests didn't stop, I'm 
> not the cause. Or I have lost access to the job: qstat for all my tools 
> (urbanecmbot, missingpages) and for my personal account (urbanecm) is empty 
> or shows only the webserver. 
> 
> If I was the cause, I'm very sorry for it. As I said, I didn't know the 
> script does not throttle GET requests enough. 
> 
> Also, minorplanetcenter.net is linked from only 22 articles (as 
> https://cs.wikipedia.org/w/index.php?search=insource%3Aminorplanetcenter.net&title=Speci%C3%A1ln%C3%AD:Hled%C3%A1n%C3%AD&go=J%C3%ADt+na&searchToken=507gqzzqk3eyplk5s6gsii2bv
> shows), so the traffic shouldn't be as massive as is claimed. 
> 
> My .bash_history says the following. I guess 1479660864 is a Unix timestamp; 
> the human-readable time is Sun, 20 Nov 2016 16:54:24 GMT. 
> 
> #1479660864
> jsub -l release=trusty python ~/pwb/scripts/weblinkchecker.py -start:!
> 
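The timestamp guess above is easy to verify. A quick check in plain Python (written for Python 3, unlike the Python 2.7 the job itself ran under):

```python
from datetime import datetime, timezone

# Convert the epoch stamp recorded by .bash_history to human-readable UTC.
ts = 1479660864
human = datetime.fromtimestamp(ts, tz=timezone.utc).strftime(
    "%a, %d %b %Y %H:%M:%S GMT"
)
print(human)  # Sun, 20 Nov 2016 16:54:24 GMT
```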
> My user-config.py is at http://pastebin.com/cUAwQuWt, without OAuth. The 
> complete user-config is at /home/urbanecm/.pywikibot/user-config.py and only 
> roots can see it. 
> 
> Again, if I was the cause, I'm sorry. I only used standard scripts and 
> trusted that they work correctly. 
> 
> Martin Urbanec alias Urbanecm
> https://cs.wikipedia.org/wiki/Wikipedista:Martin_Urbanec
> https://meta.wikimedia.org/wiki/User:Martin_Urbanec
> https://wikitech.wikimedia.org/wiki/User:Urbanecm
> 
> On Sun, 4 Dec 2016 at 18:03, Maximilian Doerr 
> <[email protected]> wrote:
> https://phabricator.wikimedia.org/F4978348 Done.
> 
> Cyberpower678
> English Wikipedia Account Creation Team
> ACC Mailing List Moderator
> Global User Renamer
> 
>> On Dec 4, 2016, at 11:49, Merlijn van Deen (valhallasw) 
>> <[email protected]> wrote:
>> 
>> Hi Maximilian,
>> 
>> https://phabricator.wikimedia.org/file/upload/ allows you to specify 
>> 'Visible to'. You can select 'Custom policy' and select the relevant users, 
>> i.e.
>> <image.png>
>> 
>> In the meanwhile, I'll try to figure out if I can get some information from 
>> netstat.
>> 
>> Cheers,
>> Merlijn
>> 
>> On 4 December 2016 at 17:36, Maximilian Doerr 
>> <[email protected]> wrote:
>> Sure, how would I be able to restrict its visibility?  Harvard is kind 
>> enough to unblock, if the culprit is stopped.
>> 
>>  
>> 
>> As for the exact URLs, it’s the entire set of domains owned by Harvard, but 
>> the access log can provide specifics.  The Python script is attempting to 
>> get all 140,000 pieces of data about minor planets from 
>> www.minorplanetcenter.net, according to IT, who also claim that doing it the 
>> way it is being done now would severely tie up their servers for quite a 
>> while, which they cannot afford.
>> 
>> 
>> Cyberpower678
>> English Wikipedia Account Creation Team
>> Mailing List Moderator
>> Global User Renamer
>> 
>>  
>> 
> 
>> From: Merlijn van Deen (valhallasw) [mailto:[email protected]] 
>> Sent: Sunday, December 4, 2016 10:59
>> To: [email protected]
>> Subject: Re: [Labs-l] Someone using a Python framework is relentlessly 
>> hammering Harvard sites, resulting in an IP range ban.
>> 
>>  
>> 
>> Hi Maximilian,
>> 
>>  
>> 
> 
>> On 4 December 2016 at 05:51, Maximilian Doerr 
>> <[email protected]> wrote:
>> 
>> Would the user who is querying the Harvard sites for planet data, the one 
>> carrying the UA “weblinkchecker Pywikibot/3.0-dev (g7171) requests/2.2.1 
>> Python/2.7.6.final.0”, please stop or severely throttle the GET requests.  
>> It’s making 168 requests to that site per minute, and consequently they have 
>> banned Labs from accessing it, according to the IT department there, who 
>> kindly shared the access log with me.
>> 
>> 
>> Would you be able to share the access log with the Tools admins (say, via 
>> Phabricator, shared only with Yuvi, Bryan Davis, Andrew Bogott, Chase, scfc 
>> and me)? From the combination of external IP and timestamp, we may be able 
>> to pinpoint which tool was causing this.
>> 
>>  
>> 
>> Can you also clarify which exact URLs we are talking about?
>> 
>>  
>> 
>> Cheers,
>> 
>> Merlijn
>> 
>> 
> 
> _______________________________________________
> Labs-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/labs-l

