I’m not really competent to judge the technical aspect, but in case it helps, here is an example of the requests that were actually made to our server while all the jobs were failing:
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:17:23 +0200] "HEAD /penard/Collection_Penard_MHNG_Specimen_343-5-5.tif HTTP/1.1" 200 0 "-" "MediaWiki/1.26wmf7 GWToolset/0.3.8"
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:17:28 +0200] "HEAD /penard/Collection_Penard_MHNG_Specimen_344-1-2.tif HTTP/1.1" 200 0 "-" "MediaWiki/1.26wmf7 GWToolset/0.3.8"
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:17:29 +0200] "HEAD /penard/Collection_Penard_MHNG_Specimen_344-2-2.tif HTTP/1.1" 200 0 "-" "MediaWiki/1.26wmf7 GWToolset/0.3.8"
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:17:31 +0200] "HEAD /penard/Collection_Penard_MHNG_Specimen_344-1-1.tif HTTP/1.1" 200 0 "-" "MediaWiki/1.26wmf7 GWToolset/0.3.8"
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:17:38 +0200] "HEAD /penard/Collection_Penard_MHNG_Specimen_344-1-3.tif HTTP/1.1" 200 0 "-" "MediaWiki/1.26wmf7 GWToolset/0.3.8"
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:17:41 +0200] "GET /penard/Collection_Penard_MHNG_Specimen_339-17-3.tif HTTP/1.1" 200 3322941 "-" "MediaWiki/1.26wmf7"
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:17:41 +0200] "HEAD /penard/Collection_Penard_MHNG_Specimen_344-2-3.tif HTTP/1.1" 200 0 "-" "MediaWiki/1.26wmf7 GWToolset/0.3.8"
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:17:50 +0200] "GET /penard/Collection_Penard_MHNG_Specimen_345-4-2.tif HTTP/1.1" 200 867440 "-" "MediaWiki/1.26wmf7"
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:17:52 +0200] "GET /penard/Collection_Penard_MHNG_Specimen_342-2-1.tif HTTP/1.1" 200 1837688 "-" "MediaWiki/1.26wmf7"
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:17:55 +0200] "GET /penard/Collection_Penard_MHNG_Specimen_342-3-1.tif HTTP/1.1" 200 1195016 "-" "MediaWiki/1.26wmf7"
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:18:01 +0200] "HEAD /penard/Collection_Penard_MHNG_Specimen_341-2-2.tif HTTP/1.1" 200 0 "-" "MediaWiki/1.26wmf7 GWToolset/0.3.8"
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:18:02 +0200] "HEAD /penard/Collection_Penard_MHNG_Specimen_341-2-3.tif HTTP/1.1" 200 0 "-" "MediaWiki/1.26wmf7 GWToolset/0.3.8"
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:18:04 +0200] "HEAD /penard/Collection_Penard_MHNG_Specimen_344-2-4.tif HTTP/1.1" 200 0 "-" "MediaWiki/1.26wmf7 GWToolset/0.3.8"
208.80.154.156 lan.wikimedia.ch - [29/May/2015:13:18:07 +0200] "GET /penard/Collection_Penard_MHNG_Specimen_366-2-2.tif HTTP/1.1" 200 1035100 "-" "MediaWiki/1.26wmf7"

___________________________________________________________
Charles ANDRES, Chief Science Officer
"Wikimedia CH" – Association for the advancement of free knowledge – www.wikimedia.ch
Office +41 (0)21 340 66 21
Mobile +41 (0)78 910 00 97
Skype: charles.andres.wmch
IRC://irc.freenode.net/wikimedia-ch
http://prezi.com/user/Andrescharles/

> On 3 June 2015 at 16:10, Brian Wolff <[email protected]> wrote:
>
> On 6/3/15, Charles Andrès <[email protected]> wrote:
>>
>>> Out of interest, how many processing threads were chosen in GWT for
>>> the job? It may be an idea if the input page is changed to default to
>>> 2 threads and there are warnings if you have more than 8 or so. I can
>>> imagine 20 processing threads causing a server issue for large files,
>>> and in practice I used 4 or 5 for my largest upload jobs; probably
>>> something useful to add to the user guide.
>>>
>>
>> I kept the default setting of 5.
>> My guess is that when our server gets overloaded, GWToolset starts
>> making more requests than it would naturally have made, and repeats
>> the ones that were failing, but that's just my guess.
>>
>> Charles
>>
>
> Hmm, I'm not sure if it would try multiple times for requests that
> fail; in terms of things logged, that does not appear to be the case
> (but it might try multiple times per item logged as a failure).
>
> Here's the day-by-day of your upload job (for User:Neuchâtel Herbarium):
>
> MariaDB [commonswiki_p]> select substr( log_timestamp, 1, 8 ),
>     log_action, count(*) from logging_logindex where log_type =
>     'gwtoolset' and log_timestamp > '20150500000000' and log_user =
>     2103899 and log_action != 'metadata-job' group by 1, 2;
> +-------------------------------+-------------------------+----------+
> | substr( log_timestamp, 1, 8 ) | log_action              | count(*) |
> +-------------------------------+-------------------------+----------+
> | 20150526                      | mediafile-job-failed    |      378 |
> | 20150526                      | mediafile-job-succeeded |      926 |
> | 20150527                      | mediafile-job-failed    |      115 |
> | 20150527                      | mediafile-job-succeeded |     3734 |
> | 20150528                      | mediafile-job-failed    |     6431 |
> | 20150528                      | mediafile-job-succeeded |     6327 |
> | 20150529                      | mediafile-job-failed    |    12148 |
> | 20150530                      | mediafile-job-failed    |    11915 |
> | 20150531                      | mediafile-job-failed    |    12371 |
> | 20150531                      | mediafile-job-succeeded |        6 |
> | 20150601                      | mediafile-job-failed    |     7636 |
> | 20150601                      | mediafile-job-succeeded |      225 |
> +-------------------------------+-------------------------+----------+
> 12 rows in set (0.56 sec)
>
> On May 28th, about 50% of the files failed; however, the number of
> files attempted to be fetched was roughly the same as on May 29th,
> when every single file failed.
>
> I think this suggests that GWToolset should have some sort of back-off
> feature to slow down the request rate when things start to fail
> (particularly due to "HTTP request timed out.").
>
> --bawolff
>
> _______________________________________________
> Glamtools mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/glamtools
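[Editor's note: the back-off feature bawolff suggests is commonly implemented as capped exponential back-off with jitter. The sketch below is a hypothetical illustration, not GWToolset's actual code (which is PHP); the function and parameter names are invented for the example.]

```python
import random
import time

def fetch_with_backoff(fetch, max_retries=5, base_delay=1.0, cap=60.0):
    """Call `fetch()` until it succeeds, sleeping an exponentially
    growing, randomly jittered delay after each timeout.

    `base_delay` and `cap` are in seconds; raises after `max_retries`
    consecutive failures instead of hammering the remote server.
    """
    for attempt in range(max_retries):
        try:
            return fetch()
        except TimeoutError:
            # "Full jitter": sleep a random duration in
            # [0, min(cap, base_delay * 2**attempt)] so that many
            # parallel job runners don't all retry in lockstep.
            delay = random.uniform(0, min(cap, base_delay * 2 ** attempt))
            time.sleep(delay)
    raise TimeoutError("gave up after %d attempts" % max_retries)
```

The jitter matters here: with 5 (or 20) processing threads all hitting the same overloaded server, deterministic retry delays would keep the threads synchronized and reproduce the overload on every retry wave.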
