If the jobs are not "delayed", then yes, limiting the number of files per job won't do much. Setting $wgJobBackoffThrottling might be useful here.
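
A rough sketch of what that could look like, not a tested production setting. $wgJobBackoffThrottling maps a job type to the number of work items per second that each job runner process may run for that type. The job type name and the rate below are assumptions for illustration only; check which job types the extension actually registers before using them.

```php
<?php
// Hypothetical values, for illustration only.
// Assumption: 'gwtoolsetUploadMediafileJob' is the job type GWToolset
// uses for media file uploads.
// Allow each job runner process roughly one such work item every 30 seconds.
$wgJobBackoffThrottling['gwtoolsetUploadMediafileJob'] = 1 / 30;
```

With 16 runners picking up these jobs, that would be on the order of 32 media file jobs per minute in total, assuming each job counts as one work item. The value is a per-runner rate, so the overall rate scales with the number of runners.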
On Tue, Apr 29, 2014 at 9:05 AM, Gilles Dubuc <[email protected]> wrote:

>> Up to 16 jobs among all GWT job types can be picked at once (1 runner per
>> 16 servers).
>
> Are they picked up at a steady frequency?
>
> Basically, if each job handles fewer images in one run, will it truly space
> out each image's processing in time? Or will it have no effect because each
> GWT job will complete faster, with the next GWT job being picked up right
> after the previous one on a given server completes?
>
>
> On Tue, Apr 29, 2014 at 5:56 PM, Aaron Schulz <[email protected]> wrote:
>
>> Up to 16 jobs among all GWT job types can be picked at once (1 runner per
>> 16 servers).
>>
>>
>> On Tue, Apr 29, 2014 at 8:41 AM, Gilles Dubuc <[email protected]> wrote:
>>
>>>> what do you mean by unit? each config key in that section shows a
>>>> default value to the right of it.
>>>
>>> I want to figure out how many background job runs we end up with per
>>> minute/per hour in practice. So I meant units such as X/minute, Y/hour. I
>>> know that it's dependent on how the background jobs are configured, but
>>> the throttle figures section of the documentation doesn't help figure that
>>> out. That makes it hard for anyone to pick a figure, because it's hard to
>>> know what the number represents.
>>>
>>>> hopefully, we could instead make it clear to the uploader that if their
>>>> file sizes exceed Xmb then they should set that throttle to 1 and make sure
>>>> the engineers and ops are notified in advance about the upload.
>>>
>>> Guidelines sound like a good idea. If I'm following this logic
>>> correctly, though, doesn't that mean that there's also a risk that separate
>>> users might "step on each other's toes" in terms of resources, if they
>>> happen to be uploading content at the same time?
>>> Basically, if a given user sets a threshold which is a fine value for
>>> isolated use, isn't the risk that the threshold ends up being too high if
>>> more than one GWToolset user is uploading to Commons at the same time? At
>>> first I thought that the limit was on the Commons server side, but your
>>> remark seems to suggest that this is configured on the uploader's side.
>>>
>>>
>>>> job run frequency
>>>> -----------------
>>>> how often are the background jobs run?
>>>> is there a limit on how many GWToolset Mediafile background jobs are
>>>> picked up at once?
>>>>
>>>> i don’t know. aaron schulz would be the best person to ask. on the
>>>> beta cluster it seemed to vary between 7-30 minutes, but that may have been
>>>> because of testing or other activity on that server.
>>>
>>>
>>> CCing Aaron.
>>>
>>>
>>> On Tue, Apr 29, 2014 at 4:19 PM, dan entous <[email protected]> wrote:
>>>
>>>> On Apr 29, 2014, at 15:10 , Gilles Dubuc <[email protected]> wrote:
>>>>
>>>> > Hi Dan,
>>>> >
>>>> > wouldn’t it be better to throttle the application/tool that generates
>>>> > thumbnails so that it doesn’t try to produce too many thumbnails at once?
>>>> >
>>>> > The issue is that there is no application generating thumbnails at a
>>>> > given rate. Thumbnails are being generated on demand when people view a
>>>> > thumbnail that doesn't exist. And since Special:NewFiles exists, and is
>>>> > visited every few seconds by bots, that means all new uploads have their
>>>> > thumbnails generated almost on the spot. Thus, we can't slow down that
>>>> > part. We have several long-term tasks to improve this issue, but they will
>>>> > take months to implement. Our only option at the moment is to try and avoid
>>>> > having GWToolset make too many massive images appear on Commons'
>>>> > Special:NewFiles in a short period of time.
>>>> >
>>>> > Over 500 of the tiff images were greater than 50 megapixels and as a
>>>> > consequence Commons fails to render any thumbnails
>>>> >
>>>> > Indeed, it seems like some thumbnail generation requests timed out
>>>> > due to the size of these images. There are limits on the image scalers in
>>>> > regards to how long a thumbnailing job can take and these were going over
>>>> > the limit. To make matters worse, the current retry mechanism means that
>>>> > they were being retried 5 times, and thus using 5 times the resources. I
>>>> > would advise against trying to upload those enormous images for now, we
>>>> > should try to focus on a solution for the smaller images. It would be great
>>>> > if the next upload attempt leaves the images that are too large aside.
>>>> >
>>>> > I think the safest option to proceed forward is to lower the
>>>> > appropriate GWToolset throttles in production and then schedule a time for
>>>> > Fae to try the upload process again. By scheduling a specific day and time
>>>> > for the next attempt, we can make sure that engineers and ops have eyes on
>>>> > the servers to watch the load. Then if things go well, we can tweak the
>>>> > throttles back to higher values.
>>>> >
>>>> >
>>>> > http://www.mediawiki.org/wiki/Extension:GWToolset/Technical_Design#Throttles.2C_Limits.2C_Delays ,
>>>> >
>>>> > The throttle documentation doesn't have any unit. I understand that
>>>> > it's "per background job run", but how often do these background jobs run?
>>>>
>>>> what do you mean by unit? each config key in that section shows a
>>>> default value to the right of it.
>>>>
>>>>
>>>> > I couldn't find configuration values for these throttles on Commons.
>>>> > Dan, can you confirm that Commons is using the default values?
>>>>
>>>>
>>>> throttle config values
>>>> ----------------------
>>>> the throttle configuration values are in the extension itself,
>>>> http://git.wikimedia.org/blob/mediawiki%2Fextensions%2FGWToolset.git/d27991ca8168e47152605d73e41b2960333b470a/includes%2FConfig.php,
>>>> and can be overridden in the
>>>> http://git.wikimedia.org/tree/operations%2Fmediawiki-config.git wmf-config/CommonSettings.php
>>>> file in the if ( $wmgUseGWToolset ) { section.
>>>>
>>>> the config values to most likely change would be
>>>> $mediafile_job_throttle_default, which is currently set to 10 and
>>>> $mediafile_job_throttle_max, which is currently set to 20.
>>>>
>>>> at the moment, a user can set this throttle between 1-20. that means
>>>> that every time a GWToolset Metadata background job is run between 1-20
>>>> GWToolset Mediafile jobs are added to the queue. we could change those
>>>> values, but that would be a pity for people uploading smaller file sizes.
>>>> hopefully, we could instead make it clear to the uploader that if their
>>>> file sizes exceed Xmb then they should set that throttle to 1 and make sure
>>>> the engineers and ops are notified in advance about the upload.
>>>>
>>>> GWToolset\Config::$mediafile_job_throttle_default = new_value
>>>> GWToolset\Config::$mediafile_job_throttle_max = new_value
>>>>
>>>>
>>>> job run frequency
>>>> -----------------
>>>> how often are the background jobs run?
>>>> is there a limit on how many GWToolset Mediafile background jobs are
>>>> picked up at once?
>>>>
>>>> i don’t know. aaron schulz would be the best person to ask. on the
>>>> beta cluster it seemed to vary between 7-30 minutes, but that may have been
>>>> because of testing or other activity on that server.
>>>>
>>>>
>>>> >
>>>> >
>>>> > On Mon, Apr 28, 2014 at 11:17 AM, dan entous <[email protected]> wrote:
>>>> > GWToolset already has several throttles in place,
>>>> > http://www.mediawiki.org/wiki/Extension:GWToolset/Technical_Design#Throttles.2C_Limits.2C_Delays,
>>>> > that limit how many background uploads are picked up with each background
>>>> > job run, and how many total GWToolset background jobs can exist in the
>>>> > entire job queue. on the beta cluster, how often the background job ran
>>>> > for GWToolset seemed to vary between 7-30 minutes. that seems
>>>> > like enough time for additional images to get processed in-between
>>>> > GWToolset images.
>>>> >
>>>> > wouldn’t it be better to throttle the application/tool that generates
>>>> > thumbnails so that it doesn’t try to produce too many thumbnails at once?
>>>> >
>>>> > with kind regards,
>>>> > dan
>>>> >
>>>> >
>>>> >
>>>> > On Apr 25, 2014, at 20:41 , Gergo Tisza <[email protected]> wrote:
>>>> >
>>>> > > On Fri, Apr 25, 2014 at 11:13 AM, Fæ <[email protected]> wrote:
>>>> > > With no obvious immediate fix/work-around on the table from WMF ops, I
>>>> > > have proposed to re-start my uploads for this project with an
>>>> > > effective throttle by using 2 threads (this is a setting on the first
>>>> > > screen of the GWToolset). In practice, having tried a run of a couple
>>>> > > of hundred, this means that the tool is uploading 100MB sized images
>>>> > > at a rate of 2 every 5 minutes. This seems to not be causing any
>>>> > > issues.
>>>> > >
>>>> > > The issue was not directly with the uploads; there is no thumbnail
>>>> > > rendering happening on upload, so GWToolset adding lots of large TIFFs
>>>> > > quickly would not cause problems in itself. The upload speed was
>>>> > > problematic because that meant GWToolset saturated pages like
>>>> > > Special:NewFiles, and when somebody looked at such pages, *that* triggered
>>>> > > lots of thumbnail renderings of huge TIFF files at the same time.
>>>> > > If GWToolset is slowed down and lots of miscellaneous files are uploaded
>>>> > > between the TIFFs, those special pages won't be problematic, but something
>>>> > > like a gallery or category of huge TIFF files could still be.
>>>> > > _______________________________________________
>>>> > > Glamtools mailing list
>>>> > > [email protected]
>>>> > > https://lists.wikimedia.org/mailman/listinfo/glamtools
>>>> >
>>>> >
>>>>
>>>
>>
>>
>> --
>> -Aaron S
>>
>

--
-Aaron S
