Up to 16 jobs among all GWT job types can be picked at once (1 runner per 16 servers).
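For a rough sense of the per-hour figures asked about in the thread below, the quoted numbers can be combined in a back-of-the-envelope sketch. It assumes (nothing in the thread confirms this) a single upload batch whose Metadata job runs once per interval and queues exactly `throttle` Mediafile jobs each time, and it uses the 7-30 minute interval that was only observed on the beta cluster. The function name is made up for illustration, and the 16-jobs-at-once cap mentioned above is not modelled.

```python
def mediafile_jobs_per_hour(throttle: int, run_interval_minutes: float) -> float:
    """Estimate Mediafile jobs queued per hour for ONE upload batch.

    Assumption (not confirmed in the thread): each Metadata job run queues
    exactly `throttle` Mediafile jobs, and runs happen at a fixed interval.
    """
    if throttle < 1:
        raise ValueError("throttle must be at least 1")
    if run_interval_minutes <= 0:
        raise ValueError("run interval must be positive")
    runs_per_hour = 60.0 / run_interval_minutes
    return throttle * runs_per_hour


# Throttle values quoted in the thread: minimum 1, default 10, maximum 20.
# Intervals: the 7 and 30 minute extremes seen on the beta cluster.
for throttle in (1, 10, 20):
    for interval in (7, 30):
        rate = mediafile_jobs_per_hour(throttle, interval)
        print(f"throttle={throttle:2d}, every {interval:2d} min: ~{rate:.1f} jobs/hour")
```

Several users uploading at once would each add their own rate on top of this, so the real load on Commons could be a multiple of the single-batch figure.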

On Tue, Apr 29, 2014 at 8:41 AM, Gilles Dubuc <[email protected]> wrote:

>> what do you mean by unit? each config key in that section shows a default
>> value to the right of it.
>>
>
> I want to figure out how many background job runs we end up with per
> minute/per hour in practice. So I meant units such as X/minute, Y/hour. I
> know that it's dependent on how the background jobs are configured, but
> this throttle figures section of the documentation doesn't help figure that
> out. Makes it hard for anyone to pick a figure, because it's hard to know
> what the number represents.
>
>> hopefully, we could instead make it clear to the uploader that if their
>> file sizes exceed Xmb then they should set that throttle to 1 and make sure
>> the engineers and ops are notified in advance about the upload.
>>
>
> Guidelines sound like a good idea. If I'm following this logic correctly,
> though, doesn't that mean that there's also a risk that separate users
> might "step on each other's toes" in terms of resources, if they happen to
> be uploading content at the same time? Basically, if a given user sets a
> threshold which is a fine value for isolated use, isn't the risk that the
> threshold ends up being too high if more than one GWToolset user is
> uploading to Commons at the same time? At first I thought that the limit
> was on the Commons server side, but your remark seems to suggest that this
> is configured on the uploader's side.
>
>
>> job run frequency
>> -----------------
>> how often are the background jobs run?
>> is there a limit on how many GWToolset Mediafile background jobs are
>> picked up at once?
>>
>> i don’t know. aaron schultz would be the best person to ask. on the beta
>> cluster it seemed to vary between 7-30 minutes, but that may have been
>> because of testing or other activity on that server.
>
>
> CCing Aaron.
>
>
> On Tue, Apr 29, 2014 at 4:19 PM, dan entous <[email protected]> wrote:
>
>> On Apr 29, 2014, at 15:10 , Gilles Dubuc <[email protected]> wrote:
>>
>> > Hi Dan,
>> >
>> > wouldn’t it be better to throttle the application/tool that generates
>> > thumbnails so that it doesn’t try to produce too many thumbnails at once?
>> >
>> > The issue is that there is no application generating thumbnails at a
>> > given rate. Thumbnails are being generated on demand when people view a
>> > thumbnail that doesn't exist. And since Special:NewFiles exists, and is
>> > visited every few seconds by bots, that means all new uploads have their
>> > thumbnails generated almost on the spot. Thus, we can't slow down that
>> > part. We have several long-term tasks to improve this issue, but they will
>> > take months to implement. Our only option at the moment is to try and avoid
>> > having GWToolset make too many massive images appear on Commons'
>> > Special:NewFiles in a short period of time.
>> >
>> > Over 500 of the tiff images were greater than 50 megapixels and as a
>> > consequence Commons fails to render any thumbnails
>> >
>> > Indeed, it seems like some thumbnail generation requests timed out due
>> > to the size of these images. There are limits on the image scalers in
>> > regards to how long a thumbnailing job can take and these were going over
>> > the limit. To make matters worse, the current retry mechanism means that
>> > they were being retried 5 times, and thus using 5 times the resources. I
>> > would advise against trying to upload those enormous images for now, we
>> > should try to focus on a solution for the smaller images. It would be great
>> > if the next upload attempt leaves the images that are too large aside.
>> >
>> > I think the safest option to proceed forward is to lower the
>> > appropriate GWToolset throttles in production and then schedule a time for
>> > Fae to try the upload process again.
>> > By scheduling a specific day and time
>> > for the next attempt, we can make sure that engineers and ops have eyes on
>> > the servers to watch the load. Then if things go well, we can tweak the
>> > throttles back to higher values.
>> >
>> >
>> > http://www.mediawiki.org/wiki/Extension:GWToolset/Technical_Design#Throttles.2C_Limits.2C_Delays,
>> >
>> > The throttle documentation doesn't have any unit. I understand that
>> > it's "per background job run", but how often do these background jobs run?
>>
>> what do you mean by unit? each config key in that section shows a default
>> value to the right of it.
>>
>>
>> > I couldn't find configuration values for these throttles on Commons.
>> > Dan, can you confirm that Commons is using the default values?
>>
>>
>> throttle config values
>> ----------------------
>> the throttle configuration values are in the extension itself,
>> http://git.wikimedia.org/blob/mediawiki%2Fextensions%2FGWToolset.git/d27991ca8168e47152605d73e41b2960333b470a/includes%2FConfig.php,
>> and can be overridden in the
>> http://git.wikimedia.org/tree/operations%2Fmediawiki-config.gitwmf-config/CommonSettings.php
>> file in the if ( $wmgUseGWToolset ) { section.
>>
>> the config values to most likely change would be
>> $mediafile_job_throttle_default, which is currently set to 10 and
>> $mediafile_job_throttle_max, which is currently set to 20.
>>
>> at the moment, a user can set this throttle between 1-20. that means that
>> every time a GWToolset Metadata background job is run between 1-20
>> GWToolset Mediafile jobs are added to the queue. we could change those
>> values, but that would be a pity for people uploading smaller file sizes.
>> hopefully, we could instead make it clear to the uploader that if their
>> file sizes exceed Xmb then they should set that throttle to 1 and make sure
>> the engineers and ops are notified in advance about the upload.
>>
>> GWToolset\Config::$mediafile_job_throttle_default = new_value
>> GWToolset\Config::$mediafile_job_throttle_max = new_value
>>
>>
>> job run frequency
>> -----------------
>> how often are the background jobs run?
>> is there a limit on how many GWToolset Mediafile background jobs are
>> picked up at once?
>>
>> i don’t know. aaron schultz would be the best person to ask. on the beta
>> cluster it seemed to vary between 7-30 minutes, but that may have been
>> because of testing or other activity on that server.
>>
>>
>> >
>> >
>> > On Mon, Apr 28, 2014 at 11:17 AM, dan entous <[email protected]> wrote:
>> > GWToolset already has several throttles in place,
>> > http://www.mediawiki.org/wiki/Extension:GWToolset/Technical_Design#Throttles.2C_Limits.2C_Delays,
>> > that limit how many background uploads are picked up with each background
>> > job run, and how many total GWToolset background jobs can exist in the
>> > entire job queue. on the beta cluster the background job seemed to vary in
>> > regards to how often it ran for GWToolset, varying between 7-30 minutes.
>> > that seems like enough time for additional images to get processed
>> > in-between GWToolset images.
>> >
>> > wouldn’t it be better to throttle the application/tool that generates
>> > thumbnails so that it doesn’t try to produce too many thumbnails at once?
>> >
>> > with kind regards,
>> > dan
>> >
>> >
>> >
>> > On Apr 25, 2014, at 20:41 , Gergo Tisza <[email protected]> wrote:
>> >
>> > > On Fri, Apr 25, 2014 at 11:13 AM, Fæ <[email protected]> wrote:
>> > > With no obvious immediate fix/work-around on the table from WMF ops, I
>> > > have proposed to re-start my uploads for this project with an
>> > > effective throttle by using 2 threads (this is a setting on the first
>> > > screen of the GWToolset). In practice, having tried a run of a couple
>> > > of hundred, this means that the tool is uploading 100MB sized images
>> > > at a rate of 2 every 5 minutes. This seems to not be causing any
>> > > issues.
>> > >
>> > > The issue was not directly with the uploads; there is no thumbnail
>> > > rendering happening on upload, so GWToolset adding lots of large TIFFs
>> > > quickly would not cause problems in itself. The upload speed was
>> > > problematic because that meant GWToolset saturated pages like
>> > > Special:NewFiles, and when somebody looked at such pages, *that* triggered
>> > > lots of thumbnail renderings of huge TIFF files at the same time. If
>> > > GWToolset is slowed down and lots of miscellaneous files are uploaded
>> > > between the TIFFs, those special pages won't be problematic, but something
>> > > like a gallery or category of huge TIFF files could still be.
>> > > _______________________________________________
>> > > Glamtools mailing list
>> > > [email protected]
>> > > https://lists.wikimedia.org/mailman/listinfo/glamtools
>> >
>> >
>>
>

--
-Aaron S
