Up to 16 jobs among all GWT job types can be picked at once (1 runner per
16 servers).
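
For the "X/minute, Y/hour" units Gilles asked about, here is a rough back-of-the-envelope sketch in Python. It is not a measurement of production. It assumes a single Metadata job, that each run of that job queues exactly `throttle` Mediafile jobs, and that runs are spaced 7 to 30 minutes apart, which is what Dan saw on the beta cluster and may not hold on Commons. The function and constant names are made up for illustration.

```python
def mediafile_jobs_per_hour(throttle: int, minutes_between_runs: float) -> float:
    """Mediafile jobs queued per hour if every Metadata job run adds `throttle` jobs."""
    if throttle < 1 or minutes_between_runs <= 0:
        raise ValueError("throttle must be >= 1 and the interval must be positive")
    return throttle * 60 / minutes_between_runs


# Figures quoted in this thread; the interval is a beta-cluster observation (assumption).
THROTTLE_VALUES = (1, 10, 20)   # minimum, current default, current maximum
RUN_INTERVAL_MINUTES = (7, 30)  # fastest and slowest spacing Dan reported


def rate_range(throttle: int) -> tuple[float, float]:
    """Return (lowest, highest) jobs per hour over the assumed interval range."""
    fastest, slowest = RUN_INTERVAL_MINUTES
    return (
        mediafile_jobs_per_hour(throttle, slowest),
        mediafile_jobs_per_hour(throttle, fastest),
    )


if __name__ == "__main__":
    for throttle in THROTTLE_VALUES:
        low, high = rate_range(throttle)
        print(f"throttle={throttle:2d}: {low:.0f} to {high:.0f} Mediafile jobs queued per hour")
```

The same arithmetic gives 24 per hour for Fæ's observed "2 every 5 minutes". Several users uploading at once would add up, and the sketch ignores the cap of 16 jobs picked at once, so treat the numbers as an order of magnitude only.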


On Tue, Apr 29, 2014 at 8:41 AM, Gilles Dubuc <[email protected]> wrote:

> what do you mean by unit? each config key in that section shows a default
>> value to the right of it.
>>
>
> I want to figure out how many background job runs we end up with per
> minute/per hour in practice. So I meant units such as X/minute, Y/hour. I
> know that it's dependent on how the background jobs are configured, but
> this throttle figures section of the documentation doesn't help figure that
> out. Makes it hard for anyone to pick a figure, because it's hard to know
> what the number represents.
>
> hopefully, we could instead make it clear to the uploader that if their
>> file sizes exceed Xmb then they should set that throttle to 1 and make sure
>> the engineers and ops are notified in advance about the upload.
>>
>
> Guidelines sound like a good idea. If I'm following this logic correctly,
> though, doesn't that mean that there's also a risk that separate users
> might "step on each other's toes" in terms of resources, if they happen to
> be uploading content at the same time? Basically, if a given user sets a
> threshold which is a fine value for isolated use, isn't the risk that the
> threshold ends up being too high if more than one GWToolset user is
> uploading to Commons at the same time? At first I thought that the limit
> was on the Commons server side, but your remark seems to suggest that this
> is configured on the uploader's side.
>
>
>> job run frequency
>> -----------------
>> how often are the background jobs run?
>> is there a limit on how many GWToolset Mediafile background jobs are
>> picked up at once?
>>
>> i don’t know. aaron schultz would be the best person to ask. on the beta
>> cluster it seemed to vary between 7-30 minutes, but that may have been
>> because of testing or other activity on that server.
>
>
> CCing Aaron.
>
>
> On Tue, Apr 29, 2014 at 4:19 PM, dan entous <[email protected]> wrote:
>
>> On Apr 29, 2014, at 15:10 , Gilles Dubuc <[email protected]> wrote:
>>
>> > Hi Dan,
>> >
>> > wouldn’t it be better to throttle the application/tool that generates
>> thumbnails so that it doesn’t try to produce too many thumbnails at once?
>> >
>> > The issue is that there is no application generating thumbnails at a
>> given rate. Thumbnails are being generated on demand when people view a
>> thumbnail that doesn't exist. And since Special:NewFiles exists, and is
>> visited every few seconds by bots, that means all new uploads have their
>> thumbnails generated almost on the spot. Thus, we can't slow down that
>> part. We have several long-term tasks to improve this issue, but they will
>> take months to implement. Our only option at the moment is to try and avoid
>> having GWToolset make too many massive images appear on Commons'
>> Special:NewFiles in a short period of time.
>> >
>> > Over 500 of the tiff images were greater than 50 megapixels and as a
>> consequence Commons fails to render any thumbnails
>> >
>> > Indeed, it seems like some thumbnail generation requests timed out due
>> to the size of these images. There are limits on the image scalers in
>> regards to how long a thumbnailing job can take and these were going over
>> the limit. To make matters worse, the current retry mechanism means that
>> they were being retried 5 times, and thus using 5 times the resources. I
>> would advise against trying to upload those enormous images for now, we
>> should try to focus on a solution for the smaller images. It would be great
>> if the next upload attempt leaves the images that are too large aside.
>> >
>> > I think the safest option to proceed forward is to lower the
>> appropriate GWToolset throttles in production and then schedule a time for
>> Fae to try the upload process again. By scheduling a specific day and time
>> for the next attempt, we can make sure that engineers and ops have eyes on
>> the servers to watch the load. Then if things go well, we can tweak the
>> throttles back to higher values.
>> >
>> >
>> http://www.mediawiki.org/wiki/Extension:GWToolset/Technical_Design#Throttles.2C_Limits.2C_Delays
>> ,
>> >
>> > The throttle documentation doesn't have any unit. I understand that
>> it's "per background job run", but how often do these background jobs run?
>>
>> what do you mean by unit? each config key in that section shows a default
>> value to the right of it.
>>
>>
>> > I couldn't find configuration values for these throttles on Commons.
>> Dan, can you confirm that Commons is using the default values?
>>
>>
>> throttle config values
>> ----------------------
>> the throttle configuration values are in the extension itself,
>> http://git.wikimedia.org/blob/mediawiki%2Fextensions%2FGWToolset.git/d27991ca8168e47152605d73e41b2960333b470a/includes%2FConfig.php,
>> and can be overridden in the
>> http://git.wikimedia.org/tree/operations%2Fmediawiki-config.git
>> wmf-config/CommonSettings.php file in the if ( $wmgUseGWToolset ) { section.
>>
>> the config values to most likely change would be
>> $mediafile_job_throttle_default, which is currently set to 10 and
>> $mediafile_job_throttle_max, which is currently set to 20.
>>
>> at the moment, a user can set this throttle between 1-20. that means that
>> every time a GWToolset Metadata background job is run, between 1 and 20
>> GWToolset Mediafile jobs are added to the queue. we could change those
>> values, but that would be a pity for people uploading smaller file sizes.
>> hopefully, we could instead make it clear to the uploader that if their
>> file sizes exceed Xmb then they should set that throttle to 1 and make sure
>> the engineers and ops are notified in advance about the upload.
>>
>> GWToolset\Config::$mediafile_job_throttle_default = new_value;
>> GWToolset\Config::$mediafile_job_throttle_max = new_value;
>>
>>
>> job run frequency
>> -----------------
>> how often are the background jobs run?
>> is there a limit on how many GWToolset Mediafile background jobs are
>> picked up at once?
>>
>> i don’t know. aaron schultz would be the best person to ask. on the beta
>> cluster it seemed to vary between 7-30 minutes, but that may have been
>> because of testing or other activity on that server.
>>
>>
>> >
>> >
>> > On Mon, Apr 28, 2014 at 11:17 AM, dan entous <[email protected]>
>> wrote:
>> > GWToolset already has several throttles in place,
>> http://www.mediawiki.org/wiki/Extension:GWToolset/Technical_Design#Throttles.2C_Limits.2C_Delays,
>> that limit how many background uploads are picked up with each background
>> job run, and how many total GWToolset background jobs can exist in the
>> entire job queue. on the beta cluster, how often the background job ran for
>> GWToolset seemed to vary between 7-30 minutes. that seems like enough time
>> for additional images to get processed in-between GWToolset images.
>> >
>> > wouldn’t it be better to throttle the application/tool that generates
>> thumbnails so that it doesn’t try to produce too many thumbnails at once?
>> >
>> > with kind regards,
>> > dan
>> >
>> >
>> >
>> > On Apr 25, 2014, at 20:41 , Gergo Tisza <[email protected]> wrote:
>> >
>> > > On Fri, Apr 25, 2014 at 11:13 AM, Fæ <[email protected]> wrote:
>> > > With no obvious immediate fix/work-around on the table from WMF ops, I
>> > > have proposed to re-start my uploads for this project with an
>> > > effective throttle by using 2 threads (this is a setting on the first
>> > > screen of the GWToolset). In practice, having tried a run of a couple
>> > > of hundred, this means that the tool is uploading 100MB sized images
>> > > at a rate of 2 every 5 minutes. This seems to not be causing any
>> > > issues.
>> > >
>> > > The issue was not directly with the uploads; there is no thumbnail
>> rendering happening on upload, so GWToolset adding lots of large TIFFs
>> quickly would not cause problems in itself. The upload speed was
>> problematic because that meant GWToolset saturated pages like
>> Special:NewFiles, and when somebody looked at such pages, *that* triggered
>> lots of thumbnail renderings of huge TIFF files at the same time. If
>> GWToolset is slowed down and lots of miscellaneous files are uploaded
>> between the TIFFs, those special pages won't be problematic, but something
>> like a gallery or category of huge TIFF files could still be.
>> > > _______________________________________________
>> > > Glamtools mailing list
>> > > [email protected]
>> > > https://lists.wikimedia.org/mailman/listinfo/glamtools
>> >
>> >
>>
>>
>


-- 
-Aaron S
_______________________________________________
Glamtools mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/glamtools
