If they are not "delayed", then yes, it won't do much to limit the files per
job. Setting $wgJobBackoffThrottling might be useful here.
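
As a rough sketch of what that could look like: $wgJobBackoffThrottling maps a job type to the maximum number of "work items" per second that each job runner process should handle. The job type key and the rate below are assumptions for illustration, not values agreed anywhere in this thread, and the fragment is meant for something like the if ( $wmgUseGWToolset ) { section of CommonSettings.php.

```php
// Sketch only: the job type key and the rate are assumptions for illustration.
// Caps each job runner process at roughly one GWToolset Mediafile work item
// every 20 seconds (0.05 items per second).
$wgJobBackoffThrottling['gwtoolsetUploadMediafileJob'] = 0.05;
```

Since the limit is per runner process, with up to 16 runners the cluster-wide ceiling would be about 16 times the configured rate. As far as I know, the backoff only applies to jobs run through runJobs.php.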


On Tue, Apr 29, 2014 at 9:05 AM, Gilles Dubuc <[email protected]> wrote:

>> Up to 16 jobs among all GWT job types can be picked at once (1 runner per
>> 16 servers).
>>
>
> Are they picked up at a steady frequency?
>
> Basically, if each job processes fewer images in one run, will it truly space
> out the processing of each image over time? Or will it have no effect because each
> GWT job will complete faster, with the next GWT job being picked up right
> after the previous one on a given server completes?
>
>
> On Tue, Apr 29, 2014 at 5:56 PM, Aaron Schulz <[email protected]> wrote:
>
>> Up to 16 jobs among all GWT job types can be picked at once (1 runner per
>> 16 servers).
>>
>>
>> On Tue, Apr 29, 2014 at 8:41 AM, Gilles Dubuc <[email protected]> wrote:
>>
>>>> what do you mean by unit? each config key in that section shows a
>>>> default value to the right of it.
>>>>
>>>
>>> I want to figure out how many background job runs we end up with per
>>> minute/per hour in practice. So I meant units such as X/minute, Y/hour. I
>>> know that it's dependent on how the background jobs are configured, but
>>> this throttle figures section of the documentation doesn't help figure that
>>> out. Makes it hard for anyone to pick a figure, because it's hard to know
>>> what the number represents.
>>>
>>>> hopefully, we could instead make it clear to the uploader that if their
>>>> file sizes exceed Xmb then they should set that throttle to 1 and make sure
>>>> the engineers and ops are notified in advance about the upload.
>>>>
>>>
>>> Guidelines sound like a good idea. If I'm following this logic
>>> correctly, though, doesn't that mean that there's also a risk that separate
>>> users might "step on each other's toes" in terms of resources, if they
>>> happen to be uploading content at the same time? Basically, if a given user
>>> sets a threshold which is a fine value for isolated use, isn't the risk
>>> that the threshold ends up being too high if more than one GWToolset user
>>> is uploading to Commons at the same time? At first I thought that the limit
>>> was on the Commons server side, but your remark seems to suggest that this
>>> is configured on the uploader's side.
>>>
>>>
>>>> job run frequency
>>>> -----------------
>>>> how often are the background jobs run?
>>>> is there a limit on how many GWToolset Mediafile background jobs are
>>>> picked up at once?
>>>>
>>>> i don’t know. aaron schulz would be the best person to ask. on the
>>>> beta cluster it seemed to vary between 7-30 minutes, but that may have been
>>>> because of testing or other activity on that server.
>>>
>>>
>>> CCing Aaron.
>>>
>>>
>>> On Tue, Apr 29, 2014 at 4:19 PM, dan entous <[email protected]> wrote:
>>>
>>>> On Apr 29, 2014, at 15:10 , Gilles Dubuc <[email protected]> wrote:
>>>>
>>>> > Hi Dan,
>>>> >
>>>> > wouldn’t it be better to throttle the application/tool that generates
>>>> thumbnails so that it doesn’t try to produce too many thumbnails at once?
>>>> >
>>>> > The issue is that there is no application generating thumbnails at a
>>>> given rate. Thumbnails are being generated on demand when people view a
>>>> thumbnail that doesn't exist. And since Special:NewFiles exists, and is
>>>> visited every few seconds by bots, that means all new uploads have their
>>>> thumbnails generated almost on the spot. Thus, we can't slow down that
>>>> part. We have several long-term tasks to improve this issue, but they will
>>>> take months to implement. Our only option at the moment is to try and avoid
>>>> having GWToolset make too many massive images appear on Commons'
>>>> Special:NewFiles in a short period of time.
>>>> >
>>>> > Over 500 of the tiff images were greater than 50 megapixels and as a
>>>> consequence Commons fails to render any thumbnails
>>>> >
>>>> > Indeed, it seems like some thumbnail generation requests timed out
>>>> due to the size of these images. The image scalers limit how long a
>>>> thumbnailing job can take, and these were going over the limit. To make
>>>> matters worse, the current retry mechanism means that they were being
>>>> retried 5 times, and thus using 5 times the resources. I would advise
>>>> against trying to upload those enormous images for now; we should try to
>>>> focus on a solution for the smaller images. It would be great if the next
>>>> upload attempt leaves aside the images that are too large.
>>>> >
>>>> > I think the safest option to proceed forward is to lower the
>>>> appropriate GWToolset throttles in production and then schedule a time for
>>>> Fae to try the upload process again. By scheduling a specific day and time
>>>> for the next attempt, we can make sure that engineers and ops have eyes on
>>>> the servers to watch the load. Then if things go well, we can tweak the
>>>> throttles back to higher values.
>>>> >
>>>> >
>>>> http://www.mediawiki.org/wiki/Extension:GWToolset/Technical_Design#Throttles.2C_Limits.2C_Delays
>>>> >
>>>> > The throttle documentation doesn't have any unit. I understand that
>>>> it's "per background job run", but how often do these background jobs run?
>>>>
>>>> what do you mean by unit? each config key in that section shows a
>>>> default value to the right of it.
>>>>
>>>>
>>>> > I couldn't find configuration values for these throttles on Commons.
>>>> Dan, can you confirm that Commons is using the default values?
>>>>
>>>>
>>>> throttle config values
>>>> ----------------------
>>>> the throttle configuration values are in the extension itself, in
>>>> http://git.wikimedia.org/blob/mediawiki%2Fextensions%2FGWToolset.git/d27991ca8168e47152605d73e41b2960333b470a/includes%2FConfig.php,
>>>> and can be overridden in the wmf-config/CommonSettings.php file of
>>>> http://git.wikimedia.org/tree/operations%2Fmediawiki-config.git
>>>> in the if ( $wmgUseGWToolset ) { section.
>>>>
>>>> the config values to most likely change would be
>>>> $mediafile_job_throttle_default, which is currently set to 10 and
>>>> $mediafile_job_throttle_max, which is currently set to 20.
>>>>
>>>> at the moment, a user can set this throttle between 1 and 20. that means
>>>> that every time a GWToolset Metadata background job is run, between 1 and
>>>> 20 GWToolset Mediafile jobs are added to the queue. we could change those
>>>> values, but that would be a pity for people uploading smaller file sizes.
>>>> hopefully, we could instead make it clear to the uploader that if their
>>>> file sizes exceed Xmb then they should set that throttle to 1 and make sure
>>>> the engineers and ops are notified in advance about the upload.
>>>>
>>>> GWToolset\Config::$mediafile_job_throttle_default = new_value
>>>> GWToolset\Config::$mediafile_job_throttle_max = new_value
>>>>
>>>>
>>>> job run frequency
>>>> -----------------
>>>> how often are the background jobs run?
>>>> is there a limit on how many GWToolset Mediafile background jobs are
>>>> picked up at once?
>>>>
>>>> i don’t know. aaron schulz would be the best person to ask. on the
>>>> beta cluster it seemed to vary between 7-30 minutes, but that may have been
>>>> because of testing or other activity on that server.
>>>>
>>>>
>>>> >
>>>> >
>>>> > On Mon, Apr 28, 2014 at 11:17 AM, dan entous <[email protected]>
>>>> wrote:
>>>> > GWToolset already has several throttles in place,
>>>> http://www.mediawiki.org/wiki/Extension:GWToolset/Technical_Design#Throttles.2C_Limits.2C_Delays,
>>>> that limit how many background uploads are picked up with each background
>>>> job run, and how many total GWToolset background jobs can exist in the
>>>> entire job queue. on the beta cluster, how often the background job ran
>>>> for GWToolset seemed to vary between 7 and 30 minutes. that seems
>>>> like enough time for additional images to get processed in-between
>>>> GWToolset images.
>>>> >
>>>> > wouldn’t it be better to throttle the application/tool that generates
>>>> thumbnails so that it doesn’t try to produce too many thumbnails at once?
>>>> >
>>>> > with kind regards,
>>>> > dan
>>>> >
>>>> >
>>>> >
>>>> > On Apr 25, 2014, at 20:41 , Gergo Tisza <[email protected]> wrote:
>>>> >
>>>> > > On Fri, Apr 25, 2014 at 11:13 AM, Fæ <[email protected]> wrote:
>>>> > > With no obvious immediate fix/work-around on the table from WMF ops, I
>>>> > > have proposed to re-start my uploads for this project with an
>>>> > > effective throttle by using 2 threads (this is a setting on the first
>>>> > > screen of the GWToolset). In practice, having tried a run of a couple
>>>> > > of hundred, this means that the tool is uploading 100MB sized images
>>>> > > at a rate of 2 every 5 minutes. This seems to not be causing any
>>>> > > issues.
>>>> > >
>>>> > > The issue was not directly with the uploads; there is no thumbnail
>>>> rendering happening on upload, so GWToolset adding lots of large TIFFs
>>>> quickly would not cause problems in itself. The upload speed was
>>>> problematic because that meant GWToolset saturated pages like
>>>> Special:NewFiles, and when somebody looked at such pages, *that* triggered
>>>> lots of thumbnail renderings of huge TIFF files at the same time. If
>>>> GWToolset is slowed down and lots of miscellaneous files are uploaded
>>>> between the TIFFs, those special pages won't be problematic, but something
>>>> like a gallery or category of huge TIFF files could still be.
>>>> > > _______________________________________________
>>>> > > Glamtools mailing list
>>>> > > [email protected]
>>>> > > https://lists.wikimedia.org/mailman/listinfo/glamtools
>>>> >
>>>> >
>>>>
>>>>
>>>
>>
>>
>> --
>> -Aaron S
>>
>
>


-- 
-Aaron S