On Apr 29, 2014, at 15:10 , Gilles Dubuc <[email protected]> wrote:

> Hi Dan,
> 
> wouldn’t it be better to throttle the application/tool that generates 
> thumbnails so that it doesn’t try to produce too many thumbnails at once?
> 
> The issue is that there is no application generating thumbnails at a given 
> rate. Thumbnails are being generated on demand when people view a thumbnail 
> that doesn't exist. And since Special:NewFiles exists, and is visited every 
> few seconds by bots, that means all new uploads have their thumbnails 
> generated almost on the spot. Thus, we can't slow down that part. We have 
> several long-term tasks to improve this issue, but they will take months to 
> implement. Our only option at the moment is to try and avoid having GWToolset 
> make too many massive images appear on Commons' Special:NewFiles in a short 
> period of time.
> 
> Over 500 of the tiff images were greater than 50 megapixels and as a 
> consequence Commons fails to render any thumbnails
> 
> Indeed, it seems like some thumbnail generation requests timed out due to the 
> size of these images. There are limits on the image scalers in regards to how 
> long a thumbnailing job can take and these were going over the limit. To make 
> matters worse, the current retry mechanism means that they were being retried 
> 5 times, and thus using 5 times the resources. I would advise against trying 
> to upload those enormous images for now, we should try to focus on a solution 
> for the smaller images. It would be great if the next upload attempt leaves 
> the images that are too large aside.
> 
> I think the safest option to proceed forward is to lower the appropriate 
> GWToolset throttles in production and then schedule a time for Fae to try the 
> upload process again. By scheduling a specific day and time for the next 
> attempt, we can make sure that engineers and ops have eyes on the servers to 
> watch the load. Then if things go well, we can tweak the throttles back to 
> higher values.
> 
> http://www.mediawiki.org/wiki/Extension:GWToolset/Technical_Design#Throttles.2C_Limits.2C_Delays,
> 
> The throttle documentation doesn't have any unit. I understand that it's "per 
> background job run", but how often do these background jobs run?

what do you mean by unit? each config key in that section shows a default value 
to the right of it.


> I couldn't find configuration values for these throttles on Commons. Dan, can 
> you confirm that Commons is using the default values?


throttle config values
----------------------
the throttle configuration values are in the extension itself, 
http://git.wikimedia.org/blob/mediawiki%2Fextensions%2FGWToolset.git/d27991ca8168e47152605d73e41b2960333b470a/includes%2FConfig.php,
and can be overridden in the 
http://git.wikimedia.org/tree/operations%2Fmediawiki-config.git 
wmf-config/CommonSettings.php file in the if ( $wmgUseGWToolset ) { section.

the config values most likely to need changing are 
$mediafile_job_throttle_default, which is currently set to 10, and 
$mediafile_job_throttle_max, which is currently set to 20.

at the moment, a user can set this throttle to any value from 1 to 20. that 
means that every time a GWToolset Metadata background job is run, between 1 and 
20 GWToolset Mediafile jobs are added to the queue. we could lower those values, 
but that would be a pity for people uploading smaller files. instead, we could 
hopefully make it clear to uploaders that if their file sizes exceed X MB, they 
should set that throttle to 1 and make sure the engineers and ops are notified 
in advance about the upload.

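the override in CommonSettings.php would look like this, with new_value 
replaced by the chosen number:

GWToolset\Config::$mediafile_job_throttle_default = new_value;
GWToolset\Config::$mediafile_job_throttle_max = new_value;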


job run frequency
-----------------
how often are the background jobs run?
is there a limit on how many GWToolset Mediafile background jobs are picked up 
at once?

i don’t know. aaron schulz would be the best person to ask. on the beta 
cluster it seemed to vary between 7-30 minutes, but that may have been because 
of testing or other activity on that server.
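
to get a feel for what the throttle means in practice, here is a rough python 
sketch of the arithmetic. it is not GWToolset code: the function names are made 
up for illustration, clamping the user's value into 1..max is an assumption 
about how the 1-20 range is enforced, and it assumes exactly one Metadata job 
run per interval, using the 7-30 minute range seen on the beta cluster.

```python
# rough upper bound on GWToolset Mediafile jobs queued per hour.
# assumptions (not taken from the GWToolset code): one Metadata job run per
# interval, and each run adds exactly `throttle` Mediafile jobs to the queue.

THROTTLE_DEFAULT = 10  # current value of $mediafile_job_throttle_default
THROTTLE_MAX = 20      # current value of $mediafile_job_throttle_max


def clamp_throttle(requested=None):
    """Return the effective throttle, kept within 1..THROTTLE_MAX (assumed)."""
    if requested is None:
        return THROTTLE_DEFAULT
    return max(1, min(int(requested), THROTTLE_MAX))


def mediafile_jobs_per_hour(throttle, minutes_between_runs):
    """Mediafile jobs added per hour if a Metadata job runs every N minutes."""
    if minutes_between_runs <= 0:
        raise ValueError("minutes_between_runs must be positive")
    return clamp_throttle(throttle) * 60 / minutes_between_runs


if __name__ == "__main__":
    for throttle in (1, 10, 20):
        slowest = mediafile_jobs_per_hour(throttle, 30)
        fastest = mediafile_jobs_per_hour(throttle, 7)
        print(f"throttle {throttle:>2}: {slowest:.0f} to {fastest:.0f} jobs/hour")
```

under those assumptions, a throttle of 20 with a run every 7 minutes comes to 
roughly 171 Mediafile jobs an hour, while a throttle of 1 with a run every 30 
minutes comes to 2. the real numbers depend on how the job runners are 
scheduled in production, which this sketch does not model.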


> 
> 
> On Mon, Apr 28, 2014 at 11:17 AM, dan entous <[email protected]> wrote:
> GWToolset already has several throttles in place, 
> http://www.mediawiki.org/wiki/Extension:GWToolset/Technical_Design#Throttles.2C_Limits.2C_Delays,
>  that limit how many background uploads are picked up with each background 
> job run, and how many total GWToolset background jobs can exist in the entire 
> job queue. on the beta cluster, how often the background job ran for 
> GWToolset varied between 7-30 minutes. that seems like enough time for 
> additional images to get processed in-between GWToolset images.
> 
> wouldn’t it be better to throttle the application/tool that generates 
> thumbnails so that it doesn’t try to produce too many thumbnails at once?
> 
> with kind regards,
> dan
> 
> 
> 
> On Apr 25, 2014, at 20:41 , Gergo Tisza <[email protected]> wrote:
> 
> > On Fri, Apr 25, 2014 at 11:13 AM, Fæ <[email protected]> wrote:
> > With no obvious immediate fix/work-around on the table from WMF ops, I
> > have proposed to re-start my uploads for this project with an
> > effective throttle by using 2 threads (this is a setting on the first
> > screen of the GWToolset). In practice, having tried a run of a couple
> > of hundred, this means that the tool is uploading 100MB sized images
> > at a rate of 2 every 5 minutes. This seems to not be causing any
> > issues.
> >
> > The issue was not directly with the uploads; there is no thumbnail 
> > rendering happening on upload, so GWToolset adding lots of large TIFFs 
> > quickly would not cause problems in itself. The upload speed was 
> > problematic because that meant GWToolset saturated pages like 
> > Special:NewFiles, and when somebody looked at such pages, *that* triggered 
> > lots of thumbnail renderings of huge TIFF files at the same time. If 
> > GWToolset is slowed down and lots of miscellaneous files are uploaded 
> > between the TIFFs, those special pages won't be problematic, but something 
> > like a gallery or category of huge TIFF files could still be.
> > _______________________________________________
> > Glamtools mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/glamtools
> 
> 


