Thanks for the update. I think the only GWT project that has caused
the loading issue has been the NYPL maps upload,[1] due to both the large
file sizes and the tiff format used. I do have a project in the wings
that has similar tiff sizes (100MB+)... some collections from the
Library of Congress that I paused last month while I moved over to
using the GWT. I have no plan to restart the LoC uploads in the next
couple of weeks. The issue does not seem to be triggered by upload rate
per se, as the NYPL maps were being uploaded at an average of around 1
file per minute.

I'll wait for a message from you on this list before attempting
these uploads, or I'll get in touch with you directly. I believe there
are actually only around 300 images left to upload from the NYPL
collection; there might be more later in the year if I can diagnose
why around 40% do not seem to be available via their API/catalogue.

In the coming week I'm planning a Rijksmuseum upload; however, these
are much smaller jpg files, so I do not believe there is any need to
delay that project.[2]

Links
1. https://commons.wikimedia.org/wiki/Commons:Batch_uploading/NYPL_Maps
2. https://commons.wikimedia.org/wiki/Commons:Batch_uploading/Art_of_Japan_in_the_Rijksmuseum

Fae

On 12 May 2014 10:58, Gilles Dubuc <[email protected]> wrote:
> From the Ops & Multimedia mailing lists:
>
>> We just had a brief imagescaler outage today at approx. 11:20 UTC that
>> was investigated and NYPL maps were found to be the cause of the outage.
>
>
> As Gergo's unanswered recent message in this thread suggested, we're
> actively working on a number of changes to stabilize GWToolset and improve
> image scaler performance in order to avoid such outages. I assumed that
> since everyone involved is participating in this thread, that you were
> waiting for these changes to happen before restarting the GWToolset job that
> caused the previous outage a couple of weeks ago, or that you would warn us
> when that job would be run again. There seems to be a communication issue
> here. By running this job, you've taken down thumbnail generation on Commons
> (and all WMF wikis) and we were lucky that someone from Ops was around,
> noticed it and reacted quickly. This could have been easily avoided with
> better coordination, by at least scheduling a time to run your next attempt,
> with people from Ops watching servers at the time the job is run. Please
> make sure that this happens for the next batch of NYPL maps/massive files
> that you plan to upload with GWToolset. All it takes is scheduling a day and
> time for the next upload attempt.
>
> Gergo and I will keep replying to this thread to notify everyone when our
> related code changes are merged.
>
>
> On Wed, May 7, 2014 at 10:26 PM, Gergo Tisza <[email protected]> wrote:
>>
>> Uhh... let's give this another shot in the morning.
>>
>> I went through last day's upload logs; on average there are ~600 uploads
>> an hour, the peak was 1900, the negative peak around 240. (The numbers are
>> at http://pastebin.com/raw.php?i=wmBRJm1G in case anybody finds them
>> useful.) So that's around 4 files per minute in worst case.
>>
>> If we are aiming for no more than 10% of Special:NewFiles to be taken up
>> by GWToolset, that means 5 uploads per run of the control job (10% of the 50
>> slots at Special:NewFiles) - the upload jobs can't really be throttled, so
>> we must make sure they come in small enough chunks, no matter how much delay
>> there is between the chunks. Also, we want to keep below 10% of the total
>> Commons upload rate - that means 24 images per hour, which is roughly five
>> runs of the control job per hour.
>>
>> So the correct config is
>>
>> GWToolset\Config::$mediafile_job_throttle_default = 5;
>> $wgJobBackoffThrottling['gwtoolsetUploadMetadataJob'] = 5 / 3600;
>>
>> I'm leaving the max throttle at 20 so that people who are uploading small,
>> non-TIFF images can get a somewhat higher speed.
>
>
>
> _______________________________________________
> Glamtools mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/glamtools
>
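
The throttle arithmetic in Gergo's quoted message above can be re-derived
with a short sketch. This is only an illustration of the reasoning, not
GWToolset code: the function name throttle_settings and its parameters are
hypothetical, and the input figures (240 uploads per hour at the quietest
point, 50 slots at Special:NewFiles, a 10% target share) are taken from the
message itself.

```python
def throttle_settings(trough_per_hour, new_files_slots, share_percent):
    """Re-derive the throttle values from the figures quoted in the thread.

    Hypothetical helper for illustration only; not part of GWToolset.
    """
    # Uploads allowed per run of the control job: the target share of the
    # slots shown at Special:NewFiles.
    batch_size = new_files_slots * share_percent // 100

    # Uploads allowed per hour: the target share of the quietest observed
    # hourly upload rate on Commons.
    hourly_budget = trough_per_hour * share_percent / 100

    # Control job runs per hour, rounded to the nearest whole run.
    runs_per_hour = round(hourly_budget / batch_size)

    # The quoted config expresses the backoff as runs per second.
    backoff_per_second = runs_per_hour / 3600

    return batch_size, hourly_budget, runs_per_hour, backoff_per_second


if __name__ == "__main__":
    batch, budget, runs, backoff = throttle_settings(240, 50, 10)
    print(f"uploads per control job run: {batch}")
    print(f"hourly upload budget:        {budget}")
    print(f"control job runs per hour:   {runs}")
    print(f"backoff (runs per second):   {backoff:.6f}")
```

With those inputs the batch size comes out at 5 and the hourly budget at 24
uploads, which is 4.8 runs and is rounded to 5 runs per hour. Note that 5
runs of 5 uploads is 25 per hour, marginally above the 24 budget, which
matches the "roughly five runs" wording in the message.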

-- 
[email protected] https://commons.wikimedia.org/wiki/User:Fae
