Re: [Multimedia] [Ops] Brief image scalers outage, Mon Apr 21 03:12 UTC

Giuseppe Lavagetto Mon, 21 Apr 2014 09:20:12 -0700

On Mon, Apr 21, 2014 at 8:04 AM, Ori Livneh <[email protected]> wrote:


> The number of Apache busy workers on the image scalers spiked between 2:55
> and 3:15 UTC, peaking at about 3:12 and overwhelming
> rendering.svc.eqiad.wmnet for about a minute.
>
> The outage correlates fairly well with a spike of fatals in
> TimedMediaHandler, consisting almost entirely of requests to this URL: <
> http://commons.wikimedia.org/w/thumb_handler.php/2/2c/Closed_Friedmann_universe_zero_Lambda.ogg/220px--Closed_Friedmann_universe_zero_Lambda.ogg.jpg
> >.
>
> The full stack trace is included in <
> https://bugzilla.wikimedia.org/show_bug.cgi?id=64152>, filed by Reedy
> yesterday. It appears File::getMimeType is returning 'unknown/unknown' and
> that File::getHandler is consequently not able to find a handler.
>


The problem happened again this morning, between 8:25 and 8:35 UTC. This
time the load was so high that Ganglia stopped graphing data. From an
analysis of the logs: while it is true that we have a lot of fatals for the
URL above, the number of requests for that URL is quite low and shows no
spike in that interval. So the problem looks like genuine load, probably
caused by some heavy processing rather than by those requests.
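
For anyone who wants to repeat that kind of check, here is a minimal sketch
of counting requests per minute for one URL. It is not the script used for
the analysis above. It assumes access logs in Apache common/combined format
with a "[21/Apr/2014:08:25:03 +0000]" style timestamp, and the function name
requests_per_minute is made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: Apache common/combined log format, e.g.
# 10.0.0.1 - - [21/Apr/2014:08:25:03 +0000] "GET /path HTTP/1.1" 500 0
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)')


def requests_per_minute(lines, needle):
    """Count log lines whose request path contains `needle`, per minute."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None or needle not in m.group("path"):
            continue  # unparseable line, or a request for some other URL
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        counts[ts.strftime("%Y-%m-%d %H:%M")] += 1
    return counts
```

Feeding it the lines of an access log and a fragment of the URL (for example
"Closed_Friedmann_universe_zero_Lambda.ogg") gives one count per minute; a
flat series across the 8:25-8:35 window would match what I saw.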

The problem resolved itself before I could strace the Apache processes, so
I don't have more details. Faidon was investigating as well and may have
more info.

Giuseppe

_______________________________________________
Multimedia mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/multimedia
