Ummmm, throw bandwidth at it.  ...which reminds me... I actually want a t-shirt 
that says....   "Bandwidth solves a lot"

-aaron

-----Original Message-----
From: Jean St-Laurent <j...@ddostest.me> 
Sent: Thursday, April 1, 2021 2:01 PM
To: aar...@gvtc.com; 'Jared Mauch' <ja...@puck.nether.net>; 'Töma Gavrichenkov' 
<xima...@gmail.com>
Cc: 'NANOG' <nanog@nanog.org>
Subject: RE: wow, lots of akamai

I remember working for a big ISP in Europe offering cable TV + internet with 20M+ 
subscribers.

Every time there was a huge power outage in a major city, all the TVs would go off 
at the same time. I don't have stats on power grid stability in Europe vs. North America.

The problem was that when the power came back in a big city, all the TV 
subscribers would come back online at the exact same second or minute, 
more or less within the same 2 or 3 minutes.

What happened is that it created a kind of internal DDoS: the boxes would all 
time out and show a weird error message. Something very useful like "Error 
Code 0x8098808. Please call our support line at this phone number."

The server sysadmins would panic because all the systems were overloaded. 
They often had to work overtime because a DB crashed here, a key server crashed there, 
whatever... there was always something crashing. This was before the cloud, when you 
could just push a slider and get tons of VMs or containers to absorb the load in real 
time. (In my dreams.)

Every time, this created frustration for the clients, the help desk, the 
support teams and also upper management. The teams were exhausted 
afterwards. It was draining.

Anyway, after some years of internal discussion (red tape), we finally managed 
to add a random artificial delay to the set-top boxes when they boot after 
a power outage. Nothing like 20 minutes, just enough to spread the load 
over a longer period of time. It was transparent for the end user: if the 
set-top box booted in 206 seconds instead of the super aggressive 34 seconds, 
well, it still booted and they could watch TV. 
Vs. 

"My system is totally frozen, it's been like that for 20 minutes with weird 
messages because all your systems are down, and the error message said to call the 
help desk."

This simple change, 3 lines of code adding a random artificial boot delay of a 
few seconds, completely solved the problem. This way, when a city 
blacked out, we wouldn't self-DDoS, because the systems would slowly 
ramp up. The set-top boxes would all reboot, but wait a random amount of time 
before asking for the DRM package to unlock the cable TV service and validate 
that the billing was right.
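
The idea is nothing more than startup jitter. A minimal sketch of what those few 
lines could look like (not the actual firmware code; the 180-second window and 
the device-serial seeding are just illustrative assumptions):

#include <stdlib.h>
#include <unistd.h>

/* Delay the box's first call to the DRM/billing back end by a random
 * number of seconds after boot, so boxes restarting together do not
 * all hit the servers in the same instant. */
void stagger_startup(unsigned device_serial)
{
    /* Seed from something unique per box: seeding from the wall clock
     * alone would give identical delays to boxes booting in the same
     * second, which defeats the purpose. */
    srand(device_serial);
    unsigned delay = (unsigned)(rand() % 180); /* 0-179 s of jitter (assumed window) */
    sleep(delay);                              /* then proceed with DRM/billing requests */
}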

I'm no Call of Duty or Akamai expert, but I've seen the same exchange here 
many times:

What's happening?
Call of Duty!
Okay.

Would a kind of throttle help here? 

An artificial rollout delay somehow? Probably not at the ISP level, more 
at the game level. Well, ISPs could also have mechanisms to reduce the 
impact, or Akamai could force a progressive rollout. 

I'm not sure the proposed solutions would work, but this seems to impact 
NANOG operators frequently, or at least generate calls overnight or on weekends. 
It also seems to happen just before long holidays, when operations teams are 
sometimes running with reduced staff.

Are big game rollouts really impacting NANOG operators? Or is it more of a 
"Hey, I was curious what happened and thought I'd ask here on NANOG"?

#JustCurious

Jean

-----Original Message-----
From: NANOG <nanog-bounces+jean=ddostest...@nanog.org> On Behalf Of 
aar...@gvtc.com
Sent: April 1, 2021 12:12 PM
To: 'Jared Mauch' <ja...@puck.nether.net>; 'Töma Gavrichenkov' 
<xima...@gmail.com>
Cc: 'NANOG' <nanog@nanog.org>
Subject: RE: wow, lots of akamai

Gaming update... I had a feeling.  Thanks for the feedback, folks.

Thanks Jared, it's running well before, during and after.  We have a lot of 
capacity there.

-Aaron


