Is the incident completely over? I am still seeing a symptom, although apparently no packet loss.
> On 15 Jan 2019 at 22:44, Simon Muyal <smu...@franceix.net> wrote:
>
> Hello Raphael,
>
> As Youssef said, we communicated yesterday through the preferred channel,
> the France-IX mailing list, which all France-IX members follow. We posted
> a more complete report today, see below.
>
> We did verify last night that the problems with CF or Google during the
> evening were not related to France-IX.
>
> As for communication during incidents, we favour the France-IX mailing
> list and ask members to use that channel for better follow-up. We will
> also create a page soon to display ongoing maintenances/incidents. That
> will also give visibility to non-members, as you point out.
>
> ++
>
> Simon
>
> ---
>
> Dear members,
>
> You will find below a report concerning the issue encountered yesterday
> afternoon:
>
> *12:20 (Paris time):* We started observing some unusual BUM traffic
> (Broadcast, Unknown Unicast, Multicast) on PoPs where BUM rate limiting is
> performed globally (not per interface): PA7, PAR1 and TH3 PoPs.
>
> We tried to determine the origin of this flooded traffic, looking for loops
> and checking MAC address consistency across PoPs. At this stage, our
> probe network (one 10G probe per device) did not raise any alert and the
> probes observed no loss. Nonetheless, some members complained, reporting
> losses towards France-IX.
>
> The sniffer captures allowed us to determine that it was unknown unicast
> traffic from several sources to a few destinations. BUM traffic reached 10 to
> 15 Mbps. This traffic was observed even though the MAC table entries were OK.
>
> *Around 15:00 (Paris time):* BUM traffic reached more than 50 Mbps, causing
> additional impact, mainly on small and medium routers on the customer side. We
> cleared some MAC address entries where we observed flooding, with no effect.
> As we didn't observe any abnormal behaviour on the customer side, we started
> clearing some MPLS/LSP circuits and shutting down backbone links one by one,
> to avoid creating additional impact. This allowed us to isolate the problem:
> the issue was located on the PAR5 PoP, when clearing the MPLS circuits used
> between PAR5 and PAR1. During these operations, the PAR1 PoP was isolated for
> 4 minutes, between 16:00 and 16:04, in order to find the root cause.
>
> We are in touch with the vendor to understand this behaviour and are sharing
> logs to find the root cause. We will keep you informed as soon as we have more
> information.
>
> ---
> Location: FranceIX Paris LAN
>
> Incident start: 14th of January 2019, 12:21 (UTC+1, Paris Time)
> Incident end: 14th of January 2019, 16:08 (UTC+1, Paris Time)
>
> Customer impact: Some members observed packet loss during this period
> ---
>
> Here is the work in progress to detect this kind of issue:
>
> - Specific alerts when the BUM traffic threshold is reached on any PoP
>   (*already done since yesterday*)
> - Enhancement of the QoS probes to be as close as possible to a member
>   configuration: a BGP router configured on each probe and permanent traffic
>   generated. This will be deployed in Q1-2019
> - We plan to test the 18.R1 firmware soon. This version improves the way
>   processes and memory are managed on the platform. It will be tested during
>   Q1-2019 and probably deployed during Q2-2019
> - EVPN: for the long term, we plan to activate EVPN, and BUM traffic will be
>   better controlled
> - Definition of a specific process to react quickly if the issue occurs
>   again
>
> We apologize again for this issue. Sorry if you feel we didn't communicate
> enough during the incident; we communicated as soon as we had new
> information to provide.
>
>
> On 14/01/2019 at 21:40, Raphael Mazelier wrote:
>> On 14/01/2019 20:59, Radu-Adrian Feurdean wrote:
>>
>>> Almost certainly not.
>>> Traffic had also disappeared via Equinix-IX and, after a sharp drop,
>>> moved entirely onto transit. Right now it seems to be picking up a bit
>>> on the Equinix side. On the FranceIX side, I don't know (I prepend), but
>>> the "Akamai festival" has indeed started this evening's episode (traffic
>>> shifting from PNI to France-IX).
>>>
>>
>> OK, thanks for the clarification. What made me think of that was reports
>> from people who had also lost 8.8.8.8. Otherwise, what is hammering the
>> CDNs at the moment that forces them to re-route?
>>
>> --
>> Raphael Mazelier
>>
>>
>>
>> ---------------------------
>> FRnOG mailing list
>> http://www.frnog.org/
>
> --
> Simon Muyal
> CTO
> FranceIX
> Tel: +33 (0)1 70 61 97 74
> Mob: +33 (0)6 21 17 29 51