On 6/9/23 00:03, Litterick, Jeff (BIT) via juniper-nsp wrote:

The big issue we ran into is that if you have redundant REs, there is a really
bad bug that will lock the entire chassis up solid somewhere between 6 hours
and 8 days after a reboot (one of our three would lock up quickly after a
reboot; the other two took much longer), to the point where we had to
physically pull the REs to reboot them.  It is fixed now, but they had to
manually poke new firmware into the ASICs on each RE while the REs were in a
half-powered state.  It was a very complex procedure with tech support and the
MX304 engineering team, and it took about 3 hours to do all 3 MX304s, one RE
at a time.  We have not seen an update with this fix built in yet.  (We just
did this back at the end of April.)

Oh dear, that's pretty nasty. So did they say new units shipping today would come with the REs already fixed?

We've been suffering a somewhat similar issue on the PTX1000, where a bug introduced via regression in Junos 21.4, 22.1 and 22.2 causes CPU queues to fill up with unknown-MAC-address frames that are never cleared. It takes 64 days for this packet accumulation to grow to the point where the queues are exhausted, causing a host loopback wedge.

You would see errors like these in the logs:

<date> <time> <hostname> alarmd[27630]: Alarm set: FPC id=150995048, color=RED, class=CHASSIS, reason=FPC 0 Major Errors
<date> <time> <hostname> fpc0 Performing action cmalarm for error /fpc/0/pfe/0/cm/0/Host_Loopback/0/HOST_LOOPBACK_MAKE_CMERROR_ID[1] (0x20002) in module: Host Loopback with scope: pfe category: functional level: major
<date> <time> <hostname> fpc0 Cmerror Op Set: Host Loopback: HOST LOOPBACK WEDGE DETECTED IN PATH ID 1  (URI: /fpc/0/pfe/0/cm/0/Host_Loopback/0/HOST_LOOPBACK_MAKE_CMERROR_ID[1])
Apr 1 03:52:28  PTX1000 fpc0 CMError: /fpc/0/pfe/0/cm/0/Host_Loopback/0/HOST_LOOPBACK_MAKE_CMERROR_ID[3] (0x20004), in module: Host Loopback with scope: pfe category: functional level: major

This causes the router to drop all control plane traffic, which effectively makes it unusable. One has to reboot the box to get it back up and running, until it happens again 64 days later.

The issue is resolved in Junos 21.4R3-S4, 22.4R2, 23.2R1 and 23.3R1.

However, these releases are not shipping yet, so Juniper gave us a workaround SLAX script that automatically runs and clears the CPU queues before the 64 days are up.
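For anyone curious how a workaround like that gets deployed: a SLAX op script is typically wired up as a Junos event script that fires on a timer via event-options. A rough sketch of what that configuration could look like (the script filename, event name and interval here are my own placeholders, not the actual JTAC workaround):

event-options {
    generate-event {
        /* raise a synthetic event once a day (86400 seconds) */
        daily-timer time-interval 86400;
    }
    policy clear-cpu-queues {
        events daily-timer;
        then {
            /* run the workaround script JTAC provided;
               "clear-host-queues.slax" is a placeholder name */
            event-script clear-host-queues.slax;
        }
    }
    event-script {
        file clear-host-queues.slax;
    }
}

The script file itself would live in /var/db/scripts/event/ on the RE; running it well inside the 64-day window gives plenty of margin before the queues fill up.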

We are currently running Junos 22.1R3.9 on this platform, and will move to 22.4R2 in a few weeks to permanently fix this.

Junos 20.2, 20.3 and 20.4 are not affected, nor is anything after 23.2R1.

I understand it may also affect the QFX and MX, but I don't have details on that.

Mark.

_______________________________________________
juniper-nsp mailing list [email protected]
https://puck.nether.net/mailman/listinfo/juniper-nsp
