Interestingly enough, last night something happened to the onboard flash on one of my EX switches, and the thing ate it pretty hard.
Came back up with errors like this, and then just crashed again shortly after:

Jun 26 00:48:14 tor-205-a.sv.<snipped> fpc0 Route TCAM rows need not be redirected on device 0.
Jun 26 00:48:14 tor-205-a.sv.<snipped> fpc0 Route TCAM rows need not be redirected on device 1.
Jun 26 00:48:15 tor-205-a.sv.<snipped> fpc0 PFEM: Enabling traffic for dev 0
Jun 26 00:48:15 tor-205-a.sv.<snipped> chassisd[985]: LIBJSNMP_SA_PARTIAL_SEND_FRAG: Attempted to send 68 bytes, actually sent 4 bytes
Jun 26 00:48:15 tor-205-a.sv.<snipped> chassisd[985]: LIBJSNMP_SA_PARTIAL_SEND_REM: Queuing message remainder, 64 bytes
Jun 26 00:48:15 tor-205-a.sv.<snipped> fpc0 PFEM: Enabling traffic for dev 1
Jun 26 00:48:17 tor-205-a.sv.<snipped> /kernel: RT_PFE: RT msg op 1 (PREFIX ADD) failed, err 5 (Invalid)
Jun 26 00:48:17 tor-205-a.sv.<snipped> chassisd[985]: LIBJSNMP_SA_PARTIAL_SEND_FRAG: Attempted to send 68 bytes, actually sent 52 bytes
Jun 26 00:48:17 tor-205-a.sv.<snipped> chassisd[985]: LIBJSNMP_SA_PARTIAL_SEND_REM: Queuing message remainder, 16 bytes
Jun 26 00:48:19 tor-205-a.sv.<snipped> chassisd[985]: LIBJSNMP_SA_PARTIAL_SEND_FRAG: Attempted to send 68 bytes, actually sent 56 bytes
Jun 26 00:48:20 tor-205-a.sv.<snipped> chassisd[985]: LIBJSNMP_SA_PARTIAL_SEND_REM: Queuing message remainder, 12 bytes
Jun 26 00:48:21 tor-205-a.sv.<snipped> lldpd[1009]: LIBESPTASK_SNMP_CONN_RETRY: snmp_epi_reg_refresh: reattempting connection to SNMP agent (register MIBs): Resource temporarily unavailable
Jun 26 00:48:22 tor-205-a.sv.<snipped> /kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 0 0 19 6a 0 0 0 20 0
Jun 26 00:48:22 tor-205-a.sv.<snipped> /kernel: (da0:umass-sim0:0:0:0): CAM Status: SCSI Status Error
Jun 26 00:48:22 tor-205-a.sv.<snipped> /kernel: (da0:umass-sim0:0:0:0): SCSI Status: Check Condition
Jun 26 00:48:22 tor-205-a.sv.<snipped> /kernel: (da0:umass-sim0:0:0:0): MEDIUM ERROR asc:11,0
Jun 26 00:48:22 tor-205-a.sv.<snipped> /kernel: (da0:umass-sim0:0:0:0): Unrecovered read error
Jun 26 00:48:22 tor-205-a.sv.<snipped> /kernel: (da0:umass-sim0:0:0:0): Retrying Command (per Sense Data)
Jun 26 00:48:23 tor-205-a.sv.<snipped> /kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 0 0 19 6b 0 0 0 80 0
Jun 26 00:48:23 tor-205-a.sv.<snipped> /kernel: (da0:umass-sim0:0:0:0): CAM Status: SCSI Status Error
Jun 26 00:48:23 tor-205-a.sv.<snipped> /kernel: (da0:umass-sim0:0:0:0): SCSI Status: Check Condition
Jun 26 00:48:23 tor-205-a.sv.<snipped> /kernel: (da0:umass-sim0:0:0:0): ILLEGAL REQUEST asc:20,0
Jun 26 00:48:23 tor-205-a.sv.<snipped> /kernel: (da0:umass-sim0:0:0:0): Invalid command operation code
Jun 26 00:48:23 tor-205-a.sv.<snipped> /kernel: (da0:umass-sim0:0:0:0): Unretryable error
Jun 26 00:48:23 tor-205-a.sv.<snipped> /kernel: g_vfs_done():da0s3e[READ(offset=67502080, length=65536)]error = 22
Jun 26 00:48:23 tor-205-a.sv.<snipped> /kernel: vnode_pager_getpages: I/O read error
Jun 26 00:48:23 tor-205-a.sv.<snipped> /kernel: vm_fault: pager read error, pid 1047 (cp)
Jun 26 00:48:25 tor-205-a.sv.<snipped> fpc0 pfe_pme_max 24
Jun 26 00:48:25 tor-205-a.sv.<snipped> fpc0 PFEMAN: Sent Resync request to Master
Jun 26 00:48:25 tor-205-a.sv.<snipped> fpc0 MRVL-L2:mrvl_brg_port_stg_entry_set(),293:l2ifl not found for ifl 4!
Jun 26 00:48:25 tor-205-a.sv.<snipped> fpc0 MRVL-L2:mrvl_brg_port_stg_create(),539:Port-STG-Set failed(Invalid Params:-2)
Jun 26 00:48:25 tor-205-a.sv.<snipped> fpc0 RT-HAL,rt_entry_add_msg_proc,2790: l2_halp_vectors->l2_entry_create failed
Jun 26 00:48:25 tor-205-a.sv.<snipped> fpc0 RT-HAL,rt_entry_add_msg_proc,2883: proto MSTI,len 48 prefix 00004:00254 nh 82
Jun 26 00:48:25 tor-205-a.sv.<snipped> fpc0 RT-HAL,rt_msg_handler,597: route process failed

On Wed, Jun 26, 2013 at 5:16 AM, Martin T <[email protected]> wrote:
> I did not try "set chassis redundancy failover on-disk-failure" as
> this should be for a GRES configuration, but I have a single RE in
> both the M10i and the M20.
>
>
> regards,
> Martin
>
> 2013/6/26, Per Granath <[email protected]>:
> > Note that these are two different configurations:
> >
> > set chassis routing-engine on-disk-failure disk-failure-action reboot
> > set chassis redundancy failover on-disk-failure
> >
> > Did you try both?
> >
> >
> > -----Original Message-----
> > From: Martin T [mailto:[email protected]]
> > Sent: Wednesday, June 26, 2013 11:58 AM
> > To: Per Granath
> > Cc: [email protected]; [email protected]
> > Subject: Re: [j-nsp] what happens if HDD on routing-engine fails during the router operation?
> >
> > Hi,
> >
> > I did now :) However, it had no effect. On the other hand, dismounting
> > /var is not nearly the same as the HDD being completely removed, or
> > failing, on a working routing-engine.
> >
> > Example with M20:
> >
> > root@M20> show configuration chassis
> > routing-engine {
> >     on-disk-failure disk-failure-action reboot;
> > }
> >
> > root@M20> show system processes brief
> > last pid: 1475; load averages: 0.00, 0.12, 0.15 up 0+00:11:35 07:08:28
> > 105 processes: 3 running, 86 sleeping, 16 waiting
> >
> > Mem: 136M Active, 115M Inact, 32M Wired, 132M Cache, 69M Buf, 1580M Free
> > Swap: 2048M Total, 2048M Free
> >
> > root@M20> start shell csh
> > root@M20% mount
> > /dev/ad0s1a on / (ufs, local, noatime)
> > devfs on /dev (devfs, local)
> > devfs on /dev/ (devfs, local, noatime, noexec, read-only)
> > /dev/md0 on /packages/mnt/jbase (cd9660, local, noatime, read-only)
> > /dev/md1 on /packages/mnt/jkernel-9.4R3.5 (cd9660, local, noatime, read-only)
> > /dev/md2 on /packages/mnt/jpfe-M40-9.4R3.5 (cd9660, local, noatime, read-only)
> > /dev/md3 on /packages/mnt/jdocs-9.4R3.5 (cd9660, local, noatime, read-only)
> > /dev/md4 on /packages/mnt/jroute-9.4R3.5 (cd9660, local, noatime, read-only)
> > /dev/md5 on /packages/mnt/jcrypto-9.4R3.5 (cd9660, local, noatime, read-only)
> > /dev/md6 on /packages/mnt/jpfe-common-9.4R3.5 (cd9660, local, noatime, read-only)
> > /dev/md7 on /tmp (ufs, local, noatime, soft-updates)
> > /dev/md8 on /mfs (ufs, local, noatime, soft-updates)
> > /dev/ad0s1e on /config (ufs, local, noatime)
> > procfs on /proc (procfs, local, noatime)
> > /dev/ad1s1f on /var (ufs, local, noatime)
> > root@M20% umount -f /var
> > root@M20% mount
> > /dev/ad0s1a on / (ufs, local, noatime)
> > devfs on /dev (devfs, local)
> > devfs on /dev/ (devfs, local, noatime, noexec, read-only)
> > /dev/md0 on /packages/mnt/jbase (cd9660, local, noatime, read-only)
> > /dev/md1 on /packages/mnt/jkernel-9.4R3.5 (cd9660, local, noatime, read-only)
> > /dev/md2 on /packages/mnt/jpfe-M40-9.4R3.5 (cd9660, local, noatime, read-only)
> > /dev/md3 on /packages/mnt/jdocs-9.4R3.5 (cd9660, local, noatime, read-only)
> > /dev/md4 on /packages/mnt/jroute-9.4R3.5 (cd9660, local, noatime, read-only)
> > /dev/md5 on /packages/mnt/jcrypto-9.4R3.5 (cd9660, local, noatime, read-only)
> > /dev/md6 on /packages/mnt/jpfe-common-9.4R3.5 (cd9660, local, noatime, read-only)
> > /dev/md7 on /tmp (ufs, local, noatime, soft-updates)
> > /dev/md8 on /mfs (ufs, local, noatime, soft-updates)
> > /dev/ad0s1e on /config (ufs, local, noatime)
> > procfs on /proc (procfs, local, noatime)
> > root@M20% exit
> > exit
> >
> > root@M20> ?
> > No valid completions
> > root@M20>
> > error: unknown command: .noop-command
> >
> >
> > root@M20> Jun 26 07:09:49 init: can't chdir to /var/tmp/: No such file or directory
> > Jun 26 07:09:54 init: can't chdir to /var/tmp/: No such file or directory
> > Jun 26 07:09:59 init: can't chdir to /var/tmp/: No such file or directory
> > Jun 26 07:10:04 init: can't chdir to /var/tmp/: No such file or directory
> > Jun 26 07:10:04 init: can't chdir to /var/tmp/: No such file or directory
> >
> >
> > Example with M10i:
> >
> > root@M10i> show configuration chassis
> > routing-engine {
> >     on-disk-failure disk-failure-action reboot;
> > }
> >
> > root@M10i> show system processes brief
> > last pid: 1473; load averages: 3.97, 1.22, 0.47 up 0+00:02:46 08:17:13
> > 111 processes: 5 running, 89 sleeping, 17 waiting
> >
> > Mem: 181M Active, 54M Inact, 33M Wired, 216M Cache, 69M Buf, 1012M Free
> > Swap: 2048M Total, 2048M Free
> >
> > root@M10i> start shell csh
> > root@M10i% mount
> > /dev/ad0s1a on / (ufs, local, noatime)
> > devfs on /dev (devfs, local, multilabel)
> > /dev/md0 on /packages/mnt/jbase (cd9660, local, noatime, read-only, verified)
> > /dev/md1 on /packages/mnt/jkernel-10.4R12.4 (cd9660, local, noatime, read-only, verified)
> > /dev/md2 on /packages/mnt/jpfe-M7i-10.4R12.4 (cd9660, local, noatime, read-only)
> > /dev/md3 on /packages/mnt/jdocs-10.4R12.4 (cd9660, local, noatime, read-only, verified)
> > /dev/md4 on /packages/mnt/jroute-10.4R12.4 (cd9660, local, noatime, read-only, verified)
> > /dev/md5 on /packages/mnt/jcrypto-10.4R12.4 (cd9660, local, noatime, read-only, verified)
> > /dev/md6 on /packages/mnt/jpfe-common-10.4R12.4 (cd9660, local, noatime, read-only)
> > /dev/md7 on /packages/mnt/jruntime-10.4R12.4 (cd9660, local, noatime, read-only, verified)
> > /dev/md8 on /tmp (ufs, asynchronous, local, noatime)
> > /dev/md9 on /mfs (ufs, asynchronous, local, noatime)
> > /dev/ad0s1e on /config (ufs, local, noatime)
> > procfs on /proc (procfs, local, noatime)
> > /dev/ad1s1f on /var (ufs, local, noatime)
> > root@M10i% umount -f /var
> > root@M10i% mount
> > /dev/ad0s1a on / (ufs, local, noatime)
> > devfs on /dev (devfs, local, multilabel)
> > /dev/md0 on /packages/mnt/jbase (cd9660, local, noatime, read-only, verified)
> > /dev/md1 on /packages/mnt/jkernel-10.4R12.4 (cd9660, local, noatime, read-only, verified)
> > /dev/md2 on /packages/mnt/jpfe-M7i-10.4R12.4 (cd9660, local, noatime, read-only)
> > /dev/md3 on /packages/mnt/jdocs-10.4R12.4 (cd9660, local, noatime, read-only, verified)
> > /dev/md4 on /packages/mnt/jroute-10.4R12.4 (cd9660, local, noatime, read-only, verified)
> > /dev/md5 on /packages/mnt/jcrypto-10.4R12.4 (cd9660, local, noatime, read-only, verified)
> > /dev/md6 on /packages/mnt/jpfe-common-10.4R12.4 (cd9660, local, noatime, read-only)
> > /dev/md7 on /packages/mnt/jruntime-10.4R12.4 (cd9660, local, noatime, read-only, verified)
> > /dev/md8 on /tmp (ufs, asynchronous, local, noatime)
> > /dev/md9 on /mfs (ufs, asynchronous, local, noatime)
> > /dev/ad0s1e on /config (ufs, local, noatime)
> > procfs on /proc (procfs, local, noatime)
> > root@M10i% Jun 26 08:18:04 init: can't chdir to /var/tmp/: No such file or directory
> > exit
> > exit
> >
> > root@M10i> Jun 26 08:18:09 init: can't chdir to /var/tmp/: No such file or directory
> > ?
> > No valid completions
> > root@M10i> Jun 26 08:18:15 init: can't chdir to /var/tmp/: No such file or directory
> > Jun 26 08:18:20 init: can't chdir to /var/tmp/: No such file or directory
> > Jun 26 08:18:20 init: can't chdir to /var/tmp/: No such file or directory
> >
> >
> > One other important thing that happens if the HDD fails is that swap
> > space is lost. This is probably rather critical with, for example, the
> > RE-333-256. In addition, it looks like the RE-850 has no problem booting
> > without the HDD, while the RE-600 and RE-333 do not boot without it.
> >
> >
> > Still, what exactly makes the RE reload when the HDD is lost?
> >
> >
> > regards,
> > Martin
> >
> > 2013/6/26, Per Granath <[email protected]>:
> >> Did you try it with this configuration?
> >>
> >> chassis {
> >>     redundancy {
> >>         failover {
> >>             on-loss-of-keepalives;
> >>             on-disk-failure;
> >>         }
> >>     }
> >> }
> >>
> >> _______________________________________________
> >> juniper-nsp mailing list [email protected]
> >> https://puck.nether.net/mailman/listinfo/juniper-nsp
> >>
> >
>

-- 
Thanks,
Morgan
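The two failing reads in the EX kernel log can be decoded by hand. A READ(10) CDB is ten bytes: opcode 0x28, a flags byte, a 4-byte big-endian logical block address (bytes 2-5), a group-number byte, a 2-byte big-endian transfer length in blocks (bytes 7-8), and a control byte. The Python sketch below assumes the CDB is logged as space-separated hex bytes, exactly as in the `READ(10). CDB:` lines; `decode_read10` and `Read10` are hypothetical names for illustration, and the 512-byte block size is an assumption about the USB flash (da0), not something the log states.

```python
from typing import NamedTuple


class Read10(NamedTuple):
    lba: int     # logical block address, CDB bytes 2-5 (big-endian)
    blocks: int  # transfer length in logical blocks, CDB bytes 7-8 (big-endian)


def decode_read10(cdb_text: str) -> Read10:
    """Decode a READ(10) CDB logged as space-separated hex bytes."""
    cdb = bytes(int(tok, 16) for tok in cdb_text.split())
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")
    blocks = int.from_bytes(cdb[7:9], "big")
    return Read10(lba, blocks)


if __name__ == "__main__":
    SECTOR = 512  # assumption: 512-byte logical blocks on da0
    for text in ("28 0 0 19 6a 0 0 0 20 0", "28 0 0 19 6b 0 0 0 80 0"):
        r = decode_read10(text)
        print(f"LBA {r.lba} (0x{r.lba:x}), {r.blocks} blocks = {r.blocks * SECTOR} bytes")
```

Under the 512-byte assumption, the second command (128 blocks) is a 65536-byte read, the same length reported in the `g_vfs_done()` line that follows it.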

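Martin's test on the M20 and M10i comes down to /var vanishing from the mount table while the rest of the RE keeps running. The minimal Python sketch below assumes the FreeBSD `mount` output format quoted in the thread (`device on mountpoint (fstype, options)`). It maps mount points to devices so the before/after listings can be compared. `mounted_paths` is a hypothetical helper, not a Junos facility, and the regex does not handle mount points that contain spaces.

```python
import re

# One FreeBSD mount(8) line: "<device> on <mount point> (<fstype>, <options>)".
MOUNT_LINE = re.compile(r"^(?P<dev>\S+) on (?P<path>\S+) \((?P<opts>[^)]*)\)$")


def mounted_paths(mount_output: str) -> dict[str, str]:
    """Map each mount point to its device, skipping lines that do not parse."""
    result = {}
    for line in mount_output.splitlines():
        m = MOUNT_LINE.match(line.strip())
        if m:
            result[m.group("path")] = m.group("dev")
    return result


# Excerpts of the M10i listings before and after "umount -f /var".
before = """/dev/ad0s1e on /config (ufs, local, noatime)
procfs on /proc (procfs, local, noatime)
/dev/ad1s1f on /var (ufs, local, noatime)"""
after = """/dev/ad0s1e on /config (ufs, local, noatime)
procfs on /proc (procfs, local, noatime)"""

for label, text in (("before umount", before), ("after umount", after)):
    paths = mounted_paths(text)
    print(label, "/var mounted:", "/var" in paths, paths.get("/var"))
```

In both listings /var is the only filesystem on ad1 (presumably the hard disk), while / and /config live on ad0. That is consistent with Martin's point that unmounting /var only partly imitates losing the disk: swap and the device itself are still present.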
