Hi,

M and MX series routing engines have an HDD (or SSD) installed, which holds a UFS file system mounted on /var. The /var directory contains many important subdirectories, such as "log" for log files, "crash" for core dumps and "tmp" for temporary files. However, what happens if the HDD fails while the routing engine is operational? As there is no easy way to remove an HDD from an operating RE, I unmounted the HDD file system (/var) on an operational routing engine instead.

The first example is with an M20 (RE-600):

root@M20> show chassis hardware detail | match ad
ad0 245 MB SanDisk SDCFB-256 101120L0703U0953 Compact Flash
ad1 28615 MB FUJITSU MHR2030AT D NJ69T3A14196 Hard Disk

root@M20> start shell sh
# uname -a
JUNOS M20 9.4R3.5 JUNOS 9.4R3.5 #0: 2009-07-24 23:24:53 UTC [email protected]:/volume/build/junos/9.4/release/9.4R3.5/obj-i386/sys/compile/JUNIPER i386
# mount
/dev/ad0s1a on / (ufs, local, noatime)
devfs on /dev (devfs, local)
devfs on /dev/ (devfs, local, noatime, noexec, read-only)
/dev/md0 on /packages/mnt/jbase (cd9660, local, noatime, read-only)
/dev/md1 on /packages/mnt/jkernel-9.4R3.5 (cd9660, local, noatime, read-only)
/dev/md2 on /packages/mnt/jpfe-M40-9.4R3.5 (cd9660, local, noatime, read-only)
/dev/md3 on /packages/mnt/jdocs-9.4R3.5 (cd9660, local, noatime, read-only)
/dev/md4 on /packages/mnt/jroute-9.4R3.5 (cd9660, local, noatime, read-only)
/dev/md5 on /packages/mnt/jcrypto-9.4R3.5 (cd9660, local, noatime, read-only)
/dev/md6 on /packages/mnt/jpfe-common-9.4R3.5 (cd9660, local, noatime, read-only)
/dev/md7 on /tmp (ufs, local, noatime, soft-updates)
/dev/md8 on /mfs (ufs, local, noatime, soft-updates)
/dev/ad0s1e on /config (ufs, local, noatime)
procfs on /proc (procfs, local, noatime)
/dev/ad1s1f on /var (ufs, local, noatime)
# umount -f /var
# mount
/dev/ad0s1a on / (ufs, local, noatime)
devfs on /dev (devfs, local)
devfs on /dev/ (devfs, local, noatime, noexec, read-only)
/dev/md0 on /packages/mnt/jbase (cd9660, local, noatime, read-only)
/dev/md1 on /packages/mnt/jkernel-9.4R3.5 (cd9660, local, noatime, read-only)
/dev/md2 on /packages/mnt/jpfe-M40-9.4R3.5 (cd9660, local, noatime, read-only)
/dev/md3 on /packages/mnt/jdocs-9.4R3.5 (cd9660, local, noatime, read-only)
/dev/md4 on /packages/mnt/jroute-9.4R3.5 (cd9660, local, noatime, read-only)
/dev/md5 on /packages/mnt/jcrypto-9.4R3.5 (cd9660, local, noatime, read-only)
/dev/md6 on /packages/mnt/jpfe-common-9.4R3.5 (cd9660, local, noatime, read-only)
/dev/md7 on /tmp (ufs, local, noatime, soft-updates)
/dev/md8 on /mfs (ufs, local, noatime, soft-updates)
/dev/ad0s1e on /config (ufs, local, noatime)
procfs on /proc (procfs, local, noatime)
# clJun 25 12:03:55 init: can't chdir to /var/tmp/: No such file or directory
^R
# Jun 25 12:04:01 init: can't chdir to /var/tmp/: No such file or directory
# exit

root@M20> Jun 25 12:04:06 init: can't chdir to /var/tmp/: No such file or directory
error: unknown command: .noop-command
WARNING: cli has been replaced by an updated version:
CLI release 9.4R3.5 built by builder on 2009-07-24 23:11:30 UTC
Restart cli using the new version ? [yes,no] (yes)
Restarting cli ...
Jun 25 12:04:11 init: can't chdir to /var/tmp/: No such file or directory
Jun 25 12:04:11 init: can't chdir to /var/tmp/: No such file or directory
could not open user interface connection: management daemon not responding
Retry connection attempts ? [yes,no] (yes) yes
could not open user interface connection: management daemon not responding
Retry connection attempts ? [yes,no] (yes) no
root@M20% ps aux | grep mgd
root@M20% /usr/sbin/mgd -N
mgd: error: could not open database: /var/run/db/schema.db: No such file or directory
mgd: error: Database open failed for file '/var/run/db/schema.db': No such file or directory
mgd: error: could not open database schema: /var/run/db/schema.db
mgd: error: could not open database schema
mgd: error: database schema is out of date, rebuilding it
mgd: error: could not open database: /var/run/db/juniper.data: No such file or directory
mgd: error: Database open failed for file '/var/run/db/juniper.data': No such file or directory
mgd: error: Cannot read configuration: Could not open configuration database
mgd: error: daemon MGD detects existing daemon using lock file '/var/run/mgd.pid'
root@M20% mount /dev/ad1s1f /var
root@M20% /usr/sbin/mgd
root@M20% cli
root@M20>

The second example is with an M10i (RE-850):

root@M10i> show chassis hardware detail | match ad
ad0 999 MB SILICONSYSTEMS INC 1GB C9183198528209048W01 Compact Flash
ad1 38154 MB FUJITSU MHV2040AS NT19T842CY34 Hard Disk

root@M10i> start shell sh
# uname -a
JUNOS M10i 10.4R12.4 JUNOS 10.4R12.4 #0: 2013-01-09 10:01:08 UTC [email protected]:/volume/build/junos/10.4/release/10.4R12.4/obj-i386/bsd/sys/compile/JUNIPER i386
# mount
/dev/ad0s1a on / (ufs, local, noatime)
devfs on /dev (devfs, local, multilabel)
/dev/md0 on /packages/mnt/jbase (cd9660, local, noatime, read-only, verified)
/dev/md1 on /packages/mnt/jkernel-10.4R12.4 (cd9660, local, noatime, read-only, verified)
/dev/md2 on /packages/mnt/jpfe-M7i-10.4R12.4 (cd9660, local, noatime, read-only)
/dev/md3 on /packages/mnt/jdocs-10.4R12.4 (cd9660, local, noatime, read-only, verified)
/dev/md4 on /packages/mnt/jroute-10.4R12.4 (cd9660, local, noatime, read-only, verified)
/dev/md5 on /packages/mnt/jcrypto-10.4R12.4 (cd9660, local, noatime, read-only, verified)
/dev/md6 on /packages/mnt/jpfe-common-10.4R12.4 (cd9660, local, noatime, read-only)
/dev/md7 on /packages/mnt/jruntime-10.4R12.4 (cd9660, local, noatime, read-only, verified)
/dev/md8 on /tmp (ufs, asynchronous, local, noatime)
/dev/md9 on /mfs (ufs, asynchronous, local, noatime)
/dev/ad0s1e on /config (ufs, local, noatime)
procfs on /proc (procfs, local, noatime)
/dev/ad1s1f on /var (ufs, local, noatime)
# umount -f /var
# mount
/dev/ad0s1a on / (ufs, local, noatime)
devfs on /dev (devfs, local, multilabel)
/dev/md0 on /packages/mnt/jbase (cd9660, local, noatime, read-only, verified)
/dev/md1 on /packages/mnt/jkernel-10.4R12.4 (cd9660, local, noatime, read-only, verified)
/dev/md2 on /packages/mnt/jpfe-M7i-10.4R12.4 (cd9660, local, noatime, read-only)
/dev/md3 on /packages/mnt/jdocs-10.4R12.4 (cd9660, local, noatime, read-only, verified)
/dev/md4 on /packages/mnt/jroute-10.4R12.4 (cd9660, local, noatime, read-only, verified)
/dev/md5 on /packages/mnt/jcrypto-10.4R12.4 (cd9660, local, noatime, read-only, verified)
/dev/md6 on /packages/mnt/jpfe-common-10.4R12.4 (cd9660, local, noatime, read-only)
/dev/md7 on /packages/mnt/jruntime-10.4R12.4 (cd9660, local, noatime, read-only, verified)
/dev/md8 on /tmp (ufs, asynchronous, local, noatime)
/dev/md9 on /mfs (ufs, asynchronous, local, noatime)
/dev/ad0s1e on /config (ufs, local, noatime)
procfs on /proc (procfs, local, noatime)
# exit

root@M10i> sho
^
unknown command.
root@M10i> show
^
unknown command.
root@M10i> ?
No valid completions
root@M10i> start
^
unknown command.
root@M10i> exit
^
unknown command.
root@M10i> error: unknown command: .noop-command
root@M10i> error: unknown command: .noop-command
root@M10i> Jun 25 13:24:38 init: can't chdir to /var/tmp/: No such file or directory
Jun 25 13:24:43 init: can't chdir to /var/tmp/: No such file or directory

In the case of the M10i (RE-850), I waited a few hours after unmounting /var for some watchdog timer to kick in, but nothing happened. Finally I just remounted the HDD and restarted the mgd process, and the RE worked as it should.

According to KB19024, "Hard drive access suddenly lost" is at least one of the reasons that cause the watchdog timer to reload the routing engine. Is the watchdog timer triggered only when the HDD is physically removed, i.e. when the HDD itself fails? What exactly does this watchdog timer check?

regards,
Martin
_______________________________________________
juniper-nsp mailing list [email protected]
https://puck.nether.net/mailman/listinfo/juniper-nsp
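
For anyone repeating this test, the recovery steps from the M20 example (remount /var, then start mgd) could be scripted roughly as follows. This is only a sketch under assumptions: the device /dev/ad1s1f and the path /usr/sbin/mgd are taken from the outputs in this post and may differ on other routing engines, and the function names var_is_mounted and recover_var are made up for illustration. It does not touch or test any watchdog timer.

```sh
#!/bin/sh
# Sketch only: check whether /var is mounted and, if it is not, remount it
# and start mgd, mirroring the manual steps from the M20 example.

VAR_DEV="${VAR_DEV:-/dev/ad1s1f}"    # assumed device, from the mount output
MGD_BIN="${MGD_BIN:-/usr/sbin/mgd}"  # assumed path, from the M20 session

# Hypothetical helper: reads `mount` output on stdin and succeeds if some
# file system is mounted on /var (not on /var/tmp or similar).
var_is_mounted() {
    grep -q ' on /var ('
}

# Hypothetical helper: remount /var and start mgd only when /var is missing.
recover_var() {
    if mount | var_is_mounted; then
        echo "/var is mounted, nothing to do"
        return 0
    fi
    echo "/var is not mounted, remounting $VAR_DEV"
    mount "$VAR_DEV" /var || return 1
    "$MGD_BIN"
}

# Act only when explicitly asked, so the file can be sourced safely.
if [ "${1:-}" = "--recover" ]; then
    recover_var
fi
```

Run with --recover from the root shell to perform the remount; without arguments the script only defines the functions.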

