Weird server mystery: self-reset, mostly

will trillich Tue, 25 Jan 2011 17:42:14 -0800

Never seen this before -- all daemons and all user processes killed. Zap. It
happened around 23:17 Sunday, Chicago time (that's when /var/log/* abruptly
stopped). Any idea what might cause this?



I was ssh'd in to my Debian server and... disconnected. No problem, I was
using *screen* to *vim* some *Catalyst* modules, so I'll just reconnect and
reattach... connection refused.

Wha?

Tried telnet to port 22, no sign of life. Tried telnet to port 80, no sign
of life.
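
For anyone retracing the same check without telnet, here is a minimal sketch of a port probe. It assumes bash (for its /dev/tcp pseudo-device) and coreutils timeout are available on the machine doing the probing; the function name port_open is made up for illustration.

```shell
# port_open HOST PORT: succeed if a TCP connection opens within 3 seconds.
# Assumption: bash and coreutils "timeout" are installed on the probing host.
port_open() {
    # Host and port are passed as positional arguments so nothing is
    # interpolated into the command string itself.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example use (host name taken from the prompt shown in this post):
#   for port in 22 80; do
#       port_open darth "$port" && echo "$port open" || echo "$port dead"
#   done
```

A refused connection and a timeout both count as "dead" here, which is all the telnet test above was telling me anyway.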

Went to the server room, logged in on the console:

will@darth:~$ uptime
 23:58:11 *up 583 days*,  3:03,  6 users,  load average: 0.00, 0.02, 0.08

So the server hadn't had a hard reset; it was still up 583 days. In
/var/log/syslog there are the usual cron logs up to about 23:17 and then...
nothing.

will@darth:~$ ps afx
  PID TTY      STAT   TIME COMMAND
    2 ?        S<     0:00 [kthreadd]
    3 ?        S<     1:13  \_ [migration/0]
    4 ?        S<    29:21  \_ [ksoftirqd/0]
    5 ?        S<     0:32  \_ [watchdog/0]
    6 ?        S<     1:12  \_ [migration/1]
    7 ?        S<    77:19  \_ [ksoftirqd/1]
    8 ?        S<     0:02  \_ [watchdog/1]
    9 ?        S<    44:52  \_ [events/0]
   10 ?        S<    78:24  \_ [events/1]
   11 ?        S<     0:00  \_ [khelper]
   44 ?        S<    13:20  \_ [kblockd/0]
   45 ?        S<     0:40  \_ [kblockd/1]
   47 ?        S<     0:00  \_ [kacpid]
   48 ?        S<     0:00  \_ [kacpi_notify]
  121 ?        S<     0:00  \_ [kseriod]
  161 ?        S<    19:53  \_ [kswapd0]
  162 ?        S<     0:00  \_ [aio/0]
  163 ?        S<     0:00  \_ [aio/1]
  642 ?        S<     0:00  \_ [ksuspend_usbd]
  647 ?        S<     0:00  \_ [khubd]
  761 ?        S<     0:00  \_ [ata/0]
  764 ?        S<     0:00  \_ [ata/1]
  765 ?        S<     0:00  \_ [ata_aux]
  774 ?        S<     0:00  \_ [scsi_eh_0]
  775 ?        S<     0:00  \_ [scsi_eh_1]
  877 ?        S<    42:46  \_ [kjournald]
 1301 ?        S<    17:22  \_ [edac-poller]
 1384 ?        S<     0:00  \_ [kpsmoused]
 1640 ?        S<     0:00  \_ [kstriped]
 1654 ?        S<     0:00  \_ [ksnapd]
 1681 ?        S<    76:13  \_ [kjournald]
 1682 ?        S<   126:18  \_ [kjournald]
12642 ?        S      0:09  \_ [pdflush]
19987 ?        S      0:00  \_ [pdflush]
    1 ?        Ss    10:04 init [2]
11064 tty2     Ss+    0:00 /sbin/getty 38400 tty2
11065 tty3     Ss+    0:00 /sbin/getty 38400 tty3
11066 tty4     Ss+    0:00 /sbin/getty 38400 tty4
11067 tty5     Ss+    0:00 /sbin/getty 38400 tty5
11068 tty6     Ss+    0:00 /sbin/getty 38400 tty6
12995 tty1     Ss     0:00 /bin/login --
13077 tty1     S      0:00  \_ -bash
13107 tty1     R+     0:00      \_ ps afx

Freaky: init, that's process #1, isn't at the top? And all daemons except
for getty were gone. All user processes (including my screen sessions! and
vim sessions!) were gone.
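
To put a number on what survived, a listing like the one above can be split into kernel threads (command shown in square brackets) and userland processes. This is only a sketch: it assumes the five-column "ps afx" layout shown here, and summarize_ps is a name invented for the example.

```shell
# summarize_ps: read "ps afx" output on stdin, count kernel threads vs userland.
# Assumption: columns are PID TTY STAT TIME COMMAND, as in the listing above.
summarize_ps() {
    awk '
    NR == 1 { next }                          # skip the header line
    {
        i = 5                                 # COMMAND starts at field 5
        while ($i == "|" || $i == "\\_") i++  # step over forest decorations
        # Test the start of the command, not the last field, so that
        # "init [2]" is not mistaken for a kernel thread.
        if ($i ~ /^\[/) kernel++; else user++
    }
    END { printf "kernel threads: %d, userland: %d\n", kernel, user }'
}

# Example use:  ps afx | summarize_ps
```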

Checking 'last' didn't show any suspicious activity.

In kern.log there's only:
Jan 23 23:04:59 darth kernel: [64084756.601774] exploit[25161]: segfault at 10c00b ip 00000000 sp deadc01d error 6
Jan 23 23:05:08 darth kernel: [64084765.528734] NET: Registered protocol family 5

After a quick
$ sudo bash
# cd /etc/rc2.d
# for x in S*; do sh "$x" start; done

the server was back up and serving... and then the saddest sight of all, of
course:

will@darth:~$ screen -ls
There is a screen on:
        26279.pts-3.darth       (06/19/09 21:54:31)     (Dead ???)
Remove dead screens with 'screen -wipe'.
1 Socket in /var/run/screen/S-will.

:(
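
The restart loop I used earlier (every S* script in /etc/rc2.d, called with "start") could be written a bit more defensively. A minimal sketch, assuming each script is executable and carries its own shebang line; start_runlevel is a hypothetical helper name, not anything Debian ships.

```shell
# start_runlevel DIR: run every S* script in DIR with "start", in glob order
# (the shell expands S* in sorted order, so S10... runs before S20...).
# Assumption: the scripts are executable and have their own shebang line.
start_runlevel() {
    dir=$1
    for script in "$dir"/S*; do
        [ -x "$script" ] || continue    # skip non-executables and an unmatched glob
        "$script" start || echo "failed: $script" >&2
    done
}

# Example use (as root):  start_runlevel /etc/rc2.d
```

Running each script directly rather than through sh lets any script that declares a different interpreter use it, and a failure in one script is reported without stopping the rest.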

$ tail /var/log/messages
Jan 23 22:56:26 darth -- MARK --
Jan 23 23:04:59 darth kernel: [64084756.601774] exploit[25161]: segfault at 10c00b ip 00000000 sp deadc01d error 6
Jan 23 23:05:08 darth kernel: [64084765.528734] NET: Registered protocol family 5
Jan 23 23:16:26 darth -- MARK --
Jan 23 23:47:02 darth syslogd 1.5.0#5: restart.

So everything crapped out after 23:16, and I restarted it at 23:47.
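
Spotting that kind of silence by eye works for ten lines; for a whole log, something like the sketch below would flag it. It assumes the traditional syslog timestamp ("Jan 23 23:16:26") at the start of each line and ignores month rollover, since those timestamps carry no year; find_gaps is a made-up name.

```shell
# find_gaps LIMIT: read syslog-style lines on stdin and report any pause
# between consecutive lines that is longer than LIMIT seconds.
# Assumption: fields are "Mon DD HH:MM:SS ..."; month changes are not handled.
find_gaps() {
    awk -v limit="$1" '
    {
        split($3, t, ":")
        now = $2 * 86400 + t[1] * 3600 + t[2] * 60 + t[3]
        if (NR > 1 && now - prev > limit)
            printf "gap of %d s before: %s\n", now - prev, $0
        prev = now
    }'
}

# Example use:  find_gaps 1200 < /var/log/messages
```

A limit of 1200 seconds matches the 20-minute spacing of the MARK lines above, so anything it prints is a stretch where even syslogd had nothing to say.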

*Anybody got a clue as to what might have happened to kill all daemons and
user processes in one fell swoop? This has been a rock-solid Debian server for
years...*

will@darth:~$ cat /etc/debian_version
5.0.4

-- 
The first step towards getting somewhere is to decide that you are not going
to stay where you are.  -- J.P.Morgan
