Yesterday I upgraded my last production box (remote) from 4.3 to 4.4 without a hitch, rebooted, and so forth. Last night, at some innocuous time, it stopped accepting incoming mail (postfix). This morning courier-imap still worked well, until I used an existing ssh session like this:

# pwd
/usr/src/usr.sbin/httpd
# cd /var/log/
# /usr/local/sbin/post
postalias postfix postkick postqueue
postcat postfix-disable postlock postsuper
postconf postfix-enable postlog
postdrop postfix-install postmap
# /usr/local/sbin/postfix status
^C^Z

Now it has been stuck like this for an hour or so. It still takes keyboard input, though. Courier-imap also no longer responds. But nmap is still somewhat okay:

$ nmap -sV 172.16.0.4

Starting Nmap 4.68 ( http://nmap.org ) at 2009-01-10 08:58 SGT
Interesting ports on 172.16.0.4:
Not shown: 1707 closed ports
PORT    STATE SERVICE VERSION
13/tcp  open  daytime
22/tcp  open  ssh?
25/tcp  open  smtp?
37/tcp  open  time     (32 bits)
53/tcp  open  domain?
80/tcp  open  http    Apache httpd
110/tcp open  pop3?
993/tcp open  imaps?

daytime works fine and http works very well, but domain, pop3 and smtp time out; or worse, they accept the connection and then get stuck, like here:

$ telnet 172.16.0.4 25
Trying 172.16.0.4...
Connected to 172.16.0.4.
Escape character is '^]'.
helo
mail from:[email protected]
quit
^C^Z

$ telnet 172.16.0.4 110
Trying 172.16.0.4...
Connected to 172.16.0.4.
Escape character is '^]'.
user udippel
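
The two telnet sessions above show the same pattern: the TCP connection is accepted, but the greeting banner never arrives. A minimal sketch of how that check could be scripted from the client, assuming Python 3 is available there. The function name `probe` and the three state labels are made up for illustration, and the check only makes sense for protocols where the server speaks first (ssh, smtp, pop3, imap), not for http.

```python
import socket


def probe(host, port, timeout=5.0):
    """Return (state, banner) for a TCP service that should greet first.

    state is one of (labels are illustrative, not from any tool):
      "no-connect": the TCP connection could not be established
      "no-banner":  connected, but nothing was received within `timeout`
      "banner":     connected and the server sent some greeting bytes
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, unreachable, or the connect itself timed out.
        return "no-connect", b""

    with sock:
        sock.settimeout(timeout)
        try:
            data = sock.recv(256)
        except socket.timeout:
            # Connection accepted, but the daemon never said anything.
            return "no-banner", b""
        except OSError:
            # Connection was reset while waiting for the greeting.
            return "no-connect", b""

    if not data:
        # Peer closed the connection without sending a greeting.
        return "no-banner", b""
    return "banner", data
```

Calling `probe("172.16.0.4", 25)` and `probe("172.16.0.4", 110)` in a loop would separate ports that are closed from ports that accept a connection and then stay silent, which is the situation described here.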

Why am I writing in:

1. I have no access. It is a remote production server. If only I could stop that 'hanging' postfix, I might be able to issue a 'reboot'.

2. Any further attempt to ssh into it also gets stuck like this:
$ ssh -v 172.16.0.4
OpenSSH_5.1, OpenSSL 0.9.7j 04 May 2006
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Connecting to 172.16.0.4 [172.16.0.4] port 22.
debug1: Connection established.
debug1: identity file /home/users/udippel/.ssh/identity type -1
debug1: identity file /home/users/udippel/.ssh/id_rsa type -1
debug1: identity file /home/users/udippel/.ssh/id_dsa type -1
after which I can only leave by killing the session on the client.

3. Even if I went there, with a huge effort and some delay, how could I debug the problem so that it won't occur again?


Thanks for all ideas,

Uwe
