Winston Williams wrote:

This is a continuation of my 'sshd suddenly not responding' message from
Tuesday.

I still haven't resolved the problems on this machine.  I had to have
someone at the data center reboot the machine so that I could get back
in over ssh.  After they rebooted the machine, I was able to work for
about 20 minutes before the ssh session (and sshd) died again.  I
put /sbin/reboot in the crontab and tested it, and the machine rebooted.
I left that in the crontab to run hourly, and I also put in another
entry to kill and restart sshd every 30 minutes.  I also let that run
and it worked.  I stopped qmail and disabled pf, but I left Apache
running.

After about 20 minutes, my ssh session died unexpectedly, and when I
went to reconnect, the socket opened on that port but then just sat
there forever.  It never shows the OpenSSH banner and nothing further
happens.  Apache is still running and working fine.  Here is where it
gets really strange: the cron job for reboot does not run now, and
neither does the cron job to restart sshd.  I know it is not rebooting
because I am running hping against it and there is never an
interruption.  I now suspect that the machine is unable to fork new
processes.
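
One way to check the fork theory might be a small probe that is started
during the working window after a reboot and left running, since a new
login (or a new cron job) would itself need a fork.  The following is
only a minimal sketch, assuming Python is installed on the machine;
try_fork and probe are illustrative names, not existing tools.

```python
import errno
import os
import sys
import time


def try_fork():
    """Attempt a single fork() and return (ok, message)."""
    try:
        pid = os.fork()
    except OSError as exc:
        # EAGAIN usually points at a process limit, ENOMEM at memory.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return False, "fork failed: %s (%s)" % (name, exc.strerror)
    if pid == 0:
        # Child: exit immediately, skipping cleanup handlers.
        os._exit(0)
    # Parent: reap the child so no zombie is left behind.
    _, status = os.waitpid(pid, 0)
    return True, "fork ok, child exit status %d" % os.WEXITSTATUS(status)


def probe(count, interval, out=sys.stdout):
    """Run try_fork() count times, interval seconds apart, logging each result."""
    for _ in range(count):
        ok, message = try_fork()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        out.write("%s %s\n" % (stamp, message))
        out.flush()
        time.sleep(interval)
```

Calling something like probe(120, 60, open("/tmp/forkprobe.log", "a"))
(the path is just an example) right after a reboot would leave a
timestamped record.  If the entries switch from "fork ok" to "fork
failed" around the time sshd stops answering, the errno name should
hint at whether a process limit or memory is the cause.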

Here are the results of some tests that I have run:

1-When I connect via SSH, the socket connects but then just sits before
any data is sent.  I suspect that the main process listens and accepts
the connection, but then tries to fork a new process and fails.

2-named is still running and seems to be working fine.

3-Nothing in cron seems to run at this point.  I tested the entries in
cron by letting them run while the system was operating normally, such
as during the 20-minute or so window after a fresh reboot, and they did
work.  After that, the reboot never happens, and I don't think it is
killing and restarting sshd either.

4-Apache can still do its thing.  I am assuming this is because it
automatically starts a number of processes right away.  It has enough
processes already running so that it does not need to fork when a new
connection comes in.

5-One other interesting thing to note is that /var/log/authlog was
around 21,000 lines when I checked it.  The OS install is only about 5
days old.  I moved ssh to a non-standard port to try to help reduce the
random break-in attempts.
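
To see how much of that authlog is break-in noise, the failed logins
could be tallied per source address.  This is a rough sketch only: it
assumes the usual OpenSSH "Failed password for ... from ADDRESS" line
shape (with either "invalid user" or the older "illegal user" wording),
and count_failures is an illustrative name.

```python
import re
from collections import Counter

# Assumed line shape for a rejected password, e.g.:
#   ... sshd[123]: Failed password for invalid user bob from 192.0.2.7 port 4242 ssh2
FAILED = re.compile(
    r"Failed password for (?:(?:invalid|illegal) user )?(\S+) from (\S+)"
)


def count_failures(lines):
    """Return a Counter mapping source address to failed-password count."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts
```

Feeding it the open log file, for example
count_failures(open("/var/log/authlog")).most_common(10), would list
the ten busiest sources.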

I would really like to use OpenBSD on this machine.  If I can't figure
it out in the next day or two, I will have to switch to another
operating system.

Do any of you have any ideas for what I could try to either test out
this fork failure theory, or other suggestions for what might be causing
my problem?

I have had one issue which may be similar to this.

It was OpenBSD 3.6 on an HP Netserver.

The server was suspect, so very little fault finding was done before it
was replaced.  Many other boxes running 3.6 and 3.7 have had zero
problems.

On the Netserver I used PF to block hosts running Linux from reaching
the ssh port, since I exclusively use OpenBSD, and the problem did not
occur again.  As mentioned, though, the machine was replaced fairly
shortly afterwards.

Steve
