guy keren wrote:

why not strace sshd and find what it's doing?
That turned out to be a brilliant idea. I was more aggressive in my approach than you suggested, since I need to know what happens before I can log in, so I injected an strace into the service startup script (with a nomination to the world's ugliest hack). Then I started the service and opened a session while the system behaved normally, and rebooted to try it out during those few minutes of unresponsiveness.

I went for the -ff option, which means that I got a file for each fork (and there are oh so many). With a little Perl script to pick up the most time consuming commands, I reached the critical snippet, which is actually the beginning of one of the processes.

What we can see, is that trying to open /dev/tty (and fail) took 30 + epsilon seconds, and then opening /dev/pts/0 (successfully) took another 30 + epsilon seconds. Makes me wonder what timeout is set to 30 seconds in the kernel. Needless to say, in the proper strace, these operations took a fraction of a second, like one would expect. Interestingly, it's exactly the same chain of events, only different timing (give or take which pts was opened).

Looks like I exchanged one kernel bug for another. Maybe I should report this to LKML...? And still, if this really is a kernel bug, how could it escape the simplest test before the kernel was released?

  Eli

-----
Trace for freshly booted system follows:

21:35:21.131094 close(12)               = 0 <0.000046>
21:35:21.131177 close(13)               = 0 <0.000052>
21:35:21.131259 close(4)                = 0 <0.000059>
21:35:21.131382 close(8)                = 0 <0.000027>
21:35:21.131436 open("/dev/tty", O_RDWR|O_NOCTTY) = -1 ENXIO (No such device or address) <*30.022532*>
21:35:51.154269 setsid()                = 2333 <0.000155>
21:35:51.154639 open("/dev/tty", O_RDWR|O_NOCTTY) = -1 ENXIO (No such device or address) <0.020142>
21:35:51.174931 ioctl(9, TIOCSCTTY)     = 0 <0.000030>
21:35:51.175049 rt_sigaction(SIGHUP, NULL, {SIG_DFL, [], 0}, 8) = 0 <0.000028> 21:35:51.175176 rt_sigaction(SIGHUP, {SIG_IGN, [], SA_RESTORER, 0x7fad91c31740}, NULL, 8) = 0 <0.000028> 21:35:51.175309 vhangup() = -1 EPERM (Operation not permitted) <0.000026> 21:35:51.175407 rt_sigaction(SIGHUP, NULL, {SIG_IGN, [], SA_RESTORER, 0x7fad91c31740}, 8) = 0 <0.000027> 21:35:51.175524 rt_sigaction(SIGHUP, {SIG_DFL, [], SA_RESTORER, 0x7fad91c31740}, NULL, 8) = 0 <0.000028>
21:35:51.175642 open("/dev/pts/0", O_RDWR) = 4 <*30.063213*>
21:36:21.238952 close(9)                = 0 <0.000033>
21:36:21.239057 open("/dev/tty", O_WRONLY) = 8 <0.021415>
21:36:21.260555 close(8)                = 0 <0.000034>
21:36:21.260660 dup2(4, 0)              = 0 <0.000021>
21:36:21.260786 dup2(4, 1)              = 1 <0.000027>
21:36:21.260882 dup2(4, 2)              = 2 <0.000026>
21:36:21.260973 close(4)                = 0 <0.000023>

(and so on...)

And here's the parallel trace for a warm and cosy system:

21:28:19.163952 close(12)               = 0 <0.000052>
21:28:19.164125 close(13)               = 0 <0.000051>
21:28:19.164295 close(4)                = 0 <0.000043>
21:28:19.164435 close(8)                = 0 <0.000055>
21:28:19.164577 open("/dev/tty", O_RDWR|O_NOCTTY) = -1 ENXIO (No such device or address) <*0.000068*>
21:28:19.164891 setsid()                = 19458 <0.000022>
21:28:19.164990 open("/dev/tty", O_RDWR|O_NOCTTY) = -1 ENXIO (No such device or address) <0.000066>
21:28:19.165132 ioctl(9, TIOCSCTTY)     = 0 <0.000024>
21:28:19.165230 rt_sigaction(SIGHUP, NULL, {SIG_DFL, [], 0}, 8) = 0 <0.000043> 21:28:19.165379 rt_sigaction(SIGHUP, {SIG_IGN, [], SA_RESTORER, 0x7f78bf8fe740}, NULL, 8) = 0 <0.000034> 21:28:19.165502 vhangup() = -1 EPERM (Operation not permitted) <0.000033> 21:28:19.165604 rt_sigaction(SIGHUP, NULL, {SIG_IGN, [], SA_RESTORER, 0x7f78bf8fe740}, 8) = 0 <0.000048> 21:28:19.165828 rt_sigaction(SIGHUP, {SIG_DFL, [], SA_RESTORER, 0x7f78bf8fe740}, NULL, 8) = 0 <0.000038>
21:28:19.165937 open("/dev/pts/2", O_RDWR) = 4 <*0.000027*>
21:28:19.166011 close(9)                = 0 <0.000017>
21:28:19.166065 open("/dev/tty", O_WRONLY) = 8 <0.000021>
21:28:19.166134 close(8)                = 0 <0.000014>
21:28:19.166191 dup2(4, 0)              = 0 <0.000014>
21:28:19.166246 dup2(4, 1)              = 1 <0.000014>
21:28:19.166301 dup2(4, 2)              = 2 <0.000014>
21:28:19.166355 close(4)                = 0 <0.000014>

--
Web: http://www.billauer.co.il

_______________________________________________
Haifux mailing list
[email protected]
http://hamakor.org.il/cgi-bin/mailman/listinfo/haifux

Reply via email to