> Samuel,
>
> I was trying out some quick things to root-cause the problem
> and I noticed that mtu_reboot test sometimes gets stuck. I see that
> the test does this
>
> tester ----> Test machine
>
> tester remotely (via ssh/rsh) executes dladm call on test_machine
> and then reboots test_machine. After that, the tester does a
> wait_for_reboot.
>
> When I tried this today, the wait_for_reboot does not return,
> even though the test_machine is up (and I can ping it, and
> do "rsh -n test_machine echo good" on it).
>
> Is there some bug in wait_for_reboot? Is it testing for the correct
> return status from ping/rsh?
>
When the test is running:
102911 stf_jnl_context mtu_reboot
102912 /bin/ksh -p /var/tmp/test/proto/suites/gldv3/linkprop/mtu_reboot
102934 /bin/ksh -p /var/tmp/test/proto/suites/gldv3/linkprop/mtu_reboot
102938 /usr/bin/ssh -n -l root whitestar2-5 reboot
When the test is hung:
102910 stf_timeout -n linkprop/mtu_reboot 600 stf_jnl_context mtu_reboot
102911 stf_jnl_context mtu_reboot
102912 <defunct>
bash-3.00# pstack 102911
102911: stf_jnl_context mtu_reboot
c5320e15 read (4, 8046b9c, 400)
080518d5 main (2, 8046ff8, 8047004) + 219
08051546 ???????? (2, 80471a4, 80471b4, 0, 80471bf, 80471e0)
The problem is with STF itself: stf_jnl_context is blocked in read(),
waiting for input, even though the child (102912, shown as <defunct>)
has already exited.
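
I have not confirmed why the read() never returns. One common way to get
this symptom is a descendant that still holds the write end of a pipe open
after the direct child has exited, so the reader sees neither data nor EOF.
The sketch below only illustrates that general mechanism; it is not STF
code. It assumes fd 4 is a pipe, and read_after_child_exit, hold_seconds,
and probe_seconds are made-up names for the illustration.

```python
import os
import select
import time


def read_after_child_exit(hold_seconds=1.0, probe_seconds=0.2):
    """Show a reader left waiting on a pipe after its direct child has exited.

    Returns (ready, data): whether the pipe was readable shortly after the
    child was reaped, and what read() finally returned.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: hypothetical stand-in for the test script.
        os.close(read_fd)
        if os.fork() == 0:
            # Grandchild: inherits write_fd and keeps it open while it runs.
            time.sleep(hold_seconds)
            os._exit(0)
        os._exit(0)  # the child exits at once and writes nothing

    # Parent: hypothetical stand-in for the process reading the pipe.
    os.close(write_fd)
    os.waitpid(pid, 0)  # the direct child has exited and is reaped here

    # Probe with a timeout instead of blocking: no data and no EOF yet,
    # because the grandchild still holds the write end open.
    readable, _, _ = select.select([read_fd], [], [], probe_seconds)

    # This read() blocks until the last writer closes; then it returns b"".
    data = os.read(read_fd, 1024)
    os.close(read_fd)
    return bool(readable), data


if __name__ == "__main__":
    ready, data = read_after_child_exit()
    print("readable right after child exit:", ready)
    print("read() eventually returned:", data)
```

If the writer never goes away, the final read() never returns, which would
match the stack above. Running pfiles on 102911 should show what fd 4
actually is.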
Samuel