> Samuel,
>
> I was trying out some quick things to root-cause the problem
> and I noticed that mtu_reboot test sometimes gets stuck. I see that
> the test does this
>
>           tester ----> Test machine
>
> tester remotely (via ssh/rsh) executes dladm call on test_machine
> and then reboots test_machine. After that, the tester does a 
> wait_for_reboot.
>
> When I tried this today, the wait_for_reboot does not return,
> even though the test_machine is up (and I can ping it, and
> do "rsh -n test_machine echo good" on it).
>
> Is there some bug in wait_for_reboot? Is it testing for the correct
> return status from ping/rsh?
>   
When the test is running:
                  102911 stf_jnl_context mtu_reboot
                    102912 /bin/ksh -p /var/tmp/test/proto/suites/gldv3/linkprop/mtu_reboot
                      102934 /bin/ksh -p /var/tmp/test/proto/suites/gldv3/linkprop/mtu_reboot
                        102938 /usr/bin/ssh -n -l root whitestar2-5 reboot
When the test is hung:
                102910 stf_timeout -n linkprop/mtu_reboot 600 stf_jnl_context mtu_reboot
                  102911 stf_jnl_context mtu_reboot
                    102912 <defunct>

bash-3.00# pstack 102911
102911: stf_jnl_context mtu_reboot
 c5320e15 read     (4, 8046b9c, 400)
 080518d5 main     (2, 8046ff8, 8047004) + 219
 08051546 ???????? (2, 80471a4, 80471b4, 0, 80471bf, 80471e0)

The problem is with STF itself, not with wait_for_reboot: stf_jnl_context
is still blocked in read() on fd 4, waiting for input, even though its
child (102912) has already exited and is defunct.
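
I have not confirmed why the read() never returns. One common way to end
up in this state, offered only as a guess and not verified against the
STF sources, is that fd 4 is the read end of a pipe and some other
process still holds the write end open, so the reader never sees EOF
even after its direct child is gone. The sketch below reproduces that
general pattern in Python; the function name, the "sleep" grandchild and
the timings are all made up for illustration and have nothing to do with
STF itself.

```python
import select
import subprocess
import time


def pipe_eof_after_child_exit(grandchild_seconds=1):
    """Show that a pipe read does not see EOF while a grandchild holds the write end.

    Hypothetical demo, not STF code. Returns a tuple:
    (readable_right_after_child_exit, seconds_until_eof, data_read).
    """
    start = time.monotonic()
    # The shell starts a background sleep that inherits stdout (the pipe),
    # then exits immediately, much like a test script that has finished.
    proc = subprocess.Popen(
        ["sh", "-c", "sleep %d & exit 0" % grandchild_seconds],
        stdout=subprocess.PIPE,
    )
    proc.wait()  # the direct child is gone (reaped) at this point

    # Poll briefly: with the write end still open in the sleep process,
    # the pipe is neither readable nor at EOF yet.
    readable, _, _ = select.select([proc.stdout], [], [], 0.2)

    # This read blocks until the last writer (the sleep) exits.
    data = proc.stdout.read()
    seconds_until_eof = time.monotonic() - start
    proc.stdout.close()
    return bool(readable), seconds_until_eof, data


if __name__ == "__main__":
    print(pipe_eof_after_child_exit())
```

If something like this is what happens here, running pfiles on 102911
and looking for another process that shares the pipe behind fd 4 would
be the next thing to check.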

Samuel


