On Wed, Feb 10, 2010 at 9:50 PM, Gregory Szorc wrote:
>> what does "pstack PID" display when it is hung?
>
> 3305:   ./testall testpoll
>  fedc9a45 portfs   (6, 37, 8110b98, 32, 32, 0)
>  fef898a6 apr_pollcb_poll (8110b88, ffffffff, ffffffff, 806626d, 8047b04, 8047b3c) + 82
>  08066408 trigger_pollcb (8047b3c, 0, 80772eb, 0, 80f55d8, 80f55d8) + 117
>  080557c1 abts_run_test (80f5b20, 80662f1, 0, 0, 807384d, 8088eac) + 56
>  08066904 testpoll (80f5b20, 4, fefcab34, 8047ba4, 16, 807512a) + 1fa
>  08056171 main     (8055170, 2, 8047bbc) + 20f
>  08055170 _start   (2, 8047ca8, 8047cb2, 0, 8047cbb, 8047cf3) + 80
Hmmm... If I comment out the send_msg() call so that there's no data available yet, my backtrace looks like

25072:  ./testall testpoll
 fee04157 portfs   (6, 37, 81155a0, 32, 1, 0)
 fef87218 call_port_getn (37, 81155a0, 32, 8047180, ffffffff, ffffffff) + c8
 fef881a4 apr_pollcb_poll (8115590, ffffffff, ffffffff, 8066c40, 80471c0, 0) + 54
 08066dde trigger_pollcb (80471fc, 0) + fe
 08056331 abts_run_test (80fa528, 8066ce0, 0) + 71
 08067274 testpoll (80fa528, 0) + 184
 08056ee3 main     (2, 8047284, 8047290, 8055c8f) + 213
 08055ced _start   (2, 80473b8, 80473c2, 0, 80473cb, 804743d) + 7d

Interestingly, the fifth parm to portfs() in your backtrace is 32 == nalloc, and the fifth parm to portfs() in mine is 1. Meanwhile, there's a fix in 1.3.12 for a hang in apr_pollcb_poll() on Solaris: it passes 1 instead of nalloc as the number of events to wait for. I can't imagine how you wouldn't have the fix or would be running the wrong libapr, but can you check with pldd which libapr is getting loaded, just in case?

(I'm guessing the absence of call_port_getn() in your backtrace is due to gcc inlining, though apr <= 1.3.8 doesn't have that function.)

> I have access to other Solaris releases. I can always try to compile and
> test on them. I may also start going through the SVN commits and isolating
> the failure to a specific revision. Of course, this could all be related to
> my toolchain - I'm using GNU for everything but the linker. Still, a bug is
> a bug.

Sure; it might be quicker to use LD_LIBRARY_PATH to run the 1.3.12 testall testpoll against apr 1.3.9's libapr before sorting through individual commits.

(so weird)
