Alex Sharp <[email protected]> wrote:
> Ok, thanks a lot for the for the patch attempt and subsequent in-depth
> explanation. I know this thread is long-lived and a bit confusing,
> mostly because I've seen two different sets of scenarios that yield
> the same result, which is that unicorn does not restart.
No problem, I'm here to help and mutt makes it easy to follow long threads :>
<snip>
> I thought that setting the Unicorn::HttpServer::START_CTX[0] in my
> unicorn config had solved the first scenario (pegged cpu), but I found
> a new occurrence of it today, and I now have some new strace output
> for this scenario (strace -v):
>
> select(4, [3], NULL, NULL, {60, 727229}) = 0 (Timeout)
> wait4(-1, 0x7ffffd70f72c, WNOHANG, NULL) = 0
> clock_gettime(CLOCK_REALTIME, {1313708081, 720757662}) = 0
> fstat(8, {st_dev=makedev(202, 1), st_ino=39806, st_mode=S_IFREG,
> st_nlink=0, st_uid=1001, st_gid=1001, st_blksize=4096, st_blocks=0,
> st_size=0, st_atime=2011/08/18-14:32:10, st_mtime=2011/08/18-14:32:10,
> st_ctime=2011/08/18-22:54:06}) = 0
> clock_gettime(CLOCK_REALTIME, {1313708081, 721131305}) = 0
> fstat(10, {st_dev=makedev(202, 1), st_ino=45370, st_mode=S_IFREG,
> st_nlink=0, st_uid=1001, st_gid=1001, st_blksize=4096, st_blocks=0,
> st_size=0, st_atime=2011/08/18-14:32:10, st_mtime=2011/08/18-14:32:10,
> st_ctime=2011/08/18-22:54:05}) = 0
> clock_gettime(CLOCK_REALTIME, {1313708081, 721290800}) = 0
> select(4, [3], NULL, NULL, {565, 34005}) = 0 (Timeout)
<snip> OK, this looks good from the system call level in the main
thread of the master process:
I can see that from select() timing out despite being called with
long intervals. I don't see workers dying, either...
Can you add "-f -e '!futex'" to the strace invocation so we see
all the threads? You can also add '-T' to get timing information
to see how long each syscall takes or '-tt' to get timestamps
in strace if you think it's useful.
Since the main thread doesn't appear to be doing much, it's
probably a child thread somewhere...
Also, I assume you're using preload_app? If you are, do you see this
issue with preload_app=false? I suspect there's some background thread
that could be running in the master process taking up all the CPU time.
Unicorn itself never spawns background threads.
Basically anything you can tell use about the app, the configuration,
and which gems/libraries would be useful.
> With respect to the second scenario (ignoring signals), I'm going to
> recommend we table that issue for now, as we're currently running on a
> version of ubuntu (11.10) which has a known signal trapping bug with
> ruby 1.9.2-p180, and downgrading to 10.04 may solve that problem.
> Granted, when I've observed this in the past with other libraries, the
> process becomes completely unresponsive, whereas the behavior in
> unicorn is more intermittent. Either way, we are in the process of
> downgrading our servers to ubuntu 10.04, so let's not waste anymore
> time trying to debug something that may well be occurring due to a
> kernel bug. If we continue to see this after the downgrade to 10.04,
> I'll report back and we can keep digging.
OK. I wouldn't rule out the CPU usage as unrelated to the signaling
issue, though. The Ruby 1.9 timer thread could be going crazy
because it can't signal or receive signals properly...
--
Eric Wong
_______________________________________________
Unicorn mailing list - [email protected]
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying