Hello, happy people,

Lately, I've been wrapping my head around Traffic Server 3.2/3.3 not
running well on FreeBSD. The exact issue is described in TS-993 as well:
1) When starting TS, it runs up a hefty CPU bill (100% CPU used at all
times), even when idling.
2) It crashes and burns when compiled with --enable-debug, complaining:

   FATAL: ../../lib/ts/ink_thread.h:267: failed assert
   `pthread_cond_wait(cp, mp) == 0`

After giving up on doing a git bisect (my computer is simply too slow
for all those recompiles), I tried running it through callgrind to
analyze the function calls being made, and discovered that
LogObjectManager::flush_buffers() was being called about 11 million
times during the first few minutes, which is not good. So I opened up
Log.cc and discovered, to my surprise, that apart from flushing buffers
in a loop there, we are calling ink_cond_wait without any apparent
locking of the flush_mutex we are supposed to release while waiting for
the condition. On FreeBSD at least, this results in an EPERM error
(caller does not own the mutex being released), which in turn means
that there is no waiting at all; it's just one big CPU sink.

Adding "ink_mutex_try_acquire(&flush_mutex);" before the
ink_cond_wait seems to have fixed this problem: TS starts fine,
doesn't use 100% CPU while idling, and doesn't complain when running in
debug mode, i.e. an apparent win-win situation for my FreeBSD machines.
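
For comparison, the conventional pthreads pattern is to hold the mutex
around the wait and to re-check a predicate in a loop. The sketch below
is a generic illustration, not the code in Log.cc: flush_cond,
work_ready, waiter and run_wait_demo are hypothetical names, and it uses
a blocking lock where the change above uses a try-acquire.

```c
#include <pthread.h>

/* Hypothetical stand-ins; only the name flush_mutex comes from the
 * discussion above. */
static pthread_mutex_t flush_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t flush_cond = PTHREAD_COND_INITIALIZER;
static int work_ready = 0;
static int wait_error = 0;

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&flush_mutex);      /* own the mutex before waiting */
    while (!work_ready) {                  /* guards against spurious wakeups */
        int rc = pthread_cond_wait(&flush_cond, &flush_mutex);
        if (rc != 0) {                     /* e.g. EPERM if we did not own it */
            wait_error = rc;
            break;
        }
    }
    pthread_mutex_unlock(&flush_mutex);
    return NULL;
}

/* Starts the waiter, signals it, and returns 0 if the wait never failed. */
int run_wait_demo(void)
{
    pthread_t tid;

    work_ready = 0;
    wait_error = 0;
    if (pthread_create(&tid, NULL, waiter, NULL) != 0)
        return -1;

    pthread_mutex_lock(&flush_mutex);
    work_ready = 1;                        /* change the predicate under the lock */
    pthread_cond_signal(&flush_cond);
    pthread_mutex_unlock(&flush_mutex);

    pthread_join(tid, NULL);
    return wait_error;
}
```

One thing to keep in mind with a try-acquire variant is that the
acquire can fail when another thread holds the mutex, in which case the
wait would again be entered without ownership.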

However - and because Igor told me to - since this doesn't seem to be an
issue on Linux, I was wondering: is the mutex in question locked
somewhere else that I am unaware of, or did we simply forget to lock it
and are lucky that Linux somehow takes care of this blunder for us?

In any case, I don't think adding ink_mutex_try_acquire could hurt
anything, and since it does seem to fix the FreeBSD/OpenBSD issue at
hand, I am mostly interested in any comments you lot would have about it
before I go and commit the fix (if it is a fix, that's what I'm asking ;).

With regards,
Daniel.
