Hello, happy people,

Lately, I've been trying to wrap my head around why Traffic Server 3.2/3.3 does not run well on FreeBSD. The exact issue is also described in TS-993:

1) When starting TS, it runs up a hefty CPU bill (100% CPU used at all times), even when idling.
2) It crashes and burns when compiled with --enable-debug, complaining:

FATAL: ../../lib/ts/ink_thread.h:267: failed assert `pthread_cond_wait(cp, mp) == 0`

After giving up on doing a git bisect (my computer is simply too slow for all those recompiles), I tried running it through callgrind to analyze the function calls being made, and discovered that LogObjectManager::flush_buffers() was being called about 11 million times during the first few minutes, which is not good.

So I opened up Log.cc and discovered, to my surprise, that apart from flushing buffers in a loop there, we are calling ink_cond_wait without any apparent locking of the flush_mutex we are supposed to release while waiting for the condition. On FreeBSD at least, this results in an EPERM error (the caller does not own the mutex being released), which in turn means that no waiting takes place; the loop is just one big CPU sink.

Adding "ink_mutex_try_acquire(&flush_mutex);" before the ink_cond_wait seems to have fixed this problem: TS starts fine, doesn't use 100% CPU while idling, and doesn't complain when running in debug mode, i.e. an apparent win-win situation for my FreeBSD machines.

However - and because Igor told me to - since this doesn't seem to be an issue on Linux, I was wondering: is the mutex in question locked somewhere else that I am unaware of, or did we simply forget to lock it and are lucky that Linux somehow takes care of this blunder for us?

In any case, I don't think adding ink_mutex_try_acquire could hurt anything, and since it does seem to fix the FreeBSD/OpenBSD issue at hand, I am mostly interested in any comments you lot would have about it before I go and commit the fix (if it is a fix, that's what I'm asking ;).

With regards,
Daniel.
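
P.S. For anyone who wants to see the behavior in isolation, here is a minimal sketch in plain C with POSIX threads. It is not the Traffic Server code: timed_wait(), deadline_in_ms() and the never-set "ready" flag are hypothetical names made up for the illustration, and I use raw pthread calls on the assumption that the ink_* wrappers map onto them (the assert message above suggests ink_cond_wait wraps pthread_cond_wait). The sketch also makes two choices of its own: it uses an error-checking mutex so that waiting without owning the mutex is reported as EPERM instead of being left undefined, and it uses a timed wait so that it cannot hang.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper: absolute CLOCK_REALTIME deadline `ms` milliseconds from now. */
static struct timespec deadline_in_ms(long ms)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec += 1;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}

/*
 * Hypothetical illustration: wait up to 50 ms on a condition that nobody signals.
 *   lock_first != 0: lock the mutex before waiting (what pthread_cond_wait expects).
 *   lock_first == 0: wait without owning the mutex (the pattern described above).
 * Returns the last result of pthread_cond_timedwait().
 */
int timed_wait(int lock_first)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    int ready = 0; /* predicate guarded by the mutex; never set in this sketch */
    int rc;

    /* Assumption: an error-checking mutex, so ownership violations are reported. */
    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&mutex, &attr);
    assert(rc == 0);
    rc = pthread_cond_init(&cond, NULL);
    assert(rc == 0);

    if (lock_first)
        pthread_mutex_lock(&mutex);

    struct timespec deadline = deadline_in_ms(50);
    rc = 0;
    /* Loop on the predicate so a spurious wakeup does not end the wait early. */
    while (!ready && rc == 0)
        rc = pthread_cond_timedwait(&cond, &mutex, &deadline);

    /* After a timeout the mutex has been re-acquired, so release it again. */
    if (lock_first)
        pthread_mutex_unlock(&mutex);

    pthread_cond_destroy(&cond);
    pthread_mutex_destroy(&mutex);
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

Calling timed_wait(0) should come back immediately with EPERM, which inside a loop gives exactly the busy spin I am seeing, while timed_wait(1) should block for the full 50 ms and return ETIMEDOUT. As far as I can tell, POSIX leaves waiting on an unowned default mutex undefined, which may be why Linux appears to tolerate it, but I have not verified that against the Traffic Server build.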