On Mon, 6 Sep 2004, Kurt Mosiejczuk wrote:
>Okay, since Andreas asked me to attach to one of the berserk bincimapd
>processes, they have all been well-behaved for *12 days*. I was
>starting to give up hope, but today one went nuts.
Maybe if you put up a 24h monitoring team to look for processes, Binc will
stay nice forever ;-). Okay, so at least it seems that it's not everyday
activity that causes this situation to arise.
>Here's the trace:
>#0 0xf43904d in _thread_sys_getdirentries ()
>#1 0xf45659a in readdir ()
>#2 0x1c02a939 in Binc::Maildir::updateFlags (this=0x3c01a000)
> at maildir-updateflags.cc:106
>#3 0x1c05c837 in Binc::StoreOperator::process (this=0x3c019160,
> [EMAIL PROTECTED], [EMAIL PROTECTED]) at operator-store.cc:124
So you stopped Binc while it was committing flag updates to the mailbox
as the final step of STORE. If this is really where Binc is spinning, then
it's got to be the readdir() libc call. The loop Binc is in starts at the
beginning of a directory and continues to the end.
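To illustrate the kind of loop I mean, here is a minimal sketch. It is not
the code from maildir-updateflags.cc; count_dir_entries is a made-up helper
that just walks a directory once with opendir()/readdir()/closedir() and
counts the entries.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <dirent.h>
#include <string>

// Hypothetical helper, not taken from Binc: walk a directory once, from
// the first entry to the last, and count what is in it.
// Returns the number of entries (excluding "." and ".."), or -1 on error.
long count_dir_entries(const std::string &path)
{
    assert(!path.empty());

    DIR *dir = opendir(path.c_str());
    if (dir == nullptr)
        return -1;

    long count = 0;
    for (;;) {
        // readdir() returns NULL both at end-of-directory and on error;
        // clearing errno first is the only way to tell the two apart.
        errno = 0;
        struct dirent *entry = readdir(dir);
        if (entry == nullptr) {
            if (errno != 0) {
                closedir(dir);
                return -1;
            }
            break; // clean end of directory
        }

        if (std::strcmp(entry->d_name, ".") == 0 ||
            std::strcmp(entry->d_name, "..") == 0)
            continue;

        ++count;
    }

    closedir(dir);
    return count;
}
```

The only way out of a loop like this is readdir() returning NULL, so if
that never happens, the loop never ends.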
I find it interesting that you're running Binc over NFS; but...
>The garbage in the middle was where gdb wasn't giving me anything...
>the process was hung up in NFS, so I think it was awaiting answers. I
>did send the process a couple kill signals before gdb got control.
>Hopefully this helps... I'd love to resolve this issue and be able to
>have my department in love with my mail server :)
Let's see what we can do. I'm going to have to ask for another trace, this
time with strace. If you find a haywire Binc process, first verify that
it's bincimap-up or bincimapd that is spinning. Then attach to the haywire
process like this:
strace -s 1024 -p <bananas-pid> >/tmp/dumpfile.txt 2>&1
After letting it run for a while (say, 10-20 secs or so), hit Ctrl-C. The
contents of this dump file are very likely going to bring us close to a
solution. See if bzip2 -9 gets dumpfile.txt down to a reasonable size, and
then make it available to me somehow :-), either by email or as a link to
a website.
Then, attach to the same process/pid with gdb and provide a new backtrace.
This way, I can compare this to the one you already posted, and I'll know
for sure where it's spinning and why.
Meanwhile, I'm looking up all docs I can find about NFS and problems with
readdir(). :-/
Andy :-)
--
Andreas Aardal Hanssen | http://www.andreas.hanssen.name/gpg
Author of Binc IMAP | "It is better not to do something
http://www.bincimap.org/ | than to do it poorly."