On Sep 10, 2005, at 1:26 PM, Malc wrote:
Malcolm Smith wrote:

Hi all,

Threading problems

I think there may be problems with sockets or threads (see attached bt). I can reliably reproduce this problem.

Setup:
Master backend / slave backend, both in idle state. This only fails when a live slave backend in idle state is present.

Method:
Open 2 browsers and request status simultaneously. The thread on the master backend handling status / web crashes; recording continues on both backends. The master will remain in this state until killed. There is never any further response from the status port 6544/6543. (Sometimes it takes a few tries, so it's something to do with collision timing.)

It can also be reproduced when requesting multiple activities via mythweb that take time to process (rescheds, status, deletes), but only ever when a slave backend is present.

For background: the master has DVB; the slave has DVB and a PVR250 card. The master server also has mysql running on it. Both backends are built from SVN from Weds 31 Aug, on identical distributions.

Back trace attached.

Because of this the WAF is dropping, as she's impatient on the web....
help please.

Thread 12 (Thread 31009712 (LWP 4318)):
#0  0x001cf7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x02b9c7f6 in __nanosleep_nocancel () from /lib/tls/libc.so.6
#2  0x02bcad6a in usleep () from /lib/tls/libc.so.6
#3  0x007dbcaa in EITScanner::RunEventLoop (this=0x8109d90) at eitscanner.cpp:62
#4  0x007dbc6f in EITScanner::SpawnEventLoop (param=0x8109d90) at eitscanner.cpp:50
#5  0x0018e98c in start_thread () from /lib/tls/libpthread.so.0
#6  0x02bd17da in clone () from /lib/tls/libc.so.6

Thread 11 (Thread 129358768 (LWP 4439)):
#0  0x001cf7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x02b9c7f6 in __nanosleep_nocancel () from /lib/tls/libc.so.6
#2  0x02bcad6a in usleep () from /lib/tls/libc.so.6
#3  0x006c9364 in TVRec::RunTV (this=0x8101af0) at tv_rec.cpp:1612
#4  0x006c8dd9 in TVRec::EventThread (param=0x8101af0) at tv_rec.cpp:1534
#5  0x0018e98c in start_thread () from /lib/tls/libpthread.so.0
#6  0x02bd17da in clone () from /lib/tls/libc.so.6

Thread 10 (Thread 98745264 (LWP 4441)):
#0  0x001cf7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00193eee in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#2  0x00190df4 in _L_mutex_lock_29 () from /lib/tls/libpthread.so.0
#3  0xf6e00010 in ?? ()
#4  0xf6f48cf8 in ?? ()
#5  0x070d5214 in ?? () from /usr/lib/qt-3.3/lib/libqt-mt.so.3
#6  0x00a76ee0 in ?? ()
#7  0x080ef278 in ?? ()
#8  0x05e2b5f8 in ?? ()
#9  0x06f972f0 in QRecursiveMutexPrivate::lock () from /usr/lib/qt-3.3/lib/libqt-mt.so.3
Previous frame identical to this frame (corrupt stack?)

I've spent some time tracking through the code. I think I'm getting somewhere with this. This problem only seems to be common in the following circumstances (but will occur in other circumstances).

The master backend is acting as middleware (i.e. a backend to a client, and a client to slave backends/mysql). Examples of this are:

1. Requesting status (localhost:6545) from a web browser, with slave backends available
2. Requesting sql and file activity from a frontend or browser, esp. when slave backends are present

The critical bit of code seems to be:

- programs/mythbackend/playbacksock.cpp

    bool PlaybackSock::SendReceiveStringList(QStringList &strlist)
    {
        sock->Lock();
        ......

The thread will hang at sock->Lock and never return.

I've attached a patch which does not fix the underlying problem, but does make the code more stable, by not getting stuck on the lock. What the patch does is use tryLock to see if the lock can be obtained. If not, it tries once every 0.1s, up to 20 times. If there is still no lock after that, it aborts the SendReceive.

This means whatever the calling code was trying to get done (e.g. schedule, delete etc.) doesn't get done.... It wouldn't have anyway, and would have required a restart of the master backend! I can't think of any activity in myth that requires confirmed, exactly-once execution; e.g. if a delete didn't work, then just retry: frustrating, but less so than a restart.

Can people try this patch and see whether it increases stability for them? I've had no lockups using this code.
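
The retry scheme described above (try the lock, wait 0.1s, give up after 20 attempts) can be sketched roughly as follows. This is not the attached patch and it does not use the Qt 3 mutex classes from playbacksock.cpp; it is a minimal standalone illustration using standard C++11 std::mutex, and the names lockWithRetries and sendReceive are hypothetical.

```cpp
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical helper (not the code from the attached patch): poll a mutex
// with try_lock() instead of blocking indefinitely in lock().
// Returns true if the lock was acquired (the caller then owns it),
// false if every attempt failed.
template <typename Mutex>
bool lockWithRetries(Mutex &m,
                     int maxAttempts = 20,
                     std::chrono::milliseconds pause = std::chrono::milliseconds(100))
{
    for (int attempt = 0; attempt < maxAttempts; ++attempt)
    {
        if (m.try_lock())
            return true;
        std::this_thread::sleep_for(pause);  // wait before the next attempt
    }
    return false;
}

// Hypothetical stand-in for a send/receive call guarded by a socket lock.
// Returns false when the lock could not be obtained, so the caller can
// report the failure or retry later instead of hanging.
bool sendReceive(std::mutex &sockLock,
                 int maxAttempts = 20,
                 std::chrono::milliseconds pause = std::chrono::milliseconds(100))
{
    if (!lockWithRetries(sockLock, maxAttempts, pause))
        return false;  // give up rather than block forever

    // Adopt the lock we already hold so it is released on every exit path.
    std::lock_guard<std::mutex> guard(sockLock, std::adopt_lock);

    // ... write the request and read the reply here ...
    return true;
}
```

With the defaults, a caller waits at most about 2 seconds before sendReceive returns false. The trade-off is the one noted above: the request is dropped rather than completed, so callers have to treat a false return as "try again".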
I can't get this patch to apply, I get the following error:

[EMAIL PROTECTED]:~/mythtv$ patch -p0 < socket.patch
patching file programs/mythbackend/playbacksock.cpp
patch: **** malformed patch at line 40: sockLock.lock();

I see this bug all the time so I can't wait to test this patch.

Geoff
_______________________________________________ mythtv-dev mailing list [email protected] http://mythtv.org/cgi-bin/mailman/listinfo/mythtv-dev
