Malc wrote:
Malcolm Smith wrote:
Hi all,
Threading problems
I think there may be problems with Sockets or threads. (See attached
bt).
I can reliably reproduce this problem.
Setup
Master backend and slave backend both in idle state. This only fails
when a live slave backend in idle state is present.
Method:
Open 2 browsers and request status simultaneously. The thread on the
master backend handling status / web crashes. Recording continues on
both backends, but the master will remain in this state until killed.
There is never any further response from the status port 6544/6543.
(Sometimes it takes a few tries, so it's something to do with collision
timing.)
It can also be reproduced when requesting multiple activities via
mythweb that take time to process (rescheds, status, deletes), but only
ever when slave backend is present.
For background master has DVB, slave has DVB and PVR250 card. Master
server also has mysql running on it.
Both backends are built from SVN from Weds 31 Aug, identical
distributions.
back trace attached
Because of this the WAF is dropping, as she's impatient on the web....
help please.
Thread 12 (Thread 31009712 (LWP 4318)):
#0 0x001cf7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x02b9c7f6 in __nanosleep_nocancel () from /lib/tls/libc.so.6
#2 0x02bcad6a in usleep () from /lib/tls/libc.so.6
#3 0x007dbcaa in EITScanner::RunEventLoop (this=0x8109d90)
at eitscanner.cpp:62
#4 0x007dbc6f in EITScanner::SpawnEventLoop (param=0x8109d90)
at eitscanner.cpp:50
#5 0x0018e98c in start_thread () from /lib/tls/libpthread.so.0
#6 0x02bd17da in clone () from /lib/tls/libc.so.6
Thread 11 (Thread 129358768 (LWP 4439)):
#0 0x001cf7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x02b9c7f6 in __nanosleep_nocancel () from /lib/tls/libc.so.6
#2 0x02bcad6a in usleep () from /lib/tls/libc.so.6
#3 0x006c9364 in TVRec::RunTV (this=0x8101af0) at tv_rec.cpp:1612
#4 0x006c8dd9 in TVRec::EventThread (param=0x8101af0) at tv_rec.cpp:1534
#5 0x0018e98c in start_thread () from /lib/tls/libpthread.so.0
#6 0x02bd17da in clone () from /lib/tls/libc.so.6
Thread 10 (Thread 98745264 (LWP 4441)):
#0 0x001cf7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x00193eee in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#2 0x00190df4 in _L_mutex_lock_29 () from /lib/tls/libpthread.so.0
#3 0xf6e00010 in ?? ()
#4 0xf6f48cf8 in ?? ()
#5 0x070d5214 in ?? () from /usr/lib/qt-3.3/lib/libqt-mt.so.3
#6 0x00a76ee0 in ?? ()
#7 0x080ef278 in ?? ()
#8 0x05e2b5f8 in ?? ()
#9 0x06f972f0 in QRecursiveMutexPrivate::lock ()
from /usr/lib/qt-3.3/lib/libqt-mt.so.3
Previous frame identical to this frame (corrupt stack?)
I've spent some time tracking through the code.
I think I'm getting somewhere with this. This problem seems to be most
common in the following circumstances (but can occur in others).
Master backend is acting as a middleware (i.e. a backend to a client
and client to slave backends/mysql). Examples of this are:
1. Requesting status (localhost:6545) with slave backends available
from a web browser
2. Requesting sql and file activity from a frontend or browser,
especially when slave backends are present
The critical bit of code seems to be:
- programs/mythbackend/playbacksock.cpp
bool PlaybackSock::SendReceiveStringList(QStringList &strlist)
{
sock->Lock();
......
Thread will hang at sock->Lock and never return.
I've attached a patch which doesn't fix the problem, but does make the
code more stable, by not getting stuck on the lock.
What the patch does is use tryLock to see if the lock can be obtained.
If not, it retries once every 0.1s, up to 20 times (about 2 seconds in
total). If there is still no lock after that, it aborts the SendReceive.
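
For anyone who wants to play with the idea outside the backend, here is a minimal sketch of the same bounded-retry approach in standard C++11. It is not the patch itself: it assumes std::mutex rather than the Qt 3 mutex used in server.h, and lockWithRetries and sendReceiveSketch are hypothetical names made up for illustration.

```cpp
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical helper (not part of MythTV): try to take `m` up to
// `maxTries` times, sleeping `pause` between failed attempts.
// Returns true with the mutex held, or false with it NOT held.
template <typename Mutex>
bool lockWithRetries(Mutex &m, int maxTries,
                     std::chrono::milliseconds pause)
{
    for (int i = 0; i < maxTries; ++i)
    {
        if (m.try_lock())
            return true;                  // caller now owns the lock
        if (i + 1 < maxTries)
            std::this_thread::sleep_for(pause);
    }
    return false;                         // gave up; nothing to unlock
}

// Hypothetical caller: abort the operation instead of blocking forever.
bool sendReceiveSketch(std::mutex &sockMutex)
{
    if (!lockWithRetries(sockMutex, 20, std::chrono::milliseconds(100)))
        return false;                     // nothing was sent

    // Adopt the already-held lock so it is released on every exit path.
    std::lock_guard<std::mutex> guard(sockMutex, std::adopt_lock);
    // ... the request would be sent and the reply read here ...
    return true;
}
```

The point to watch is that the function only returns false when the lock is not held, so a failed attempt can never leave the mutex locked. If polling is not wanted, std::timed_mutex::try_lock_for gives a similar bounded wait without the sleep loop.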
This means whatever the calling code was trying to get done (e.g.
schedule, delete etc.) doesn't get done. It wouldn't have anyway, and
would have required a restart of the master backend! I can't think of
any critical activity in myth that must be confirmed and executed
exactly once; e.g. if a delete didn't work, then just retry.
Frustrating, but less so than a restart.
Can people try this patch, see whether it increases stability for them?
I've had no lockups using this code.
Obviously the long term solution is to fix the problem. I'm happy to
help discuss this problem.
More on the problem... looking at the activity
Case 1 - This is a standard activity
Client          MasterBackend            Slave backend
1 Req ->
                1 Process
                Sendreceive Lock
                1 Req ->
                                         -> 1 Receive req
                                         1 SlaveProcess
                                         <- 1 Response
                1 Response <-
                Sendreceive unlock
<- 1 Response
When two requests are made in quick succession, the following seems to
happen (e.g. requesting status from 2 different browsers within a
second or so of each other).
Case 2 - failure
Client          MasterBackend            Slave backend
1 Req ->
                1 Process
                1 Sendreceive Lock
                1 Req ->
                                         -> 1 Receive req
                                         1 SlaveProcess (a)
2 Req ->
                (new req arrives)
                2 Process
                2 Sendreceive Lock (b)
                                         <- 1 Response (c)
(a) In this case the slave process takes time... maybe swapping or
processing.
(b) In some cases this seems to fail, rather than waiting for the lock
to come free.
(c) If this response is never sent, or gets lost, then the thread is
locked forever.
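
To illustrate the lost-response case, here is a rough sketch in standard C++11 of waiting for a reply with a timeout, so that the socket lock is released even when the reply never arrives. This is only an assumption about one possible approach, not how the backend reads replies: ReplySlot and sendReceiveWithTimeout are hypothetical names, and the real code reads from a socket rather than a condition variable.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <string>

// Hypothetical stand-in for one pending reply from the slave backend.
class ReplySlot
{
  public:
    // Called by whatever receives the slave's reply.
    void deliver(const std::string &reply)
    {
        {
            std::lock_guard<std::mutex> g(m_);
            reply_ = reply;
            ready_ = true;
        }
        cv_.notify_one();
    }

    // Wait up to `timeout` for a reply; returns false on timeout.
    bool waitFor(std::chrono::milliseconds timeout, std::string &out)
    {
        std::unique_lock<std::mutex> g(m_);
        if (!cv_.wait_for(g, timeout, [this] { return ready_; }))
            return false;                 // reply lost or too slow
        out = reply_;
        ready_ = false;
        return true;
    }

  private:
    std::mutex m_;
    std::condition_variable cv_;
    std::string reply_;
    bool ready_ = false;
};

// Hypothetical send/receive: the guard releases sockMutex on every
// return, including the timeout path, so request 2 is never stuck.
bool sendReceiveWithTimeout(std::mutex &sockMutex, ReplySlot &slot,
                            std::chrono::milliseconds timeout,
                            std::string &reply)
{
    std::lock_guard<std::mutex> guard(sockMutex);
    // ... the request would be written to the slave here ...
    return slot.waitFor(timeout, reply);
}
```

With something along these lines, a lost response costs request 1 a timeout, but request 2 can still take the lock afterwards instead of waiting forever.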
Index: programs/mythbackend/playbacksock.cpp
===================================================================
--- programs/mythbackend/playbacksock.cpp (revision 7191)
+++ programs/mythbackend/playbacksock.cpp (working copy)
@@ -2,6 +2,9 @@
#include <iostream>
+// C headers
+#include <unistd.h>
+
using namespace std;
#include "playbacksock.h"
@@ -59,7 +62,23 @@
bool PlaybackSock::SendReceiveStringList(QStringList &strlist)
{
- sock->Lock();
+ int itertry = 0;
+
+ if (!sock->tryLock()) {
+ VERBOSE(VB_IMPORTANT, " PlaybackSock::SendReceiveStringList trying to send - could not obtain Mutex lock -- waiting" );
+ while (itertry<20 && !sock->tryLock()) {
+ usleep(100000);
+ itertry ++;
+// sock->Lock();
+ }
+ if (itertry>= 20) {
+ //don't send (lock was never obtained, so there is nothing to unlock)
+ VERBOSE(VB_IMPORTANT, " PlaybackSock::SendReceiveStringList - did not send receive" );
+ return false;
+ }
+ VERBOSE(VB_IMPORTANT, " PlaybackSock::SendReceiveStringList - lock eventually obtained - continuing " );
+ }
+ // Carry on
sock->UpRef();
sockLock.lock();
Index: programs/mythbackend/server.h
===================================================================
--- programs/mythbackend/server.h (revision 7191)
+++ programs/mythbackend/server.h (working copy)
@@ -20,6 +20,7 @@
bool IsInProcess(void) { return inUse; }
void Lock() { lock.lock(); }
+ bool tryLock() { return lock.tryLock(); }
void Unlock() { lock.unlock(); }
protected:
I can't get this patch to apply, I get the following error:
[EMAIL PROTECTED]:~/mythtv$ patch -p0 < socket.patch
patching file programs/mythbackend/playbacksock.cpp
patch: **** malformed patch at line 40: sockLock.lock();
I see this bug all the time so I can't wait to test this patch.
Geoff
Try again with new patch
Index: programs/mythbackend/playbacksock.cpp
===================================================================
--- programs/mythbackend/playbacksock.cpp (revision 7191)
+++ programs/mythbackend/playbacksock.cpp (working copy)
@@ -2,6 +2,9 @@
#include <iostream>
+// C headers
+#include <unistd.h>
+
using namespace std;
#include "playbacksock.h"
@@ -59,7 +62,23 @@
bool PlaybackSock::SendReceiveStringList(QStringList &strlist)
{
- sock->Lock();
+ int itertry = 0;
+
+ if (!sock->tryLock()) {
+ VERBOSE(VB_IMPORTANT, " PlaybackSock::SendReceiveStringList trying to send - could not obtain Mutex lock -- waiting" );
+ while (itertry<20 && !sock->tryLock()) {
+ usleep(100000);
+ itertry ++;
+// sock->Lock();
+ }
+ if (itertry>= 20) {
+ //don't send (lock was never obtained, so there is nothing to unlock)
+ VERBOSE(VB_IMPORTANT, " PlaybackSock::SendReceiveStringList - did not send receive" );
+ return false;
+ }
+ VERBOSE(VB_IMPORTANT, " PlaybackSock::SendReceiveStringList - lock eventually obtained - continuing " );
+ }
+ // Carry on
sock->UpRef();
sockLock.lock();
Index: programs/mythbackend/server.h
===================================================================
--- programs/mythbackend/server.h (revision 7191)
+++ programs/mythbackend/server.h (working copy)
@@ -20,6 +20,7 @@
bool IsInProcess(void) { return inUse; }
void Lock() { lock.lock(); }
+ bool tryLock() { return lock.tryLock(); }
void Unlock() { lock.unlock(); }
protected:
_______________________________________________
mythtv-dev mailing list
[email protected]
http://mythtv.org/cgi-bin/mailman/listinfo/mythtv-dev