Malc wrote:
Malcolm Smith wrote:

Hi all,


Threading problems

I think there may be problems with Sockets or threads. (See attached bt).

I can reliably reproduce this problem.

Setup
Master Backend / Slave backend both in idle state. This only fails when a live slave backend in idle state is present

Method:
Open 2 browsers and request status simultaneously. The thread on the master backend handling status / web requests crashes. Recording continues on both backends, which remain in this state until killed. There is never any further response from the status port 6544/6543. (Sometimes it takes a few tries, so it's something to do with collision timing.)

It can also be reproduced when requesting multiple activities via mythweb that take time to process (rescheds, status, deletes), but only ever when slave backend is present.

For background: the master has a DVB card; the slave has a DVB card and a PVR250 card. The master server also has mysql running on it.
Both backends are built from SVN from Weds 31 Aug, identical distributions.

back trace attached

Because of this the WAF is dropping, as she's impatient on the web.... help please.


Thread 12 (Thread 31009712 (LWP 4318)):
#0  0x001cf7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x02b9c7f6 in __nanosleep_nocancel () from /lib/tls/libc.so.6
#2  0x02bcad6a in usleep () from /lib/tls/libc.so.6
#3  0x007dbcaa in EITScanner::RunEventLoop (this=0x8109d90)
   at eitscanner.cpp:62
#4  0x007dbc6f in EITScanner::SpawnEventLoop (param=0x8109d90)
   at eitscanner.cpp:50
#5  0x0018e98c in start_thread () from /lib/tls/libpthread.so.0
#6  0x02bd17da in clone () from /lib/tls/libc.so.6

Thread 11 (Thread 129358768 (LWP 4439)):
#0  0x001cf7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x02b9c7f6 in __nanosleep_nocancel () from /lib/tls/libc.so.6
#2  0x02bcad6a in usleep () from /lib/tls/libc.so.6
#3  0x006c9364 in TVRec::RunTV (this=0x8101af0) at tv_rec.cpp:1612
#4  0x006c8dd9 in TVRec::EventThread (param=0x8101af0) at tv_rec.cpp:1534
#5  0x0018e98c in start_thread () from /lib/tls/libpthread.so.0
#6  0x02bd17da in clone () from /lib/tls/libc.so.6

Thread 10 (Thread 98745264 (LWP 4441)):
#0  0x001cf7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00193eee in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#2  0x00190df4 in _L_mutex_lock_29 () from /lib/tls/libpthread.so.0
#3  0xf6e00010 in ?? ()
#4  0xf6f48cf8 in ?? ()
#5  0x070d5214 in ?? () from /usr/lib/qt-3.3/lib/libqt-mt.so.3
#6  0x00a76ee0 in ?? ()
#7  0x080ef278 in ?? ()
#8  0x05e2b5f8 in ?? ()
#9  0x06f972f0 in QRecursiveMutexPrivate::lock ()
  from /usr/lib/qt-3.3/lib/libqt-mt.so.3
Previous frame identical to this frame (corrupt stack?)




I've spent some time tracking through the code.

I think I'm getting somewhere with this. The problem seems to be most common in the following circumstances (but can occur in others).

Master backend is acting as a middleware (i.e. a backend to a client and client to slave backends/mysql). Examples of this are:
1. Requesting status (localhost:6545) with slave backends available from a web browser
2. Requesting sql and file activity from a frontend or browser, esp when slave backends are present

The critical bit of code seems to be:
- programs/mythbackend/playbacksock.cpp

bool PlaybackSock::SendReceiveStringList(QStringList &strlist)
{
  sock->Lock();

 ......

Thread will hang at sock->Lock and never return.

I've attached a patch which does not fix the problem, but does make the code more stable, by not getting stuck on the lock.
What the patch does is use tryLock to see if the lock can be obtained. If not, it retries once every 0.1s, up to 20 times. If the lock still has not been obtained after that, it aborts the SendReceive.
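The retry idea described above can be sketched in isolation. This is only an illustration, not the patch itself: it uses the standard library's `std::mutex` rather than the Qt mutex in the backend, and the helper name `tryLockWithRetries` and its parameters are made up for the example.

```cpp
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical helper (not part of MythTV): poll try_lock() up to `attempts`
// times, sleeping `interval` after each failed attempt.
// Returns true if the caller now owns the mutex, false if it gave up.
bool tryLockWithRetries(std::mutex &m, int attempts,
                        std::chrono::milliseconds interval)
{
    for (int i = 0; i < attempts; ++i)
    {
        if (m.try_lock())
            return true;                        // caller must unlock() later
        std::this_thread::sleep_for(interval);  // back off before retrying
    }
    return false;                               // lock never became free
}
```

With `attempts = 20` and `interval = 100ms` this gives up after roughly 2 seconds, which matches the timing described for the patch. On a `false` return the caller does not own the mutex and must not unlock it.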

This means whatever the calling code was trying to get done (e.g. schedule, delete etc) doesn't get done.... It wouldn't have anyway.. and would have required a restart of the masterbackend! I can't think of any critical activity on myth that requires critical confirmation and execution only once. e.g. if delete didn't work then just retry.. frustrating but less so than a restart.

Can people try this patch and see whether it increases stability for them? I've had no lockups using this code.

Obviously the long term solution is to fix the problem. I'm happy to help discuss this problem.


More on the problem... looking at the activity

Case 1 - This is a standard activity
Client          Master backend            Slave backend

1 Req ->
                1 Process
                SendReceive lock
                1 Req ->
                                          -> 1 Receive req
                                             1 Slave process
                                          <- 1 Response
                1 Response <-
                SendReceive unlock
<- 1 Response

Where two requests are made in quick succession (e.g. requesting status from 2 different browsers within a second or so of each other), this seems to be what happens:
Case 2 - failure

Client          Master backend            Slave backend

1 Req ->
                1 Process
                1 SendReceive lock
                1 Req ->
                                          -> 1 Receive req
                                             1 Slave process   (in this case the slave process takes time... maybe swapping or processing)
2 Req ->                                                       (new req arrives)
                2 Process
                2 SendReceive lock                             (in some cases this seems to fail, rather than waiting for the lock to come free)
                                          <- 1 Response        (if this response is never sent, or gets lost, then the thread is locked forever)
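The Case 2 timing can be imitated with a small standalone sketch. It is a simulation under assumptions, not backend code: `std::timed_mutex` stands in for the socket lock, a sleep stands in for the slave's slow (or lost) response, and the function name and parameters are invented for the example.

```cpp
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical simulation of Case 2. Request 1 takes the lock and holds it
// for `firstHoldsFor` (standing in for a slow or missing slave response).
// Request 2 then waits at most `secondWaitsFor` for the same lock.
// Returns true if request 2 obtained the lock.
bool secondRequestGetsLock(std::chrono::milliseconds firstHoldsFor,
                           std::chrono::milliseconds secondWaitsFor)
{
    std::timed_mutex sockLock;   // stands in for the per-socket lock
    std::promise<void> locked;   // signals that request 1 holds the lock

    std::thread first([&] {
        std::lock_guard<std::timed_mutex> guard(sockLock);
        locked.set_value();
        std::this_thread::sleep_for(firstHoldsFor);  // "waiting for slave"
    });

    locked.get_future().wait();  // make sure request 1 owns the lock first

    // A plain lock() here would block for as long as request 1 holds the
    // lock; try_lock_for() gives up after secondWaitsFor instead.
    bool got = sockLock.try_lock_for(secondWaitsFor);
    if (got)
        sockLock.unlock();

    first.join();
    return got;
}
```

If the first request's response never arrived at all, the hold time would be unbounded, and a second request using a plain blocking lock would wait forever, which is the hang described above.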
 


I can't get this patch to apply, I get the following error:

[EMAIL PROTECTED]:~/mythtv$ patch -p0 < socket.patch
patching file programs/mythbackend/playbacksock.cpp
patch: **** malformed patch at line 40: sockLock.lock();

I see this bug all the time so I can't wait to test this patch.
Geoff


Try again with new patch




Index: programs/mythbackend/playbacksock.cpp
===================================================================
--- programs/mythbackend/playbacksock.cpp       (revision 7191)
+++ programs/mythbackend/playbacksock.cpp       (working copy)
@@ -2,6 +2,9 @@
 
 #include <iostream>
 
+// C headers
+#include <unistd.h>
+
 using namespace std;
 
 #include "playbacksock.h"
@@ -59,7 +62,23 @@
 
 bool PlaybackSock::SendReceiveStringList(QStringList &strlist)
 {
-    sock->Lock();
+    bool locked = sock->tryLock();
+
+    if (!locked) {
+        VERBOSE(VB_IMPORTANT, "PlaybackSock::SendReceiveStringList trying to send - could not obtain mutex lock -- waiting");
+        // Retry once every 0.1s, up to 20 times
+        for (int itertry = 0; itertry < 20 && !locked; itertry++) {
+            usleep(100000);
+            locked = sock->tryLock();
+        }
+        if (!locked) {
+            // don't send
+            VERBOSE(VB_IMPORTANT, "PlaybackSock::SendReceiveStringList - did not send receive");
+            return false;
+        }
+        VERBOSE(VB_IMPORTANT, "PlaybackSock::SendReceiveStringList - lock eventually obtained - continuing");
+    }
+    // Carry on
     sock->UpRef();
 
     sockLock.lock();
Index: programs/mythbackend/server.h
===================================================================
--- programs/mythbackend/server.h       (revision 7191)
+++ programs/mythbackend/server.h       (working copy)
@@ -20,6 +20,7 @@
     bool IsInProcess(void) { return inUse; }
 
     void Lock() { lock.lock(); }
+    bool tryLock() { return lock.tryLock(); }
     void Unlock() { lock.unlock(); }
 
   protected:
_______________________________________________
mythtv-dev mailing list
[email protected]
http://mythtv.org/cgi-bin/mailman/listinfo/mythtv-dev
