On Fri, Apr 09 2021, Mark Kettenis <[email protected]> wrote:
>> Date: Fri, 9 Apr 2021 07:09:06 +1000
>> From: Jonathan Matthew <[email protected]>
>>
>> On Thu, Apr 08, 2021 at 10:24:06AM +0200, Martin Pieuchot wrote:
>> > firefox often crash when somebody else connects to the jitsi I'm in.
>> > The trace looks like a stack exhaustion, see below.
>> >
>> > Does this ring a bell?
>> >
>> > #530 <signal handler called>
>> > #531 pthread_setschedparam (thread=0x0, policy=1, param=0x2ade5ca1df0)
>> > at /usr/src/lib/librthread/rthread_sched.c:56
>> > #532 0x000002ae65f9f016 in
>> > rtc::PlatformThread::SetPriority(rtc::ThreadPriority) () from
>> > /usr/local/lib/firefox/libxul.so.101.0
>> > #533 0x000002ae65f9ed75 in rtc::PlatformThread::Run() ()
>> > from /usr/local/lib/firefox/libxul.so.101.0
>> > #534 0x000002ae65f9ead9 in rtc::PlatformThread::StartThread(void*) ()
>> > from /usr/local/lib/firefox/libxul.so.101.0
>> > #535 0x000002ade9d4df51 in _rthread_start (v=<optimized out>)
>> > at /usr/src/lib/librthread/rthread.c:96
>> > #536 0x000002ad9c2ec3da in __tfork_thread ()
>> > at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84
>>
>> This looks like, at least with our pthreads implementation, libwebrtc has a
>> race between the newly created thread and the thread creating it.
>>
>> The creating thread stores the thread handle in thread_ here:
>> https://github.com/mozilla/gecko-dev/blob/master/third_party/libwebrtc/webrtc/rtc_base/platform_thread.cc#L186
>>
>> and the new thread uses it here:
>> https://github.com/mozilla/gecko-dev/blob/master/third_party/libwebrtc/webrtc/rtc_base/platform_thread.cc#L363
>> which is called as almost the first thing the new thread does.
>>
>> Our pthread_create() only stores the pthread handle into the supplied
>> address after __tfork_thread() returns, by which time the new thread
>> could already be running.
>
> Right. POSIX isn't very explicit about this, but it does state that
> the thread ID is stored upon successful completion. So I don't think
> portable code can rely on that until pthread_create() has returned.
>
> Using pthread_self() in the pthread_setschedparam() call will probably
> help. But if PlatformThread::GetThreadRef() gets called early on in
> the thread, there will still be a race. Setting thread_ at the start of
> PlatformThread::Run() would be better.
I didn't have much time to read the firefox code and rebuild it, but
fwiw the librthread diff below fixes the issue for me (diff implemented
following jmatthew's analysis).
Test setup: 2 firefox clients in a jitsi conference + another client
connecting in a loop.
Test results: 8/9 crashes with the unpatched librthread, 0 with the
patched librthread.
The benefit of being lenient here is that we don't have to fix other
software that could depend on this type of race to work properly.
And of course there's no need to rebuild firefox, which I heard is
almost a valid technical argument these days. :)
But I agree that the "proper" fix would be in the firefox codebase, and
Mark's proposal makes sense to me even without reading the code.
Not asking for oks. Which way do you folks prefer to go?
Index: rthread.c
===================================================================
RCS file: /d/cvs/src/lib/librthread/rthread.c,v
retrieving revision 1.99
diff -u -p -p -u -r1.99 rthread.c
--- rthread.c 4 Nov 2017 22:53:57 -0000 1.99
+++ rthread.c 19 Apr 2021 11:53:39 -0000
@@ -391,15 +391,15 @@ pthread_create(pthread_t *threadp, const
LIST_INSERT_HEAD(&_thread_list, thread, threads);
_spinunlock(&_thread_lock);
+ *threadp = thread;
/* we're going to be multi-threaded real soon now */
__isthreaded = 1;
rc = __tfork_thread(¶m, sizeof(param), _rthread_start, thread);
if (rc != -1) {
/* success */
- *threadp = thread;
return (0);
}
-
+
rc = errno;
_spinlock(&_thread_lock);
@@ -408,6 +408,11 @@ pthread_create(pthread_t *threadp, const
_rthread_free_stack(thread->stack);
fail1:
_dl_free_tib(tib, sizeof(*thread));
+
+ /*
+ * XXX let *threadp point to garbage? Should we reset it to NULL
+ * or its previous value ?
+ */
return (rc);
}
--
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF DDCC 0DFA 74AE 1524 E7EE