On Mon, Apr 19 2021, Jeremie Courreges-Anglas <[email protected]> wrote:
> On Fri, Apr 09 2021, Mark Kettenis <[email protected]> wrote:
>>> Date: Fri, 9 Apr 2021 07:09:06 +1000
>>> From: Jonathan Matthew <[email protected]>
>>>
>>> On Thu, Apr 08, 2021 at 10:24:06AM +0200, Martin Pieuchot wrote:
>>> > firefox often crash when somebody else connects to the jitsi I'm in.
>>> > The trace looks like a stack exhaustion, see below.
>>> >
>>> > Does this ring a bell?
>>> >
>>> > #530 <signal handler called>
>>> > #531 pthread_setschedparam (thread=0x0, policy=1, param=0x2ade5ca1df0)
>>> > at /usr/src/lib/librthread/rthread_sched.c:56
>>> > #532 0x000002ae65f9f016 in
>>> > rtc::PlatformThread::SetPriority(rtc::ThreadPriority) () from
>>> > /usr/local/lib/firefox/libxul.so.101.0
>>> > #533 0x000002ae65f9ed75 in rtc::PlatformThread::Run() ()
>>> > from /usr/local/lib/firefox/libxul.so.101.0
>>> > #534 0x000002ae65f9ead9 in rtc::PlatformThread::StartThread(void*) ()
>>> > from /usr/local/lib/firefox/libxul.so.101.0
>>> > #535 0x000002ade9d4df51 in _rthread_start (v=<optimized out>)
>>> > at /usr/src/lib/librthread/rthread.c:96
>>> > #536 0x000002ad9c2ec3da in __tfork_thread ()
>>> > at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84
>>>
>>> This looks like, at least with our pthreads implementation, libwebrtc has a
>>> race between the newly created thread and the thread creating it.
>>>
>>> The creating thread stores the thread handle in thread_ here:
>>> https://github.com/mozilla/gecko-dev/blob/master/third_party/libwebrtc/webrtc/rtc_base/platform_thread.cc#L186
>>>
>>> and the new thread uses it here:
>>> https://github.com/mozilla/gecko-dev/blob/master/third_party/libwebrtc/webrtc/rtc_base/platform_thread.cc#L363
>>> which is called as almost the first thing the new thread does.
>>>
>>> Our pthread_create() only stores the pthread handle into the supplied
>>> address after __tfork_thread() returns, by which time the new thread
>>> could already be running.
>>
>> Right. POSIX isn't very explicit about this, but it does state that
>> the thread ID is stored upon successful completion. So I don't think
>> portable code can rely on that until pthread_create() has returned.
>>
>> Using pthread_self() in the pthread_setschedparam() call will probably
>> help. But if PlatformThread::GetThreadRef() gets called early on in
>> the thread, there will still be a race. Setting thread_ at the start of
>> PlatformThread::Run() would be better.
>
> I didn't have much time to read the firefox code and rebuild it, but
> fwiw the librthread diff below fixes the issue for me (diff implemented
> following jmatthew's analysis).
>
> Test setup: 2 firefox clients in a jitsi conference + another client
> connecting in a loop.
> Test results: 8/9 crashes with the unpatched librthread, 0 with the
> patched librthread.
Missing data: that was with the third client connection 50 times to the
conference.
> The benefit of being lenient here is that we don't have to fix other
> software that could depend on this type of race to work properly.
> And of course there's no need to rebuild firefox, which I heard is
> almost a valid technical argument these days. :)
>
> But I agree that the "proper" fix would be in the firefox codebase, and
> Mark's proposal makes sense to me even without reading the code.
>
> Not asking for oks. Which way do you folks prefer to go?
>
> Index: rthread.c
> ===================================================================
> RCS file: /d/cvs/src/lib/librthread/rthread.c,v
> retrieving revision 1.99
> diff -u -p -p -u -r1.99 rthread.c
> --- rthread.c 4 Nov 2017 22:53:57 -0000 1.99
> +++ rthread.c 19 Apr 2021 11:53:39 -0000
> @@ -391,15 +391,15 @@ pthread_create(pthread_t *threadp, const
> LIST_INSERT_HEAD(&_thread_list, thread, threads);
> _spinunlock(&_thread_lock);
>
> + *threadp = thread;
> /* we're going to be multi-threaded real soon now */
> __isthreaded = 1;
> rc = __tfork_thread(¶m, sizeof(param), _rthread_start, thread);
> if (rc != -1) {
> /* success */
> - *threadp = thread;
> return (0);
> }
> -
> +
> rc = errno;
>
> _spinlock(&_thread_lock);
> @@ -408,6 +408,11 @@ pthread_create(pthread_t *threadp, const
> _rthread_free_stack(thread->stack);
> fail1:
> _dl_free_tib(tib, sizeof(*thread));
> +
> + /*
> + * XXX let *threadp point to garbage? Should we reset it to NULL
> + * or its previous value ?
> + */
>
> return (rc);
> }
--
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF DDCC 0DFA 74AE 1524 E7EE