On Fri, Apr 20, 2018 at 7:51 PM, Jason Madden
<[email protected]> wrote:
> I apologize up front for the length of this. There's a tl;dr at the end.
>
>> On Apr 20, 2018, at 04:07, Ben Noordhuis <[email protected]> wrote:
>>
>> On Thu, Apr 19, 2018 at 4:08 PM, Jason Madden
>> <[email protected]> wrote:
>>>
>>>
>>> After some debugging it became clear what was going on. The gory
>>> details are in the bug report [2], but in summary this is an
>>> interaction between the facts that the QUEUE that is used to iterate
>>> through the check watchers[3] is stack allocated...
>
>>> Combining switching
>>> greenlets with stopping watchers thus can result in the QUEUE getting
>>> restored to an invalid state (infinite loop in this case).
>>>
>>> We solved this problem by heap allocating the QUEUE[5] on each entry to
>>> uv__run_check so that it no longer got saved and restored by greenlet
>>> switches. This obviously adds some overhead, though I didn't try to
>>> quantify it.
>>>
>>> Potentially a better fix would be to have the QUEUE as part of the
>>> loop structure; I didn't do that here to keep the patch as small as
>>> possible (and I didn't know if that would hurt the chances of it
>>> getting upstreamed due to ABI concerns). Or maybe there are other
>>> approaches that are better? Feedback is welcome!
>>
>> I suppose we could change that in libuv v2 (not v1 because of ABI)
>
> Understood.
>
> Since we can't change the loop structure, and assuming malloc/free are "too
> expensive"[1], what came to mind (something I've implemented in the past for
> a similar case) was a static freelist of objects (QUEUE in this case). It
> could either be thread local or with appropriate intrinsics be implemented
> atomically. Objects would have to be checked out and returned using an API,
> much like malloc/free. I know that's not as easy as stack allocation from a
> correctness standpoint (but see below, that may be necessary for avoiding
> regressions). It looks like the intrinsics may already be there in
> atomic-ops.h (but I'll be honest, it's been years since I implemented
> anything like that).
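> To make the freelist idea concrete, here's a rough sketch. All the names
> are mine and purely illustrative; a real version would use libuv's QUEUE
> type, uv__malloc, and (for a freelist shared across threads) the
> compare-and-swap helpers in atomic-ops.h rather than _Thread_local:

```c
/* Hypothetical thread-local freelist of queue nodes. Illustrative only;
 * not libuv API. */
#include <stddef.h>
#include <stdlib.h>

typedef struct queue_node {
  void* ptr[2];             /* stand-in for libuv's two-pointer QUEUE */
  struct queue_node* next;  /* freelist link, used only while checked in */
} queue_node;

static _Thread_local queue_node* queue_freelist;  /* per-thread: no locking */

/* Check a node out: pop from the freelist, falling back to malloc. */
static queue_node* queue_node_get(void) {
  queue_node* n = queue_freelist;
  if (n != NULL) {
    queue_freelist = n->next;
    return n;
  }
  return malloc(sizeof(*n));
}

/* Return a node: push it back for reuse instead of calling free. */
static void queue_node_put(queue_node* n) {
  n->next = queue_freelist;
  queue_freelist = n;
}
```

> After the first round trip the malloc cost disappears, since nodes are
> recycled rather than freed.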
>
>> I suspect it would break again quickly because there is no good way to
>> regression-test it.
>
> Agreed. I was giving it a little bit of thought, and (aside from simply
> having the style guide say "don't do that") the only way I had come up with
> to really avoid *needing to regression test it* was to simply make it
> impossible: Make the queue type private and wrap an API around getting and
> returning a pointer to one (again, just like malloc/free, except the API
> could be macros). I'm sure nobody would be a big fan of that, but it would
> have the advantage of being flexible (the loop could keep a QUEUE of QUEUEs,
> popping and pushing as needed, growing a new entry should one be needed
> recursively). That flexibility may not be necessary if there are only a few
> defined places that are "reentrant" like in the original bug, in which case
> still hiding the type and a few defined macros could be a decent approach.
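> In outline, hiding the type might look like this. Again, all names are
> mine, not libuv's; the point is just that an incomplete type makes stack
> allocation a compile error, so every use has to go through the API:

```c
/* Illustrative only: an opaque queue type with a get/put API.
 * Callers cannot write "struct hidden_queue q;" because the type is
 * incomplete outside this translation unit. */
#include <stdlib.h>

struct hidden_queue;  /* opaque to callers */

struct hidden_queue* hidden_queue_get(void);
void hidden_queue_put(struct hidden_queue* q);

/* The definition lives in one .c file only: */
struct hidden_queue { void* ptr[2]; };  /* stand-in for libuv's QUEUE */

struct hidden_queue* hidden_queue_get(void) {
  /* Could also pop from a per-loop pool of queues instead of malloc. */
  return malloc(sizeof(struct hidden_queue));
}

void hidden_queue_put(struct hidden_queue* q) {
  /* Could push back onto the pool instead of freeing. */
  free(q);
}
```

> The get/put pair could equally well be macros, and backing them with a
> per-loop pool gives the "QUEUE of QUEUEs" flexibility mentioned above.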
>
>> For instance, there's currently a pull request
>> under review that introduces a stack-allocated QUEUE in
>> uv_write()-related code.
>
> If you are referring to https://github.com/libuv/libuv/pull/1787, I don't
> think that would cause any problems. The new stack allocated variable is not
> reachable outside that scope as far as I can see...but that does highlight
> the point that it's either "never ever have a stack allocated queue" or
> "carefully eyeball each instance, expect to eventually misidentify a case and
> just wait for a downstream to report the bug".
>
> Regarding v1 vs v2 and timelines, here's the way things look from gevent's
> perspective:
>
> As a gevent maintainer, I don't mind too much carrying around a small
> patch to libuv (we did that for quite a while with libev). However, it limits
> the audience for gevent+libuv. When I'm carrying around a patch for libuv, I
> have to have a custom, embedded, build for it, I can't use the system
> provided library[2]. As a gevent maintainer, I also don't hate that because I
> know what we're getting when I get bug reports, and I also know that I can
> set flags to get link-time-optimizations and the best performance out of the
> Python bindings. However, there are lots of people who install the packaged
> gevent provided by their (Linux) distribution, and there are lots of
> distributions that refuse to allow embedded libraries, for perfectly valid
> reasons, so if gevent can't use the system libuv, it can't use libuv on that
> distro at all. If gevent has significant users that can't use libuv, I can't
> ultimately deprecate and drop libev without leaving those users behind,
> should that moment ever get here.
>
> That's the same reason I can't use libuv v2 until it's released and starts
> getting picked up by distros.
>
> TL;DR:
>
> This is all a bit hypothetical right now. gevent 1.3 is only in beta and
> hasn't been packaged by distros yet AFAIK. Being able to drop my patches for
> gevent 1.3/libuv 1.21 would be cool, though given the complexity of an API-
> and CAS-based approach, that seems unlikely. But if I have to wait for next
> year and gevent 1.4/libuv v2 to build against a system libuv and drop my
> patches, I'm cool with that too.
>
> I'm happy to prepare a simple (ABI-breaking) PR against master/v2 if that
> sounds attractive (although I see there are already outstanding
> QUEUE-changing PRs for master). In light of the measured expense of the
> malloc/free calls, I've already adapted my own patch to do that.
>
> If it doesn't sound attractive right now because of testing and maintenance
> concerns, I understand that too. Like I said, I don't really mind carrying
> around a small patch to libuv, especially not in the early days of
> gevent+libuv. I can circle back sometime in the future if it seems warranted
> either because the patch complexity grows or because of requests from
> distros/users.
>
>
> Thanks again for your thoughts on this,
> Jason
>
> [1]
>
> I decided to quantify the effect of the malloc call on some systems I had
> lying around. I took the uv__run_ code, set up a bunch of handles with a
> no-op callback, compiled with all optimizations enabled, and ran timings for
> various numbers of active handles. Approximate costs of the malloc/free of
> the QUEUE on various systems:
>
> Linux x86_64: 30ns
> FreeBSD 10 x86_64: 35ns
> Solaris 11 x86_64: 60ns
> macOS 10.13.4 w/jemalloc: 60ns
> macOS 10.13.4: 80ns
> Windows 10 x86_64: 80ns
> Linux arm Rasp Pi 2: 500ns
>
> The approximate number of active handles needed to amortize the cost away
> (due to the extra function callbacks) ranged from 10 on the low end to 100 in
> the middle to 1000+ on the high end.
>
> [2]
>
> Actually, right now I can't build with a system libuv either. I was never
> able to get the Python extension to link correctly on Windows, so I'm doing
> what https://github.com/saghul/pyuv does and using setuptools to build libuv
> on all platforms. The plan is to circle back to that when distros ask.
tl;dr appreciated. :-)

I can understand not wanting to carry patches on top of libuv. We
could add a per-event-loop flag that instructs libuv to heap-allocate
queues; that could go into libuv v1.

Usage sketch:

    uv_loop_t* loop = /* ... */;
    uv_loop_configure(loop, UV_LOOP_MALLOC_QUEUES, 1);

And in libuv itself, in the places where it's needed:

    QUEUE* q;
    QUEUE q_storage;

    q = &q_storage;
    if (loop->flags & UV_LOOP_MALLOC_QUEUES)
      if (NULL == (q = uv__malloc(sizeof(*q))))
        return UV_ENOMEM;

    /* ... */

    if (q != &q_storage)
      uv__free(q);
    q = NULL;

It doesn't really address the testing issue, and there may be places
where malloc failures cannot be propagated upwards, but it's a start.
--
You received this message because you are subscribed to the Google Groups
"libuv" group.