> On 22 Oct 2019, at 07:28, Mark Reynolds <[email protected]> wrote:
>
> On 10/9/19 10:00 PM, William Brown wrote:
>>> On 9 Oct 2019, at 19:55, Ludwig Krispenz <[email protected]> wrote:
>>>
>>> Hi William,
>>>
>>> I like your radical approach :-)
>>>
>>> In my opinion our connection code is getting too complicated by
>>> maintaining two different implementations in parallel - not separated,
>>> but intermingled (and made even more complicated by turbo mode). So I
>>> agree we should have only one, but which one? In my opinion nunc stans
>>> is the theoretically better approach, but nobody is confident enough
>>> to rely on nunc stans alone. The conntable mode has its problems
>>> (especially when handling many concurrent connections, and worse when
>>> they are established almost at the same time - otherwise we would not
>>> have experimented with nunc stans), but it is stable and, for most use
>>> cases, efficient enough.
>> I think you nailed it in one - we aren't confident in nunc-stans today,
>> so let's keep what works and improve that. There are already many
>> similar concepts - work queues, threads, even slapi_conn_t. I think it
>> would be possible to bring the "nunc-stans ideas" into reworking and
>> improving the current connection code instead.
>>
>>> So reducing the complexity by removing nunc stans (and maybe also
>>> turbo mode), and then doing cleanup and trying to improve the
>>> bottlenecks, would be an acceptable approach to me.
>> Agree. It also means we can make much smaller changes, in a way that is
>> easier to control and test.
>>
>>> In my opinion the core of the problem of the "old" connection code is
>>> that the main thread handles both new connections and already
>>> established connections, and so iterates over the connection table.
>>> Using an event model looks like the best way to handle this, but if it
>>> doesn't work we need to look for other improvements without breaking
>>> things.
>>> Your suggestion to make the conn table data structure more lean and
>>> flexible is one option. In Sun DS, before I knew about event queues, I
>>> split the main thread: one thread handling new connections and
>>> multiple threads handling established connections (parts of the conn
>>> table) - reusing the existing mechanisms, just splitting the load.
>>> Maybe we can also think in this direction.
>> I think so too. We can certainly have some ideas about what actually
>> does the polling vs. what does the accepting - or better, event
>> management, etc. There are also some ideas to have smaller groups of
>> workers to improve thread locality and concurrency.
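
To make that concrete, here is a very rough sketch of the kind of split
Ludwig describes: one thread that only accepts, and a handful of poller
threads that each own a slice of the connection table. Every name and number
below is invented for illustration - this is the shape of the idea, not
slapd code.

    /* Illustrative only: one accept thread, N poller threads, each poller
     * owning its own slice of connections. Invented names, not slapd code. */
    #include <poll.h>
    #include <pthread.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    #define NUM_POLLERS 4
    #define SLICE_SIZE  256

    typedef struct {
        pthread_mutex_t lock;
        struct pollfd fds[SLICE_SIZE];
        nfds_t nfds;
    } conn_slice_t;

    static conn_slice_t slices[NUM_POLLERS];

    /* Each poller only ever walks its own slice of the "conn table". */
    static void *
    poller_main(void *arg)
    {
        conn_slice_t *slice = arg;
        struct pollfd local[SLICE_SIZE];

        for (;;) {
            pthread_mutex_lock(&slice->lock);
            nfds_t n = slice->nfds;
            memcpy(local, slice->fds, n * sizeof(struct pollfd));
            pthread_mutex_unlock(&slice->lock);

            if (poll(local, n, 100) <= 0) {
                continue;
            }
            for (nfds_t i = 0; i < n; i++) {
                if (local[i].revents & (POLLHUP | POLLERR)) {
                    pthread_mutex_lock(&slice->lock);
                    close(slice->fds[i].fd);
                    slice->fds[i].fd = -1; /* poll() skips negative fds */
                    pthread_mutex_unlock(&slice->lock);
                } else if (local[i].revents & POLLIN) {
                    /* hand the operation off to a worker thread here */
                }
            }
        }
        return NULL;
    }

    /* The accept loop never touches established connections - it just
     * round-robins new fds into the poller slices. */
    static void
    accept_loop(int listen_fd)
    {
        unsigned next = 0;

        for (;;) {
            int fd = accept(listen_fd, NULL, NULL);
            if (fd < 0) {
                continue;
            }
            conn_slice_t *slice = &slices[next++ % NUM_POLLERS];
            pthread_mutex_lock(&slice->lock);
            if (slice->nfds < SLICE_SIZE) {
                slice->fds[slice->nfds].fd = fd;
                slice->fds[slice->nfds].events = POLLIN;
                slice->nfds++;
            } else {
                close(fd); /* slice full - a real server would do better */
            }
            pthread_mutex_unlock(&slice->lock);
        }
    }

    int
    main(void)
    {
        struct sockaddr_in sin;
        int listen_fd = socket(AF_INET, SOCK_STREAM, 0);

        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_port = htons(3890); /* arbitrary demo port */
        bind(listen_fd, (struct sockaddr *)&sin, sizeof(sin));
        listen(listen_fd, 128);

        for (int i = 0; i < NUM_POLLERS; i++) {
            pthread_t tid;
            pthread_mutex_init(&slices[i].lock, NULL);
            pthread_create(&tid, NULL, poller_main, &slices[i]);
        }
        accept_loop(listen_fd);
        return 0;
    }

That is essentially the Sun DS split Ludwig mentions, with the load balancing
done at accept time.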
>> So maybe I'll put together a patch to remove nunc-stans soon, and start
>> to look at the existing connection code and options to improve it, plus
>> some profiling.
>>
>> I still would like to hear about my original question though, as quoted
>> below - I think Mark might have some comments :)
>
> Why nunc-stans? Because at the time we were terrible at handling many
> connections, simultaneously and in large numbers. C10K, etc. Our
> competitors performed much better in these scenarios than we did. I
> recall several customer cases complaining about "our performance versus
> theirs" with regard to large numbers of connections.
>
> So, moving forward, I agree with everyone here. We should remove
> nunc-stans, but, as William said, try to incorporate some of its concepts
> into our connection code (eventually). We should clean up the connection
> code as much as possible, and remove turbo mode if it does not provide
> much value.

I think turbo mode was there to shortcut returning to the conntable and then
blocking in the connection poll, because the locking strategies at the time
weren't as good. I think there is still some value in turbo "for now", but
if we can bring in libevent that value diminishes, because we become event
driven rather than poll driven.
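
For the "event driven rather than poll driven" part, the shape with libevent
would be roughly the following. This is only a sketch to show the model
(nothing here is slapd code, and the port and callback names are made up):
the listener and every established connection are just events on one
event_base, so nothing ever walks a connection table or re-arms a big poll()
set by hand.

    /* Minimal libevent sketch: accepting and reading are both just events on
     * one event_base. Purely illustrative, not slapd code. Build with -levent. */
    #include <event2/event.h>
    #include <event2/listener.h>
    #include <event2/bufferevent.h>
    #include <event2/buffer.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <string.h>

    static void
    read_cb(struct bufferevent *bev, void *ctx)
    {
        /* In a real server this is where the decoded operation would be
         * handed to a worker thread; here we just drain the input. */
        struct evbuffer *in = bufferevent_get_input(bev);
        evbuffer_drain(in, evbuffer_get_length(in));
    }

    static void
    event_cb(struct bufferevent *bev, short events, void *ctx)
    {
        if (events & (BEV_EVENT_EOF | BEV_EVENT_ERROR)) {
            bufferevent_free(bev); /* connection gone - nothing to scan */
        }
    }

    static void
    accept_cb(struct evconnlistener *listener, evutil_socket_t fd,
              struct sockaddr *addr, int socklen, void *ctx)
    {
        /* Each accepted fd becomes its own event; we never re-poll the world. */
        struct event_base *base = ctx;
        struct bufferevent *bev =
            bufferevent_socket_new(base, fd, BEV_OPT_CLOSE_ON_FREE);
        bufferevent_setcb(bev, read_cb, NULL, event_cb, NULL);
        bufferevent_enable(bev, EV_READ);
    }

    int
    main(void)
    {
        struct event_base *base = event_base_new();
        struct sockaddr_in sin;

        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_port = htons(3890); /* arbitrary demo port */

        evconnlistener_new_bind(base, accept_cb, base,
                                LEV_OPT_CLOSE_ON_FREE | LEV_OPT_REUSEABLE, -1,
                                (struct sockaddr *)&sin, sizeof(sin));
        event_base_dispatch(base); /* the only loop - events drive everything */
        return 0;
    }

The attractive part for us is that idle or slow connections cost nothing per
iteration - there is no per-pass table walk at all.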
> The one thing that has been frustrating is how the connection code has
> become very complicated, and most of us no longer know how it works
> anymore. It would be nice to get it to a state that is much more
> maintainable (especially for engineers new to the code).

Given how much I've looked at it recently, I'm probably the closest to having
an understanding of that code, but I certainly won't claim 100% expertise
here.

> I think we should look at improving the connection table as previously
> suggested, and we should add the additional polling thread that Ludwig
> mentioned. We know we will get added performance with just these two
> changes. Then we should stress-test and evaluate the new behavior, and if
> need be we can look into more invasive architectural changes and use some
> of the ideas from nunc-stans. (This also means we need a way to properly
> and consistently test high connection load from multiple clients.)

I think there are a few things we can do besides this. We could also split
the thread pool into multiple worker pools, each with its own polling thread
- i.e. instead of, say, 32 threads in one pool, have two pools of 16 threads
each, and load-balance new connections across them. This will help with
cache locality and more. A quick win in the short term is a freelist of
available conntable slots, so we aren't walking the table on allocation
either.
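
Roughly this shape, say (hypothetical names, not the real conntable types):
free slot indices live on a LIFO list, so taking a slot for a new connection
is O(1) instead of a linear scan of the table.

    /* Sketch of a conntable slot freelist - hypothetical types, not the real
     * conntable structures. Allocation pops a free index, release pushes it
     * back; neither walks the table. */
    #include <pthread.h>

    #define CT_SIZE 4096
    #define CT_NONE (-1)

    typedef struct {
        int fd;        /* stand-in for the real per-connection state */
        int next_free; /* index of the next free slot, CT_NONE at the end */
    } ct_slot_t;

    typedef struct {
        pthread_mutex_t lock;
        ct_slot_t slots[CT_SIZE];
        int free_head; /* head of the LIFO freelist */
    } conntable_t;

    static void
    conntable_init(conntable_t *ct)
    {
        pthread_mutex_init(&ct->lock, NULL);
        for (int i = 0; i < CT_SIZE; i++) {
            ct->slots[i].fd = -1;
            ct->slots[i].next_free = i + 1;
        }
        ct->slots[CT_SIZE - 1].next_free = CT_NONE;
        ct->free_head = 0;
    }

    /* O(1) allocate: pop the head of the freelist. */
    static int
    conntable_alloc(conntable_t *ct, int fd)
    {
        pthread_mutex_lock(&ct->lock);
        int idx = ct->free_head;
        if (idx != CT_NONE) {
            ct->free_head = ct->slots[idx].next_free;
            ct->slots[idx].next_free = CT_NONE;
            ct->slots[idx].fd = fd;
        }
        pthread_mutex_unlock(&ct->lock);
        return idx; /* CT_NONE means the table is full */
    }

    /* O(1) release: push the slot back onto the freelist. */
    static void
    conntable_release(conntable_t *ct, int idx)
    {
        pthread_mutex_lock(&ct->lock);
        ct->slots[idx].fd = -1;
        ct->slots[idx].next_free = ct->free_head;
        ct->free_head = idx;
        pthread_mutex_unlock(&ct->lock);
    }

This could also combine naturally with the pool split: give each worker pool
its own table segment and its own freelist, and the allocation lock stops
being global.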
I'm reviving my old lib389 load-testing code so I can make some load tests
that spit out CSVs for us to compare between runs, which will give us better
profiling data too.

Anyway, I think we're all in agreement here, so we have a plan. Time for me
to do some work then ....

> Mark
>
>>>> The main question is *why* do we want it merged?
>>>> Is it performance? Recently I provided a patch that yielded an
>>>> approximate ~30% speed-up in overall server throughput just by
>>>> changing our existing connection code.
>>>> Is it features? What features are we wanting from this? We have no
>>>> complaints about our current threading model and thread allocations.
>>>> Is it maximum number of connections? We can always change the
>>>> conntable to a better data structure that would help scale this
>>>> number higher (which would also yield a performance gain).
>
> --
>
> 389 Directory Server Development Team

—
Sincerely,

William Brown

Senior Software Engineer, 389 Directory Server
SUSE Labs
_______________________________________________
389-devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/[email protected]
