> On 22 Oct 2019, at 07:28, Mark Reynolds <[email protected]> wrote:
> 
> 
> On 10/9/19 10:00 PM, William Brown wrote:
>>> On 9 Oct 2019, at 19:55, Ludwig Krispenz <[email protected]> wrote:
>>> 
>>> Hi William,
>>> 
>>> I like your radical approach :-)
>>> 
>>> In my opinion our connection code is getting too complicated by maintaining 
>>> two different implementations in parallel - not separated, but 
>>> intermingled (and made even more complicated by turbo mode). So I agree we 
>>> should have only one, but which one? In my opinion nunc stans is the 
>>> theoretically better approach, but nobody is confident enough to rely on 
>>> nunc stans alone. The conntable mode has its problems, especially when 
>>> handling many concurrent connections, and worse when they are established 
>>> almost at the same time (otherwise we would not have experimented with 
>>> nunc stans), but it is stable and efficient enough for most use cases.
>> I think you nailed it in one - we aren't confident in nunc-stans today, so 
>> let's keep what works and improve that. There are already many similar 
>> concepts - work queues, threads, even slapi_conn_t. I think it would be 
>> possible to bring "nunc-stans ideas" into a rework and improvement of the 
>> current connection code instead.
>> 
>>> So reducing the complexity by removing nunc stans (and maybe also turbo 
>>> mode), and then doing cleanup and trying to improve the bottlenecks, would 
>>> be an acceptable approach to me.
>> Agreed. It also means we can make much smaller changes in a way that is 
>> easier to control and test, I think.
>> 
>>> In my opinion the core problem of the "old" connection code is that the 
>>> main thread handles both new connections and already established 
>>> connections, and so has to iterate over the connection table. Using an 
>>> event model looks like the best way to handle this, but if it doesn't work 
>>> we need to look for other improvements without breaking things.
>>> Your suggestion to make the conntable data structure more lean and 
>>> flexible is one option. In Sun DS, before I knew about event queues, I 
>>> split the main thread: one thread handling new connections and multiple 
>>> threads handling established connections (parts of the conntable) - 
>>> reusing the existing mechanisms, just splitting the load. Maybe we can 
>>> also think in this direction.
>> I think so too. We can certainly explore ideas about what actually does the 
>> polling vs what does the accepting, or better, the event management. There 
>> are also ideas about having smaller groups of workers to improve thread 
>> locality and concurrency.
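
To illustrate the worker group idea, here is a rough sketch of the shape I
have in mind - hypothetical names, plain poll/pthreads, not the current slapd
conntable structures, and no bounds checking:

/*
 * Rough sketch only: the accept path load-balances new fds across a small
 * number of worker pools, each with its own polling thread that only walks
 * its own slice of connections. All names are made up for illustration.
 */
#include <poll.h>
#include <pthread.h>
#include <string.h>

#define NUM_POOLS  2
#define POOL_SLOTS 1024

struct worker_pool {
    pthread_t poller;               /* one polling thread per pool */
    pthread_mutex_t lock;
    struct pollfd fds[POOL_SLOTS];  /* only this pool's connections */
    int nfds;
};

static struct worker_pool pools[NUM_POOLS];

static void *
pool_poll_loop(void *arg)
{
    struct worker_pool *p = arg;
    struct pollfd snapshot[POOL_SLOTS];

    for (;;) {
        /* Take a snapshot under the lock, then poll without holding it. */
        pthread_mutex_lock(&p->lock);
        int n = p->nfds;
        memcpy(snapshot, p->fds, n * sizeof(struct pollfd));
        pthread_mutex_unlock(&p->lock);

        if (poll(snapshot, n, 100) > 0) {
            /* hand the readable fds to this pool's worker threads ... */
        }
    }
    return NULL;
}

/* Accept path: round-robin new connections across the pools. */
static void
assign_new_connection(int fd)
{
    static int next = 0;
    struct worker_pool *p = &pools[next++ % NUM_POOLS];

    pthread_mutex_lock(&p->lock);
    p->fds[p->nfds].fd = fd;
    p->fds[p->nfds].events = POLLIN;
    p->nfds++;
    pthread_mutex_unlock(&p->lock);
}

static void
start_pools(void)
{
    for (int i = 0; i < NUM_POOLS; i++) {
        pthread_mutex_init(&pools[i].lock, NULL);
        pthread_create(&pools[i].poller, NULL, pool_poll_loop, &pools[i]);
    }
}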
>> 
>> So maybe I'll put together a patch to remove nunc-stans soon then, and start 
>> to look at the existing connection code and options to improve that + some 
>> profiling.
>> 
>> I would still like an answer to my original question though, as quoted 
>> below - I think Mark might have some comments :)
> 
> Why nunc-stans?  Because at the time we were terrible at handling many 
> connections simultaneously and in large numbers.  C10K, etc.  Our 
> competitors performed much better in these scenarios than we did.  I recall 
> several customer cases complaining about our performance versus theirs with 
> regard to large numbers of connections.
> 
> So, moving forward, I agree with everyone here.  We should remove nunc-stans, 
> but, as William said, try to incorporate some of its concepts into our 
> connection code (eventually).  We should clean up the connection code as much 
> as possible, and remove turbo mode if it does not provide much value.

I think turbo mode was to try and shortcut returning to the conntable and then 
having the blocking on the connections poll because the locking strategies 
before weren't as good. I think there is still some value in turbo "for now" 
but if we can bring in libevent, then it diminishes because we become event 
driven rather than poll driven. 
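
To show what I mean by event driven vs poll driven, here is a minimal
libevent sketch - illustrative only, not slapd code, and the port and
callback names are made up. The point is that readiness callbacks fire per
connection, so nothing has to walk the whole conntable on every pass:

/* Minimal libevent sketch - illustrative only, not slapd code. */
#include <arpa/inet.h>
#include <event2/event.h>
#include <event2/listener.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

static void
read_cb(evutil_socket_t fd, short what, void *arg)
{
    /* Fires only when this fd is readable - no conntable walk.
     * Real code would hand the operation off to a worker thread here. */
    char buf[4096];
    (void)what;
    (void)arg;
    (void)recv(fd, buf, sizeof(buf), 0);
}

static void
accept_cb(struct evconnlistener *listener, evutil_socket_t fd,
          struct sockaddr *addr, int socklen, void *arg)
{
    /* Register the new connection for read events; no table scan needed. */
    struct event_base *base = arg;
    struct event *ev = event_new(base, fd, EV_READ | EV_PERSIST, read_cb, NULL);
    (void)listener;
    (void)addr;
    (void)socklen;
    event_add(ev, NULL);
}

int
main(void)
{
    struct event_base *base = event_base_new();
    struct sockaddr_in sin;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(10389);   /* arbitrary test port */

    evconnlistener_new_bind(base, accept_cb, base,
                            LEV_OPT_CLOSE_ON_FREE | LEV_OPT_REUSEABLE, -1,
                            (struct sockaddr *)&sin, sizeof(sin));

    event_base_dispatch(base);     /* block until events fire; no poll loop */
    return 0;
}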

>  The one thing that has been frustrating is how the connection code has 
> become very complicated, and most of us no longer know how it works.  
> It would be nice to get it to a state that is much more maintainable 
> (especially for engineers new to the code).

Given how much I've looked at it recently, I'm probably the closest to having 
an understanding of that code, but I certainly won't claim 100% expertise here. 

> 
> I think we should look at improving the connection table as previously 
> suggested, and we should add the additional polling thread that Ludwig 
> mentioned.  We know we will get added performance with just these two 
> changes.  Then we should stress test and evaluate the new behavior, and if 
> need be we can look into more invasive/architectural changes and use some of 
> the ideas from nunc-stans.  (This also means we need a way to properly and 
> consistently test high connection load from multiple clients.)

I think there are a few things we can do besides this. I think we could also 
split the thread pool into multiple worker pools, each with a polling thread. 
IE instead of say 32 threads in one pool, have two pools of 16 threads each, 
and then just load-balance new connections into each. This will help with 
cache locality and reduce lock contention between threads. A quick win in the 
short term is a freelist of available conntable slots so we aren't walking 
the whole table to allocate a slot.
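
For the freelist, something like this minimal sketch is what I'm thinking -
again, hypothetical names rather than the real conntable structures:

/*
 * Hypothetical freelist sketch: keep a stack of free conntable slot indices
 * so allocation is O(1) instead of scanning the whole table. Names are
 * illustrative, not the actual slapd conntable structures.
 */
#include <pthread.h>

#define CT_SIZE 4096

struct conntable {
    void *slots[CT_SIZE];        /* would be Connection * in slapd */
    int freelist[CT_SIZE];       /* stack of free slot indices */
    int free_top;                /* number of entries on the stack */
    pthread_mutex_t lock;
};

static void
ct_init(struct conntable *ct)
{
    pthread_mutex_init(&ct->lock, NULL);
    ct->free_top = 0;
    for (int i = CT_SIZE - 1; i >= 0; i--) {
        ct->slots[i] = NULL;
        ct->freelist[ct->free_top++] = i;   /* all slots start free */
    }
}

/* O(1) allocation: pop a free index instead of walking the table. */
static int
ct_alloc_slot(struct conntable *ct, void *conn)
{
    int idx = -1;

    pthread_mutex_lock(&ct->lock);
    if (ct->free_top > 0) {
        idx = ct->freelist[--ct->free_top];
        ct->slots[idx] = conn;
    }
    pthread_mutex_unlock(&ct->lock);
    return idx;                  /* -1 means the table is full */
}

/* O(1) release: push the index back onto the freelist. */
static void
ct_free_slot(struct conntable *ct, int idx)
{
    pthread_mutex_lock(&ct->lock);
    ct->slots[idx] = NULL;
    ct->freelist[ct->free_top++] = idx;
    pthread_mutex_unlock(&ct->lock);
}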

I'm reviving my old lib389 load testing code so I can make some load tests 
that spit out CSVs for us to compare between runs, which should give us some 
better profiling data too. 


Anyway, I think we're all in agreement here, so we have a plan. Time for me to 
do some work then .... 

> 
> Mark
> 
>> 
>>>> The main question is *why* do we want it merged?
>>>> Is it performance? Recently I provided a patch that yielded roughly a 30% 
>>>> speed up in overall server throughput just by changing our existing 
>>>> connection code.
>>>> Is it features? What features are we wanting from this? We have no 
>>>> complaints about our current threading model and thread allocations.
>>>> Is it maximum number of connections? We can always change the conntable to 
>>>> a better data structure that would help scale this number higher (which 
>>>> would also yield a performance gain).
>> —
>> Sincerely,
>> 
>> William Brown
>> 
>> Senior Software Engineer, 389 Directory Server
>> SUSE Labs
> 
> -- 
> 
> 389 Directory Server Development Team

—
Sincerely,

William Brown

Senior Software Engineer, 389 Directory Server
SUSE Labs
_______________________________________________
389-devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/[email protected]
