On 10/9/19 10:00 PM, William Brown wrote:
On 9 Oct 2019, at 19:55, Ludwig Krispenz <[email protected]> wrote:

Hi William,

I like your radical approach :-)

In my opinion our connection code is getting too complicated by maintaining two 
different implementations in parallel - not separated, but intermingled (and 
made even more complicated by turbo mode). So I agree we should have only one, but 
which one? In my opinion nunc stans is the theoretically better approach, but 
nobody is confident enough to rely on nunc stans alone. The conntable mode has 
its problems, especially when handling many concurrent connections, and worse when 
they are established almost at the same time (otherwise we would not have 
experimented with nunc stans), but it is stable and efficient enough for most of 
the use cases.
I think you nailed it in one - we aren't confident in nunc-stans today, so let's keep 
what works and improve that. There are already many similar concepts - work queues, 
threads, even slapi_conn_t. I think it would be possible to bring "nunc-stans 
ideas" into a rework and improvement of the current connection code instead.

So reducing the complexity by removing nunc stans (and maybe also turbo mode), 
then doing cleanup and trying to improve the bottlenecks, would be an acceptable 
approach to me.
Agree. It also means we can make much smaller changes, in a fashion that is easier 
to control and test, I think.

In my opinion the core of the problem in the "old" connection code is that the 
main thread handles both new connections and already established connections, and so 
has to iterate over the connection table. Using an event model looks like the best way 
to handle this, but if it doesn't work we need to look for other improvements without 
breaking things.
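
(To make the event model idea concrete, here is a minimal, purely illustrative sketch 
using Linux epoll - none of the names or the port below come from the actual slapd 
code - where the listener and the established connections are all driven by events, 
so nothing ever iterates the whole connection table just to find activity:)

/* Illustrative only: one epoll instance watches both the listening socket
 * and established connections. Names and the port are invented for this
 * sketch and do not correspond to the real slapd code. */
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in sa = { .sin_family = AF_INET,
                              .sin_port = htons(3890),
                              .sin_addr.s_addr = htonl(INADDR_ANY) };
    int one = 1;
    setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
    bind(lfd, (struct sockaddr *)&sa, sizeof(sa));
    listen(lfd, 128);

    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = lfd };
    epoll_ctl(ep, EPOLL_CTL_ADD, lfd, &ev);

    char buf[4096];
    struct epoll_event events[64];
    for (;;) {
        int n = epoll_wait(ep, events, 64, -1);
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == lfd) {
                /* New connection: just register it for events. */
                int cfd = accept(lfd, NULL, NULL);
                struct epoll_event cev = { .events = EPOLLIN, .data.fd = cfd };
                epoll_ctl(ep, EPOLL_CTL_ADD, cfd, &cev);
            } else if (read(fd, buf, sizeof(buf)) <= 0) {
                /* Peer closed: deregister and drop. A real server would
                 * instead hand the read data to a worker thread here. */
                epoll_ctl(ep, EPOLL_CTL_DEL, fd, NULL);
                close(fd);
            }
        }
    }
}

(epoll is used here only for brevity; the same shape works with whatever event 
framework ends up underneath.)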
Your suggestion to make the conn table data structure more lean and flexible is 
one option. In Sun DS, before I knew about event queues, I split the 
main thread: one thread handling new connections and multiple threads handling 
established connections (parts of the conn table) - reusing the existing mechanisms, 
just splitting the load. Maybe we can also think in this direction.
I think so too. We can certainly think about what actually does the polling versus 
what does the accepting, or better, the event management, etc. There are also ideas 
about having smaller groups of workers to improve thread locality and concurrency.
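
Something along the lines of this rough sketch, for example - all of the names 
(conn_slice, NSLICES, SLICE_CAP) are invented here and nothing is taken from the real 
connection code - where an accept thread only accepts and hands fds off, and each 
poll thread only ever walks its own slice of the table:

/* Illustrative only: an accept thread hands new fds to a fixed set of
 * poll threads, each owning its own slice of the connection table. */
#include <netinet/in.h>
#include <poll.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

#define NSLICES   4
#define SLICE_CAP 1024

struct conn_slice {
    pthread_mutex_t lock;
    int             pending[128];    /* fds handed over by the accept thread */
    int             npending;
    struct pollfd   fds[SLICE_CAP];  /* private to this slice's poll thread */
    int             nfds;
};

static struct conn_slice slices[NSLICES];

/* Each poll thread only ever walks its own slice, so no single thread
 * has to iterate the whole connection table. */
static void *poll_thread(void *arg)
{
    struct conn_slice *s = arg;
    char buf[4096];

    for (;;) {
        /* Pick up any fds the accept thread queued for this slice. */
        pthread_mutex_lock(&s->lock);
        for (int i = 0; i < s->npending && s->nfds < SLICE_CAP; i++)
            s->fds[s->nfds++] = (struct pollfd){ .fd = s->pending[i], .events = POLLIN };
        s->npending = 0;
        pthread_mutex_unlock(&s->lock);

        if (poll(s->fds, s->nfds, 100) <= 0)
            continue;

        for (int i = 0; i < s->nfds; i++) {
            if (s->fds[i].revents & (POLLIN | POLLHUP)) {
                if (read(s->fds[i].fd, buf, sizeof(buf)) <= 0) {
                    /* Closed: drop the slot by swapping in the last entry. */
                    close(s->fds[i].fd);
                    s->fds[i--] = s->fds[--s->nfds];
                }
                /* Otherwise a real server would hand the data to a worker. */
            }
        }
    }
    return NULL;
}

int main(void)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in sa = { .sin_family = AF_INET,
                              .sin_port = htons(3890),
                              .sin_addr.s_addr = htonl(INADDR_ANY) };
    int one = 1;
    setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
    if (bind(lfd, (struct sockaddr *)&sa, sizeof(sa)) != 0 || listen(lfd, 128) != 0) {
        perror("listen");
        return 1;
    }

    for (int i = 0; i < NSLICES; i++) {
        pthread_t tid;
        pthread_mutex_init(&slices[i].lock, NULL);
        pthread_create(&tid, NULL, poll_thread, &slices[i]);
    }

    /* The main thread now only accepts and distributes, round robin. */
    for (unsigned next = 0;; next++) {
        int cfd = accept(lfd, NULL, NULL);
        if (cfd < 0)
            continue;
        struct conn_slice *s = &slices[next % NSLICES];
        pthread_mutex_lock(&s->lock);
        if (s->npending < (int)(sizeof(s->pending) / sizeof(s->pending[0])))
            s->pending[s->npending++] = cfd;
        else
            close(cfd);                 /* sketch only: real code would back off */
        pthread_mutex_unlock(&s->lock);
    }
}

The hand-off through a small pending array keeps the accept path O(1), and since 
every slice has its own poll thread, a burst of new connections no longer competes 
with servicing the established ones.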

So maybe I'll put together a patch to remove nunc-stans soon, and start to 
look at the existing connection code and options to improve it, plus some 
profiling.

I would still like to hear about my original question, though, as quoted below - I 
think Mark might have some comments :)

Why nunc-stans?  Because at the time we were terrible at handling many connections simultaneously and in large numbers - C10K, etc.  Our competitors performed much better in these scenarios than we did.  I recall several customer cases complaining about "our performance versus theirs" with regard to large numbers of connections.

So, moving forward, I agree with everyone here.  We should remove nunc-stans, but as William said, try to incorporate some of its concepts into our connection code (eventually).  We should clean up the connection code as much as possible, and remove turbo mode if it does not provide much value.  The one thing that has been frustrating is how the connection code has become very complicated, to the point that most of us no longer know how it works.  It would be nice to get it to a state that is much more maintainable (especially for engineers new to the code).

I think we should look at improving the connection table as previously suggested, and we should add the additional polling thread that Ludwig mentioned.  We know we will get added performance with just these two changes.  Then we should stress test and evaluate the new behavior, and if need be we can look into more invasive/architectural changes and use some of the ideas from nunc-stans.  (This also means we need a way to properly and consistently test high connection load from multiple clients.)
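
(As a starting point for that kind of testing, even something as crude as the 
following sketch can drive a burst of simultaneous connections at the server from 
each client box - the host, port and connection count are placeholders, and a real 
test would of course speak LDAP rather than just holding sockets open:)

/* Rough sketch of a load client: open many simultaneous TCP connections
 * and hold them, to exercise the server's accept/poll path under a
 * C10K-style burst. Host, port and counts below are placeholders. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int nconns = argc > 1 ? atoi(argv[1]) : 5000;
    struct sockaddr_in sa = { .sin_family = AF_INET,
                              .sin_port = htons(389) };     /* placeholder port */
    inet_pton(AF_INET, "127.0.0.1", &sa.sin_addr);          /* placeholder host */

    int *fds = malloc(nconns * sizeof(int));
    int opened = 0;

    /* Open all connections as fast as possible, then hold them. */
    for (int i = 0; i < nconns; i++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0 || connect(fd, (struct sockaddr *)&sa, sizeof(sa)) != 0) {
            if (fd >= 0)
                close(fd);
            break;
        }
        fds[opened++] = fd;
    }
    printf("opened %d/%d connections, holding for 30 seconds\n", opened, nconns);
    sleep(30);

    for (int i = 0; i < opened; i++)
        close(fds[i]);
    free(fds);
    return 0;
}

In practice you would raise the client's file descriptor limit (ulimit -n) and run 
this from several machines at once to get past per-client limits.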

Mark


The main question is *why* do we want it merged?
Is it performance? Recently I provided a patch that yielded roughly a 30% speedup 
in whole-server throughput just by changing our existing connection code.
Is it features? What features are we wanting from this? We have no complaints 
about our current threading model and thread allocations.
Is it maximum number of connections? We can always change the conntable to a 
better data structure that would help scale this number higher (and would also 
yield a performance gain).
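
(As an illustration of the conntable point - purely a sketch with invented names, 
not the actual conntable code - keeping a freelist of unused slots makes allocating 
a slot for a new connection O(1) instead of a scan over the whole table:)

/* Illustrative sketch only: a connection table that keeps a freelist of
 * unused slots, so allocating a slot for a new connection is O(1) instead
 * of a linear scan over the whole table. */
#include <pthread.h>
#include <stdlib.h>

typedef struct conn {
    int fd;
    int in_use;
    /* ... the rest of the per-connection state would live here ... */
} conn_t;

typedef struct conn_table {
    pthread_mutex_t lock;
    conn_t *slots;
    int    *freelist;   /* stack of indexes of unused slots */
    int     nfree;
    int     size;
} conn_table_t;

conn_table_t *ct_new(int size)
{
    conn_table_t *ct = calloc(1, sizeof(*ct));
    ct->slots = calloc(size, sizeof(conn_t));
    ct->freelist = malloc(size * sizeof(int));
    ct->size = size;
    for (int i = 0; i < size; i++)
        ct->freelist[ct->nfree++] = size - 1 - i;   /* slot 0 pops first */
    pthread_mutex_init(&ct->lock, NULL);
    return ct;
}

/* O(1) allocation: pop a free index rather than scanning for one. */
conn_t *ct_acquire(conn_table_t *ct, int fd)
{
    pthread_mutex_lock(&ct->lock);
    if (ct->nfree == 0) {
        pthread_mutex_unlock(&ct->lock);
        return NULL;                    /* table full */
    }
    conn_t *c = &ct->slots[ct->freelist[--ct->nfree]];
    c->fd = fd;
    c->in_use = 1;
    pthread_mutex_unlock(&ct->lock);
    return c;
}

/* O(1) release: push the slot's index back onto the freelist. */
void ct_release(conn_table_t *ct, conn_t *c)
{
    pthread_mutex_lock(&ct->lock);
    c->in_use = 0;
    ct->freelist[ct->nfree++] = (int)(c - ct->slots);
    pthread_mutex_unlock(&ct->lock);
}

The same idea extends to keeping the active connections on their own list, so that 
polling only walks connections that actually exist rather than the full table.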
—
Sincerely,

William Brown

Senior Software Engineer, 389 Directory Server
SUSE Labs

--

389 Directory Server Development Team
_______________________________________________
389-devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/[email protected]
