On 10/21/19 4:42 PM, William Brown wrote:
On 22 Oct 2019, at 07:28, Mark Reynolds <[email protected]> wrote:
On 10/9/19 10:00 PM, William Brown wrote:
On 9 Oct 2019, at 19:55, Ludwig Krispenz <[email protected]> wrote:
Hi William,
I like your radical approach :-)
In my opinion our connection code is getting too complicated by maintaining two
different implementations in parallel - not separated, but intermingled (and
made even more complicated by turbo mode). So I agree we should have only one, but
which one? In my opinion nunc stans is the theoretically better approach, but
nobody is confident enough to rely on nunc stans alone. The conntable mode has
its problems (especially when handling many concurrent connections, and worse when
they are established almost at the same time - otherwise we would not have
experimented with nunc stans), but it is stable and, for most use cases,
efficient enough.
I think you nailed it in one - we aren't confident in nunc-stans today, so let's keep
what works and improve that. There are already many similar concepts - work queues,
threads, even slapi_conn_t. I think it would be possible to bring the "nunc-stans
ideas" into a rework and improvement of the current connection code instead.
So reducing the complexity by removing nunc stans (and maybe also turbo mode),
then doing cleanup and trying to improve the bottlenecks, would be an acceptable
approach to me.
Agreed. It also means we can make much smaller changes in a fashion that is easier
to control and test, I think.
In my opinion the core problem of the "old" connection code is that the main thread
handles both new connections and already established connections, and so has to
iterate over the connection table. Using an event model looks like the best way to
handle this, but if it doesn't work we need to look for other improvements without
breaking things.
Your suggestion to make the conn table data structure leaner and more flexible is
one option. In Sun DS, before I knew about event queues, I split the main thread:
one thread handling new connections and multiple threads handling established
connections (parts of the conn table) - reusing the existing mechanisms, just
splitting the load. Maybe we can also think in this direction.
I think so too. We can certainly work out some ideas about what actually does the
polling versus what does the accepting, or better, the event management. There are
also some ideas to have smaller groups of workers to improve thread locality and
concurrency - roughly along the lines of the sketch below.
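To make the "split the load" idea concrete, here is a very rough sketch of the shape
I have in mind - hypothetical names and sizes, not the actual daemon.c code: one
thread that only accepts, handing new fds round-robin to a few poll threads that
each watch only their own slice of connections.

#include <poll.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/socket.h>

#define POLL_THREADS 4
#define CONNS_PER_THREAD 256

struct poll_group {
    pthread_mutex_t lock;
    int fds[CONNS_PER_THREAD];
    int nfds;
};

static struct poll_group groups[POLL_THREADS];

static void poll_groups_init(void)
{
    for (int i = 0; i < POLL_THREADS; i++) {
        pthread_mutex_init(&groups[i].lock, NULL);
        groups[i].nfds = 0;
    }
}

/* Listener thread: only accepts, then load-balances round-robin. */
static void *accept_thread(void *arg)
{
    int listen_fd = *(int *)arg;
    unsigned next = 0;

    for (;;) {
        int fd = accept(listen_fd, NULL, NULL);
        if (fd < 0) {
            continue;
        }
        struct poll_group *g = &groups[next++ % POLL_THREADS];
        pthread_mutex_lock(&g->lock);
        if (g->nfds < CONNS_PER_THREAD) {
            g->fds[g->nfds++] = fd;
        }
        pthread_mutex_unlock(&g->lock);
    }
    return NULL;
}

/* Poll thread: only watches its own slice of established connections. */
static void *poll_thread(void *arg)
{
    struct poll_group *g = arg;
    struct pollfd pfds[CONNS_PER_THREAD];

    for (;;) {
        pthread_mutex_lock(&g->lock);
        int n = g->nfds;
        for (int i = 0; i < n; i++) {
            pfds[i].fd = g->fds[i];
            pfds[i].events = POLLIN;
        }
        pthread_mutex_unlock(&g->lock);

        if (poll(pfds, n, 100) > 0) {
            /* readable fds would be handed off to the worker queue here */
        }
    }
    return NULL;
}

The real handoff to the worker queue and the conntable bookkeeping are obviously more
involved, but it shows the accept/poll split and the per-group locality.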
So maybe I'll put together a patch to remove nunc-stans soon then, and start to
look at the existing connection code and options to improve that + some
profiling.
I would still like to hear about my original question though, as quoted below - I
think Mark might have some comments :)
Why nunc-stans? Because at the time we were terrible at handling many simultaneous
connections in large numbers - C10K, etc. Our competitors performed much better
in these scenarios than we did. I recall several customer cases complaining about
"our performance versus theirs" with regard to large numbers of connections.
So, moving forward, I agree with everyone here. We should remove nunc-stans,
but as William said try to incorporate some of its concepts into our connection
code (eventually). We should clean up the connection code as much as possible,
and remove turbo mode if it does not provide much value.
I think turbo mode was an attempt to shortcut returning to the conntable and then
blocking on the connection poll, because the locking strategies before weren't as
good. I think there is still some value in turbo "for now", but if we can bring in
libevent that value diminishes, because we become event driven rather than poll driven.
"turbo mode" means "keep reading from this socket as quickly as possible
until you get EAGAIN/EWOULDBLOCK" i.e. keep reading from the socket as
fast as possible as long as there is data immediately available. This
is very useful for replication consumers, especially during online init,
when the supplier is feeding you data as fast as possible. Otherwise,
its usefulness is limited to applications where you have a single client
hammering you with requests, of which test/stress clients form a
significant percentage.
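In other words, per connection it is roughly the loop below - a minimal sketch with
hypothetical names, not the real connection.c code: drain the non-blocking socket
until the kernel reports EAGAIN/EWOULDBLOCK, then fall back to the normal poll path.

#include <errno.h>
#include <unistd.h>

/* Stand-in for "hand the bytes to the LDAP PDU parser / worker". */
static void consume(const char *buf, ssize_t len)
{
    (void)buf;
    (void)len;
}

static void drain_socket(int fd)   /* fd is O_NONBLOCK */
{
    char buf[8192];

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            consume(buf, n);  /* data was ready: process it ...         */
            continue;         /* ... and immediately try to read more   */
        }
        if (n == 0) {
            break;            /* peer closed the connection             */
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            break;            /* nothing more right now: back to poll() */
        }
        if (errno == EINTR) {
            continue;         /* interrupted: retry                     */
        }
        break;                /* real error: let the caller close it    */
    }
}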
The one thing that has been frustrating is how complicated the connection code has
become - most of us no longer know how it works anymore. It would be nice to get it
into a state that is much more maintainable (especially for engineers new to the
code).
Given how much I've looked at it recently, I'm probably the closest to having
an understanding of that code, but I certainly won't claim 100% expertise here.
I think we should look at improving the connection table as previously suggested,
and we should add the additional polling thread that Ludwig mentioned. We know we
will get added performance from just these two changes. Then we should stress test
and evaluate the new behavior, and if need be we can look into more
invasive/architectural changes and use some of the ideas from nunc-stans. (This
also means we need a way to properly and consistently test high connection load
from multiple clients.)
I think there are a few things we can do besides this. I think we could also split
the thread pool into multiple worker pools, each with its own polling thread -
i.e. instead of, say, 32 threads in one pool, have two pools of 16 threads each,
and then just load-balance new connections between them. This will help with cache
locality and more. A quick win in the short term is a freelist of available
conntable slots so we aren't walking the table on allocation either - rough sketch
below.
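For the freelist, something along these lines would do it - a sketch with made-up
names (not the real connection_table API), just to show O(1) slot alloc/release
instead of a table walk on every accept.

#include <pthread.h>

#define CT_SIZE 4096

typedef struct {
    pthread_mutex_t lock;
    int free_idx[CT_SIZE];   /* stack of free slot indices */
    int free_count;
} ct_freelist;

static void ct_freelist_init(ct_freelist *fl)
{
    pthread_mutex_init(&fl->lock, NULL);
    fl->free_count = CT_SIZE;
    for (int i = 0; i < CT_SIZE; i++) {
        fl->free_idx[i] = CT_SIZE - 1 - i;   /* hand out low slots first */
    }
}

/* O(1) allocation: returns a slot index, or -1 if the table is full. */
static int ct_slot_alloc(ct_freelist *fl)
{
    int idx = -1;
    pthread_mutex_lock(&fl->lock);
    if (fl->free_count > 0) {
        idx = fl->free_idx[--fl->free_count];
    }
    pthread_mutex_unlock(&fl->lock);
    return idx;
}

/* O(1) release: push the slot back when the connection closes. */
static void ct_slot_release(ct_freelist *fl, int idx)
{
    pthread_mutex_lock(&fl->lock);
    fl->free_idx[fl->free_count++] = idx;
    pthread_mutex_unlock(&fl->lock);
}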
I'm reviving my old lib389 load testing code so I can make some load tests that
will spit out CSVs for us to compare between runs, so that we have some better
profiling data too.
Anyway, I think we're all in agreement here, so we have a plan. Time for me to
do some work then ....
Mark
The main question is *why* do we want it merged?
Is it performance? Recently I provided a patch that yielded an approximately 30%
speed-up in overall server throughput just by changing our existing connection code.
Is it features? What features are we wanting from this? We have no complaints
about our current threading model and thread allocations.
Is it maximum number of connections? We can always change the conntable to a
better data structure that would help scale this number higher (which would also
yield a performance gain).
—
Sincerely,
William Brown
Senior Software Engineer, 389 Directory Server
SUSE Labs
--
389 Directory Server Development Team
—
Sincerely,
William Brown
Senior Software Engineer, 389 Directory Server
SUSE Labs
_______________________________________________
389-devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Fedora Code of Conduct:
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives:
https://lists.fedoraproject.org/archives/list/[email protected]