> On 9 Oct 2019, at 09:18, Rich Megginson <[email protected]> wrote:
> 
> On 10/8/19 4:55 PM, William Brown wrote:
>> Hi everyone,
>> In our previous catch up (about 4/5 weeks ago when I was visiting 
>> Matus/Simon), we talked about nunc-stans and getting it at least cleaned up 
>> and into the code base.
>> I've been looking at it again, and really thinking about it and reflecting 
>> on it and I have a lot of questions and ideas now.
>> The main question is *why* do we want it merged?
>> Is it performance? Recently I provided a patch that yielded roughly a 30% 
>> speed-up in whole-server throughput just by changing our existing 
>> connection code.
>> Is it features? What features are we wanting from this? We have no 
>> complaints about our current threading model and thread allocations.
>> Is it maximum number of connections? We can always change the conntable to a 
>> better data structure that would help scale this number higher (which would 
>> also yield a performance gain).
> 
> It is mostly about the c10k problem, trying to figure out a way to use epoll, 
> via an event framework like libevent, libev, or libtevent, but in a 
> multi-threaded way (at the time none of those were really thread safe, or 
> suitable for use in the way we do multi-threading in 389).
> 
> It wasn't about performance, although I hoped that using lock-free data 
> structures might solve some of the performance issues around thread 
> contention, and perhaps using a "proper" event framework might give us some 
> performance boost e.g. the idle thread processing using libevent timeouts.  I 
> think that using poll() is never going to scale as well as epoll() in some 
> cases e.g. lots of concurrent connections, no matter what sort of data 
> structure you use for the conntable.

The conntable was bottlenecking because when you had:

| conn | conn | freeslot | ....

it would attempt to lock each conn to see if it was free or not. This meant 
that if a conn was in an I/O, we would block waiting for it to finish before 
we could move to the next conn to check if it was free. After changing to 
pthread mutexes, we can now use trylock: if trylock fails, the conn slot must 
be in use, so we skip it. This is how we got the 30% speed-up recently (my 
laptop went from about 4200 conn/sec to over 6000).
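
Roughly, the idea is this (a sketch with hypothetical names, not the actual 
conntable code - the real slapi_conn_t is far larger):

    #include <pthread.h>
    #include <stddef.h>

    /* Hypothetical, simplified conn slot. */
    typedef struct conn {
        pthread_mutex_t c_mutex;
        int c_sd; /* -1 when the slot is free */
    } conn_t;

    /*
     * Scan the table for a free slot. With a plain lock() we would block
     * behind any conn currently held for an I/O; with trylock(), a held
     * lock simply implies "in use", so we skip it immediately.
     */
    static conn_t *
    conntable_get_free_slot(conn_t *table, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            if (pthread_mutex_trylock(&table[i].c_mutex) != 0) {
                continue; /* lock held -> busy, so it can't be free */
            }
            if (table[i].c_sd == -1) {
                return &table[i]; /* free slot, returned still locked */
            }
            pthread_mutex_unlock(&table[i].c_mutex);
        }
        return NULL; /* table full */
    }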

However, the conntable exists to allow conn structs to be re-used over and 
over. There are multiple ways we could solve this. On conn free, we could 
append to a freelist, so we never have to iterate over the table to find a 
slot. We could use a btreemap and just alloc/dealloc conns as needed, removing 
the need to walk and find conns at all (this would also help the sanitizers 
track memory). We could bring libevent directly into the main server and have 
it replace poll too.
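
For example, the freelist option could look something like this (a sketch 
only, reusing the conn_t above; MAX_CONNS and all the names are invented):

    #define MAX_CONNS 4096 /* illustrative size */

    /*
     * Hypothetical LIFO freelist of conn slots behind one mutex. Conn free
     * pushes, conn accept pops in O(1), so we never walk the table looking
     * for a free slot.
     */
    typedef struct conn_freelist {
        pthread_mutex_t fl_mutex;
        conn_t *fl_slots[MAX_CONNS];
        size_t fl_count;
    } conn_freelist_t;

    static void
    freelist_push(conn_freelist_t *fl, conn_t *c)
    {
        pthread_mutex_lock(&fl->fl_mutex);
        fl->fl_slots[fl->fl_count++] = c;
        pthread_mutex_unlock(&fl->fl_mutex);
    }

    static conn_t *
    freelist_pop(conn_freelist_t *fl)
    {
        conn_t *c = NULL;
        pthread_mutex_lock(&fl->fl_mutex);
        if (fl->fl_count > 0) {
            c = fl->fl_slots[--fl->fl_count];
        }
        pthread_mutex_unlock(&fl->fl_mutex);
        return c; /* NULL means no free slots */
    }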

And even then, I think we should be using flamegraphs to work out where our 
time is actually going.

> 
> As far as features goes - it would be nice to give plugins the ability to 
> inject event requests, get timeout events, using the same framework as the 
> main server engine.
> 
>> The more I have looked at the code, I guess with time and experience, the 
>> more hesitant I am to actually commit to merging it. It was designed by 
>> people who did not understand low-level concurrency issues and memory 
>> architectures of systems,
> 
> I resemble that remark.  I suppose you could "turn off" the lock-free code 
> and use mutexes.

It wasn't meant as an insult, btw - when I first looked I didn't understand 
these issues either, so I resemble this remark too. There are lots of my own 
mistakes all over that code.

We have since changed to mutexes, but even with those, the job model and the 
way jobs queue/dequeue and interact with libevent still has issues, because 
you need to protect the jobs while they are in certain states. Having a job 
self-rearm was a nightmare problem. We also had to add a monitor on each 
ns_job_t to make it work correctly, because of the callback nature of the 
code.
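
The shape of the monitor we ended up needing is roughly this (illustrative 
only, not the real ns_job_t internals - the actual state machine has more 
states):

    #include <pthread.h>

    typedef enum { JOB_WAITING, JOB_RUNNING, JOB_NEEDS_DELETE } job_state_t;

    typedef struct job {
        pthread_mutex_t j_monitor;
        pthread_cond_t j_cond;
        job_state_t j_state;
    } job_t;

    /* Entered from the event callback, before running the user's work. */
    static int
    job_begin(job_t *j)
    {
        pthread_mutex_lock(&j->j_monitor);
        if (j->j_state == JOB_NEEDS_DELETE) {
            /* The job was disposed of while the event was in flight. */
            pthread_mutex_unlock(&j->j_monitor);
            return -1;
        }
        j->j_state = JOB_RUNNING;
        pthread_mutex_unlock(&j->j_monitor);
        return 0;
    }

    static void
    job_end(job_t *j)
    {
        pthread_mutex_lock(&j->j_monitor);
        if (j->j_state == JOB_RUNNING) {
            j->j_state = JOB_WAITING; /* a self-rearm would re-queue here */
        }
        pthread_cond_broadcast(&j->j_cond);
        pthread_mutex_unlock(&j->j_monitor);
    }

    /* Any thread wanting to delete the job must wait out a running
     * callback, or it would free the job under the callback's feet. */
    static void
    job_mark_delete(job_t *j)
    {
        pthread_mutex_lock(&j->j_monitor);
        while (j->j_state == JOB_RUNNING) {
            pthread_cond_wait(&j->j_cond, &j->j_monitor);
        }
        j->j_state = JOB_NEEDS_DELETE;
        pthread_mutex_unlock(&j->j_monitor);
    }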

When it was initially added there was a job per I/O event, but now it's a job 
per connection, and it lives for the life of the connection. This has led to 
the current issue where there is a 1:1 mapping of slapi_conn_t to ns_job_t, 
each with its own lock, and that led to deadlocks ...
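
The deadlock shape is the classic ABBA one (illustrative, hypothetical names, 
not the actual 389 code paths):

    #include <pthread.h>

    /* Hypothetical 1:1 pairing, each side with its own lock. */
    typedef struct job job_t;
    typedef struct conn {
        pthread_mutex_t c_mutex;
        job_t *c_job;
    } conn_t;
    struct job {
        pthread_mutex_t j_monitor;
        conn_t *j_conn;
    };

    /* Path 1: connection code needs the job. Order: conn, then job. */
    void conn_path(conn_t *c) {
        pthread_mutex_lock(&c->c_mutex);
        pthread_mutex_lock(&c->c_job->j_monitor);
        /* ... */
        pthread_mutex_unlock(&c->c_job->j_monitor);
        pthread_mutex_unlock(&c->c_mutex);
    }

    /* Path 2: job callback needs the connection. Order: job, then conn.
     * Run concurrently with path 1, each thread ends up holding the lock
     * the other wants -> deadlock. The fixes are a single agreed lock
     * ordering, or one shared lock - the "lock sharing" mentioned below. */
    void job_path(job_t *j) {
        pthread_mutex_lock(&j->j_monitor);
        pthread_mutex_lock(&j->j_conn->c_mutex);
        /* ... */
        pthread_mutex_unlock(&j->j_conn->c_mutex);
        pthread_mutex_unlock(&j->j_monitor);
    }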

> 
>> so it's had a huge number of (difficult and subtle) unsafety issues. And 
>> while most of those are fixed, what it does is duplicate the connection 
>> structure from core 389,
> 
> It was supposed to eventually replace the connection code.

There is too much in the slapi_conn_t structure that can't be removed though, 
and too much locking of that struct throughout the codebase. It's probably 
better to evolve slapi_conn_t with new ideas than to try to replace it.

> 
>> leading to weird solutions like lock sharing and having to use monitors and 
>> more. We've tried a few times to push forward with this, but each time we 
>> end up with a lot of complexity and fragility.
>> So I'm currently thinking a better idea is to step back, re-evaluate what 
>> the problem is we are trying to solve for, then to solve *that*.
>> The question now is "what is the concern that ns would solve". Knowing 
>> that, we can then make a plan and approach it more constructively I think.
> 
> I agree.  There are probably better ways to solve the problems now.

Maybe even just smaller steps rather than one big-bang replacement :) 

> 
>> At the end of the day, I'm questioning if we should just rm -r 
>> src/nunc-stans and rethink this whole approach - there are just too many 
>> architectural flaws and limitations in ns that are causing us headaches.
>> Ideas and thoughts?
>> --
>> Sincerely,
>> William

--
Sincerely,

William Brown

Senior Software Engineer, 389 Directory Server
SUSE Labs