On Jan 4, 2012, at 4:09 PM, Dan The Man wrote:
>>> With the new IBM developments underway of 16-core Atom processors and
>>> hundreds of gigabytes of memory, surely a backlog of 100k is manageable.
>>> Or what about the future of 500-core systems with a terabyte of memory?
>>> A 100k listen queue could be processed instantly.
>> 
>> Um.  I gather you don't have much background in operating system design or 
>> massively parallelized systems?
>> 
>> Due to locking constraints imposed by whatever synchronization mechanism and 
>> communications topology is employed between cores, you simply cannot just 
>> add more processors to a system and expect it to go faster in a linear 
>> fashion.  Having 500 cores contending over a single queue is almost certain 
>> to result in horrible performance.  Even though the problem of a bunch of 
>> independent requests is "embarrassingly parallelizable", you do that by
>> partitioning the queue into multiple pieces that are fed to different groups 
>> or pools of processors to minimize contention over a single data structure.
>> 
> 
> I guess you're calling me out to talk about what I'm doing based on that
> statement:

"calling you out" is the wrong notion-- I don't see much purpose in arguing 
about opinions.

On the other hand, I do criticize the notion that simply adding more cores will 
mean something goes faster.  If you're going to design or work on parallel 
systems, you've got to understand at least a little about synchronization 
overhead and communication latency, and the fact that there will come a point 
where adding more processors just results in more overhead rather than in more 
work being completed.
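
As a rough illustration, plug a made-up 5% serial fraction (say, time spent
under a single queue lock) into textbook Amdahl's law:

    speedup(N) = 1 / (s + (1 - s)/N)

    s = 0.05, N = 500:  1 / (0.05 + 0.95/500)  ~=  19.3x

Five hundred cores buy you less than a 20x speedup, and in practice the
synchronization overhead grows with the core count on top of that.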

> The first framework I was working on a few weeks back just had the parent
> bind the socket, then spawn a certain number of children to do the accept on
> the socket, so the parent could just focus on dealing with SIGCHLD and
> whatnot.  I had issues with this design for some reason: all the sockets were
> set to non-blocking etc., and I was using kqueue to monitor the socket, but
> at times I would randomly get a 1-2 second delay from a child doing an
> accept.  I was horrified and changed the design quickly.

Non-blocking?  You aren't/weren't sitting and spinning in the children, are you?
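
A child that blocks in kevent(2) and only then calls accept() shouldn't be
burning CPU waiting.  A minimal sketch (error handling omitted, names
hypothetical; the thundering-herd wakeup across children is a separate issue):

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>
    #include <sys/socket.h>

    /* Child worker; listen_fd is the listening socket inherited from
     * the parent.  Block in kevent(2) instead of spinning on accept(). */
    static void
    child_loop(int listen_fd)
    {
        struct kevent ev;
        int kq = kqueue();

        EV_SET(&ev, listen_fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
        kevent(kq, &ev, 1, NULL, 0, NULL);

        for (;;) {
            /* Sleep until at least one connection is pending. */
            if (kevent(kq, NULL, 0, &ev, 1, NULL) < 1)
                continue;
            /* For EVFILT_READ on a listener, ev.data is the backlog depth. */
            for (int i = 0; i < ev.data; i++) {
                int fd = accept(listen_fd, NULL, NULL);
                if (fd < 0)
                    break;  /* a sibling may have taken it */
                /* ... service the connection, then close(fd) ... */
            }
        }
    }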

> New design: the parent does all the accepts and passes blocking work to the
> children via socketpairs it created when forking.  Now, you talk about
> scaling on multiple cores; well, each child could have its own core to do
> its blocking I/O on, and each would have its own processor time.  That isn't
> parallelism, but I never said it was.

Um.  You're the one who brought up the notion of "500 core systems".

The preforking worker pool model is a classic example from parallel computing.
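
And if the drawback of the socketpair design is the data copy, note that you
can pass the accepted descriptor itself to a child as SCM_RIGHTS ancillary
data, so the child writes directly to the client.  A sketch of the parent
side (hypothetical names, error handling trimmed):

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <string.h>

    /* Hand an accepted fd to a child over the socketpair 'chan'. */
    static int
    send_fd(int chan, int fd)
    {
        char b = 0;
        struct iovec iov = { .iov_base = &b, .iov_len = 1 };
        union {
            struct cmsghdr hdr;
            char space[CMSG_SPACE(sizeof(int))];
        } u;
        struct msghdr msg;
        struct cmsghdr *cmsg;

        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = u.space;
        msg.msg_controllen = CMSG_SPACE(sizeof(int));

        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        return (sendmsg(chan, &msg, 0) == 1) ? 0 : -1;
    }

The child does the matching recvmsg() and then owns the descriptor.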

> The better part of this design is that you have one process utilizing a
> processor efficiently instead of paging the system with useless processes.
> You could also have other machines connect in to the parent, and it could do
> the same thing with them over a socket that it does with the children, so in
> my opinion it's more scalable and can centralize everything in one spot.
> Obviously there are some cons to this design: you are passing data via
> socket pairs instead of the child writing directly to the client.

Typical worker pools (e.g., Apache httpd's) tend to block on resources like disk
I/O for static resources, or talking to a database or some other service over 
the network for dynamic responses (mod_perl, mod_php, etc).  It's very common 
to run hundreds of httpd children on a machine that might not have even two 
cores.

If you have worker processes that tend to be CPU-bound, limiting the size of 
your worker pool to one process per available CPU might be reasonable.
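
On FreeBSD that pool size is easy to derive at startup; a sketch using the
hw.ncpu sysctl:

    #include <sys/types.h>
    #include <sys/sysctl.h>

    /* Number of CPUs, for sizing a CPU-bound worker pool. */
    static int
    ncpu(void)
    {
        int n;
        size_t len = sizeof(n);

        if (sysctlbyname("hw.ncpu", &n, &len, NULL, 0) != 0)
            return (1);     /* fall back to a single worker */
        return (n);
    }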

> To stress test this new design I simply wrote an asynchronous client
> counterpart to create 100k connections to the parent's listen queue; then it
> would go off writing to each socket.  Of course, as soon as I reached 60k or
> so the client would get numerous failures due to OS limits.  My intention was
> to see how long it would take the children to process a request and send a
> response back to the client.  Starting from a listen queue with 100k fd's
> ready to go would, I thought, have been a really nice test, not only of the
> application's speed but also of CPU usage, I/O usage, etc., with the parent
> processing a client trying to talk to it 100k times at once, to really see
> how kqueue does.
> 
> Without being able to increase simple limits like these, how are we ever
> going to find where we can burn down the system and make it outperform
> epoll() one day?

I'm having difficulty parsing some of that.

If you want to understand the performance of the system, benchmarking it under 
normal conditions and any expected abnormal conditions makes sense.  But you 
were attempting to test with a massive backlog load scenario which isn't even 
possible with FreeBSD at the present time, as you discovered.  A useful 
benchmark would look at maximum throughput under load, and service time, and 
such, and you could readily compare the performance you see on FreeBSD versus 
whatever your epoll() implementation runs on.
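
Service time in particular is cheap to measure from the client side.  A
sketch (the request/response I/O itself is elided):

    #include <time.h>

    /* Time one request/response exchange, in seconds. */
    static double
    timed_request(int fd)
    {
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        /* ... write the request on fd, read the full response ... */
        (void)fd;
        clock_gettime(CLOCK_MONOTONIC, &t1);

        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    }

Collect those over a run and you get a latency distribution to compare across
kernels, rather than a single "how big a backlog before it croaks" number.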

> What is so bad about seeing how many fd's I could toss at kqueue before it
> croaked?  At 60k it was still handling it like a champ, with about 50
> children getting handed work in my tests.

Figure out the request capacity for your system.  Take a look at the service
SLA.  Multiply them.  The result is the largest listen queue value that makes
sense to use.

If you have a system which can do 100 requests per second, and it needs to 
return an answer to a given request in ten seconds, then setting the listen 
queue size to anything over 1000 is not just pointless, but counterproductive, 
because the answers will come too late to meet the SLA.  If the system faces a 
backlog that will take longer than 10 seconds to answer, then it needs to start 
dropping requests rather than continue to queue up more requests than it can 
handle in a sufficiently timely fashion.  *And* the side making requests really
ought to recognize the overload condition and mitigate it.  See RFC 2914.
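
In code terms, using the hypothetical numbers above:

    #include <sys/socket.h>
    #include <err.h>

    #define REQS_PER_SEC    100     /* measured request capacity */
    #define SLA_SECONDS     10      /* required response time */

    static void
    start_listening(int s)
    {
        /* Anything queued deeper than this misses the SLA anyway. */
        int backlog = REQS_PER_SEC * SLA_SECONDS;       /* = 1000 */

        if (listen(s, backlog) != 0)
            err(1, "listen");
    }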

Regards,
-- 
-Chuck
