Dear LUG,

It has been a long while since I initiated a mail to the LUG.
I thought I would discuss different network I/O schemes, or rather the way CPU and memory are used to perform what most of us do today. Be it a Google search, Facebook, mail, or whatever, this is what is key. Right from publishing SSLC results, to Deepavali mails, to some event like an earthquake jamming communications, the ability to scale with load and the load-handling capability of servers is put to the test.

I am an expert in server-side gear. I have been doing this for the last ten years and more, and starting from the OpenBSD kernel, IPsec, and routing, to Apache, to web development, I have seen quite a few programming techniques. It is a no-brainer that if you want performance you write in C; you may know Linus Torvalds's comments on C++. But I am not really concerned about the programming language right now, only the high-level design; the coding will just follow.

Broadly put, there are a few ways by which multiprocessing is done:

1) fork()-based process creation (the Apache pre-forked model)
2) multithreading
3) event-based I/O loop handling

You should read the paper by Ousterhout, the author of Tcl, on why threads are a bad idea (for most purposes) and event loops are better. This has been repeatedly demonstrated, not only in my real-life experience but also by the success of Twisted Python and the nginx server. Apache used only the first two approaches: the 1.x pre-forked model and 2.x threading. nginx did better through event-based I/O.

A lot of UNIX programmers do not know about select(2). It is the easiest way to handle multiple descriptors at once, be it stdin, TCP, or UDP. You could also use poll(2); nowadays I just use poll, as it is better than select. You also have many methods to monitor file I/O: FreeBSD has kqueue(2) and Linux has inotify and friends. Most Unices also have the event(3) library (libevent), a generic library for network and file I/O using the event-loop model.

If you have stayed with me so far, then you can perhaps also guess why a straightforward lightweight-process (thread) design cannot scale or improve.
Most of the time it causes bugs and performance loss. The reason is simple. Opening multiple network connections gets you performance; this is how download accelerators work. But pipelining also improves performance, and it does not use multiple connections. How do we resolve this paradox? Multiple connections grab a larger share of the bandwidth on a shared LAN, so you can use the WAN bandwidth to your advantage. However, in protocols like SMTP and HTTP, creating a separate connection for every mail or web request is a drag on performance.

Enough for now.

-Girish
--
G3 Tech
Networking appliance company
web: http://g3tech.in
mail: [email protected]
_______________________________________________
ILUGC Mailing List:
http://www.ae.iitm.ac.in/mailman/listinfo/ilugc
