Hi Carlo,
On Thu, Aug 04, 2011 at 01:44:10PM -0700, carlo flores wrote:
> Hey Willy, all.
>
> We are playing with Mongrel2 and found this post from Zed interesting and --
> for myself -- surprising: http://sheddingbikes.com/posts/1280829388.html
>
> Have you had a chance to read, think about, or respond to it with regards to
> poll vs epoll in HAProxy versus the number of active and total connections?
No, I didn't know about this article, and I wouldn't trust it too much
considering the tests were performed in the context of a server that
makes haproxy a success :-)
The measurements I've made show very different results from what is posted
above. The guy seems to completely forget the cost of setting up the polling
list and of processing the results:
- with poll(), you have to rebuild the *whole* fd list upon each pass. If
you have more than a few thousand fds, you're terribly slow;
- with epoll, adding an FD to the polling list or removing it requires a
system call (epoll_ctl()). This is expensive. It's not the active vs total
ratio which counts here, it's the frequency at which your FDs switch state
because buffers are alternately full and empty;
- saying that the cost of epoll is O(N) while you can and should limit
the number of events returned per call is nonsense;
- each call to the epoll_wait() syscall is more expensive than a call to
poll(), which itself is more expensive than a call to select().
- epoll supports an edge-triggered mode (EPOLLET) which saves you from having
to add/remove FDs. Haproxy currently doesn't use it because it was not
available in the old implementation it started with. Also it requires an
fd cache that we now have with sepoll. I'm considering adding it in the
future, or possibly changing sepoll to use the edge-triggered mode.
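
To illustrate the point about limiting the number of returned events, here is a minimal sketch in Python, assuming Linux and the standard select module. The function name, the number of sockets and the batch size are made up for the example. It makes 50 sockets readable, then harvests them with a bounded epoll_wait() (the maxevents argument), so the work per call is capped whatever the backlog is.

```python
import select
import socket

def bounded_wait_demo(n_ready=50, max_events=8):
    """Make n_ready sockets readable, then harvest them in bounded batches."""
    ep = select.epoll()
    pairs = [socket.socketpair() for _ in range(n_ready)]
    by_fd = {}
    try:
        for writer, reader in pairs:
            # Registered once, level-triggered (no EPOLLET here).
            ep.register(reader.fileno(), select.EPOLLIN)
            by_fd[reader.fileno()] = reader
            writer.sendall(b"x")  # one pending byte per reader

        largest_batch = 0
        handled = 0
        while True:
            # maxevents caps what a single call may return.
            events = ep.poll(0, max_events)
            if not events:
                break
            largest_batch = max(largest_batch, len(events))
            for fd, _mask in events:
                by_fd[fd].recv(1)  # consume the byte so the fd stops being ready
                handled += 1
        return largest_batch, handled
    finally:
        ep.close()
        for writer, reader in pairs:
            writer.close()
            reader.close()
```

No call returns more than max_events entries, and the sockets that were not reported simply stay ready for the next call.
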
I know for sure that epoll wins over poll with large numbers of fds. When
some production servers run at low CPU usage with more than 100k fds, it's
a fact that this is not even possible with poll. I've experienced it in the
past, and CPU was at 100% well before reaching these loads, simply because
the poll list had to be rebuilt tens of thousands of times per second, and
all the CPU was spent doing that instead of doing useful work. That's the
reason why I think this guy's tests were incorrect.
Epoll is not easy to use efficiently: you have to build your software
around it for it to be very efficient, not use it as a drop-in replacement
for poll(), which I'm sure this guy did. Most likely his tests were done with
an epoll loop which scans the whole fd list and performs the
epoll_ctl(ADD/REMOVE) on each pass, which totally defeats the point of the
O(1) polling.
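
As a rough illustration of building around epoll rather than re-registering on every pass, here is a small Python sketch, assuming Linux. CachedEpoll and cached_demo are hypothetical names invented for this example. This is not haproxy's sepoll code, only a plain cache of the mask registered for each fd, so that epoll_ctl() is issued only when the wanted mask actually changes.

```python
import select
import socket

class CachedEpoll:
    """Hypothetical wrapper: remembers each fd's registered event mask and
    only issues an epoll_ctl() call when the wanted mask differs."""

    def __init__(self):
        self._ep = select.epoll()
        self._masks = {}    # fd -> mask currently registered in the kernel
        self.ctl_calls = 0  # number of epoll_ctl() calls really issued

    def want(self, fd, mask):
        current = self._masks.get(fd, 0)
        if mask == current:
            return  # nothing changed: no system call at all
        if current == 0:
            self._ep.register(fd, mask)   # EPOLL_CTL_ADD
        elif mask == 0:
            self._ep.unregister(fd)       # EPOLL_CTL_DEL
        else:
            self._ep.modify(fd, mask)     # EPOLL_CTL_MOD
        self.ctl_calls += 1
        if mask:
            self._masks[fd] = mask
        else:
            self._masks.pop(fd, None)

    def wait(self, timeout):
        return self._ep.poll(timeout)

    def close(self):
        self._ep.close()

def cached_demo(passes=1000):
    """Ask for the same interest on every pass, as a naive loop would."""
    a, b = socket.socketpair()
    poller = CachedEpoll()
    try:
        for _ in range(passes):
            poller.want(b.fileno(), select.EPOLLIN)
        a.sendall(b"x")
        events = poller.wait(1.0)
        return poller.ctl_calls, [fd == b.fileno() for fd, _ in events]
    finally:
        poller.close()
        a.close()
        b.close()
```

Here a thousand passes asking for the same interest cost a single epoll_ctl() call. A loop that adds and removes the fd on each pass would pay for the system call every time.
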
In short, what I'd recommend is the following:
- if you have very few FDs to poll (say fewer than 100), use select(),
it always wins;
- if you have fewer than a few thousand fds to poll, use poll();
- if you have more than a few thousand fds to poll, use epoll;
- if your fds change status very often, use at least a cache and if
possible the edge-triggered epoll interface.
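
For the last recommendation, here is a hedged sketch of the edge-triggered (EPOLLET) epoll mode in Python, assuming Linux. The names drain and edge_triggered_demo are invented for the example. The fd is registered once and never touched again. The price is that the application must read until the buffer is empty, because a second wait reports nothing until new data arrives.

```python
import select
import socket

def drain(sock):
    """Read until the kernel buffer is empty (required in edge-triggered mode)."""
    chunks = []
    while True:
        try:
            data = sock.recv(4096)
        except BlockingIOError:
            break  # buffer empty: the next edge will wake us up again
        if not data:
            break  # peer closed the connection
        chunks.append(data)
    return b"".join(chunks)

def edge_triggered_demo():
    a, b = socket.socketpair()
    b.setblocking(False)  # non-blocking is needed so that drain() can stop
    ep = select.epoll()
    # Registered once; no further epoll_ctl() call is made for this fd.
    ep.register(b.fileno(), select.EPOLLIN | select.EPOLLET)
    try:
        a.sendall(b"hello")
        first = ep.poll(1.0)   # the edge is reported once
        second = ep.poll(0)    # nothing new arrived: no event, data still unread
        a.sendall(b" world")
        third = ep.poll(1.0)   # new data produces a new edge
        payload = drain(b)
        return len(first), len(second), len(third), payload
    finally:
        ep.close()
        a.close()
        b.close()
```

With a level-triggered registration, the second wait would have reported the fd again as long as the data was left unread.
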
Note that FreeBSD's kqueue interface is much nicer than the level-triggered
epoll interface. However I would consider it on par with the edge-triggered
one.
Regards,
Willy