On Fri, Feb 15, 2019 at 2:47 PM Willem de Bruijn <willemdebruijn.ker...@gmail.com> wrote: > > > > > 2. Why there is no socket option for sysctl.net.busy_poll? Clearly > > > > sysctl_net_busy_poll is global and SO_BUSY_POLL only works for > > > > sysctl.net.busy_read. > > > > > > I guess because of how sock_poll works. In that case it is not needed. > > > The poll duration applies more to the pollset than any of the > > > individual sockets, too. > > > > > > Good point, it's probably like struct eventpoll vs. struct epitem. > > > > The reason why I am looking for a per-socket tuning is to minimize > > the impact of setting busy_poll. I don't know if it is possible to somehow > > make this per-socket via epoll interfaces, perhaps fundamentally > > it is impossible? > > I think it may be possible. The way busy_read and busy_poll work > in sock_poll is that the sum of all (per socket tunable) busy_read > durations on the sockets in the pollset is ~bound by (global) busy_poll.
Good idea!! I was actually thinking about checking sk->sk_ll_usec ep_set_busy_poll_napi_id(). Your idea sounds better than mine. Maybe we can do both together. :) > > The epoll implementation is restricted in the sense that it polls only > on one napi_id at a time. Alongside setting ep->napi_id in > ep_set_busy_poll_napi_id, we could also set a new ep field takes > the min of the global busy_poll and sk->sk_ll_usec. > > Though I guess you want to be able to poll on a given pollset > without setting the global sysctl_net_busy_poll at all? That > would be a useful feature both for epoll and poll/select. But > definitely requires refining net_busy_loop_on() to optionally > take some state derived from the sockets in the (e)pollset. Yes, or at least give some control to each application. I was thinking about this: diff --git a/fs/eventpoll.c b/fs/eventpoll.c index a5d219d920e7..1b104c8bda5e 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -432,7 +432,7 @@ static inline void ep_set_busy_poll_napi_id(struct epitem *epi) return; sk = sock->sk; - if (!sk) + if (!sk || !sk->sk_ll_usec) return; napi_id = READ_ONCE(sk->sk_napi_id); With this change, busy_poll would rely on either busy_read or SO_BUSY_POLL. Does this make any sense? Thanks.