Hi,

Since there is only one dispatcher thread, it becomes the bottleneck when there
are many miss_handler threads on a 32-core machine.

Chengyuan's test shows that most of the miss_handler threads' high CPU usage is
caused by ticket spin locks triggered by futex calls. This may be related to
the fact that the miss_handlers are frequently woken up when there are only 1
or 2 upcalls queued for them; they consume those and then wait again, instead
of handling upcalls in batch mode (i.e. handling FLOW_MISS_MAX_BATCH upcalls
after each wait). We observed this by adding logs that check
handler->n_upcalls right after ovs_mutex_cond_wait() returns in the
miss_handler (see the sketch below). The reason could be that the single
dispatcher thread cannot supply upcalls fast enough for so many miss_handlers
to work in batch mode.
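
For reference, here is roughly where I added the check (just a sketch from
memory; the exact loop structure and the condition variable name in
ofproto-dpif-upcall.c may differ slightly):

    /* Inside udpif_miss_handler(), sketch only. */
    ovs_mutex_lock(&handler->mutex);
    if (!handler->n_upcalls) {
        ovs_mutex_cond_wait(&handler->wake_cond, &handler->mutex);
    }
    /* Added log: in our runs this almost always printed 1 or 2, not 50. */
    VLOG_INFO("miss_handler woke up with %zu upcalls", handler->n_upcalls);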

I suspect that this frequent wait-and-wakeup in a 32-core environment results
in high CPU usage because of the futex implementation. To verify this, I ran a
test that forces ovs_mutex_cond_wait() into a loop until handler->n_upcalls
reaches FLOW_MISS_MAX_BATCH (= 50), and observed that the total CPU dropped
from ~330% to ~190%! In particular, the ovs-vswitchd thread's CPU dropped from
~90% to ~10%, each miss_handler's CPU dropped from ~6% to ~4%, and the
dispatcher thread's CPU stayed unchanged at ~60%. This result supports my
speculation to some extent: by forcing the wait loop to accumulate 50 upcalls,
the miss_handler takes a certain amount of time to consume the batch, which
increases the probability that the dispatcher has queued more upcalls for it
in the meantime, so the next time the miss_handler checks handler->n_upcalls
it is non-zero and the thread does not need to wait. From my test logs, the
wait:non-wait ratio for this handler->n_upcalls check decreased after the
change. This leads to fewer futex calls, and the perf profile shows a
significant increase in flow-handling functions such as
flow_hash_in_minimask() and a decrease in kernel spin locks and mutex
operations. The change I tested is sketched below.
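
For clarity, the test change was roughly the following (again only a sketch;
the real loop in ofproto-dpif-upcall.c also handles the exit latch, and the
condition variable name may differ):

    /* Before (sketch): wake up as soon as at least one upcall is queued. */
    if (!handler->n_upcalls) {
        ovs_mutex_cond_wait(&handler->wake_cond, &handler->mutex);
    }

    /* Test change (sketch): keep waiting until a full batch has
     * accumulated, so each wakeup processes FLOW_MISS_MAX_BATCH upcalls. */
    while (handler->n_upcalls < FLOW_MISS_MAX_BATCH) {
        ovs_mutex_cond_wait(&handler->wake_cond, &handler->mutex);
    }

Of course this is only an experiment: waiting for a full batch with no timeout
would starve the handler under light load, so it is not a real fix by itself.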

However, there are still things unclear to me:
1. From the code, I understand that a miss_handler should be woken up only
when the dispatcher has queued 50 upcalls for it, so why does my test show
that it almost always wakes up from ovs_mutex_cond_wait() with only 1 or 2
upcalls (and only rarely with none)? Is there some other wakeup mechanism I
missed?
2. Why does the ovs-vswitchd thread occupy so much CPU in the short-lived-flow
test before my change, and why does it drop so dramatically afterwards? What
is the contention between ovs-vswitchd and the miss_handlers?

In my opinion, a better solution to this dispatcher bottleneck could be for
each handler thread to receive its assigned upcalls directly from the kernel,
so that no condition-variable wait and signal are involved; this avoids
unnecessary context switches and the futex scaling problem in a multi-core
environment. The handler selection can still be done by the kernel with the
same kind of hash, but delivered into per-handler queues, so packet order is
preserved. Could this be a valid proposal? A rough sketch of what I mean is
below.
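
To make the idea more concrete, here is a very rough sketch of what each
handler thread's main loop could look like under that scheme. Everything here
is hypothetical: recv_upcalls_from_own_queue() and handle_upcall_batch() do
not exist today, they just stand for "read a batch of upcalls directly from
this handler's own per-handler kernel queue" and "process them":

    /* Hypothetical per-handler receive loop: no dispatcher thread, no
     * condition-variable wait/signal, no futex traffic between threads. */
    static void *
    handler_main(void *arg)
    {
        struct handler *handler = arg;

        for (;;) {
            struct upcall batch[FLOW_MISS_MAX_BATCH];  /* hypothetical type */
            size_t n;

            /* Hypothetical call: blocks on this handler's own kernel queue
             * (e.g. its own Netlink socket) and returns up to a full batch.
             * The kernel picks the queue by hashing the packet, so packets
             * of the same flow always reach the same handler and ordering
             * is preserved. */
            n = recv_upcalls_from_own_queue(handler, batch,
                                            FLOW_MISS_MAX_BATCH);

            handle_upcall_batch(handler, batch, n);    /* hypothetical */
        }
    }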

Best regards,
Han Zhou

-----Original Message-----
From: discuss-boun...@openvswitch.org [mailto:discuss-boun...@openvswitch.org] 
On Behalf Of Ben Pfaff
Sent: November 26, 2013 4:55
To: Ethan Jackson
Cc: discuss@openvswitch.org ML
Subject: Re: [ovs-discuss] ovs-vswitchd 2.0 has high cpu usage

On Sat, Nov 23, 2013 at 03:24:17PM +0800, Chengyuan Li wrote:
> Do you have suggested max threads number?

Ethan, how many threads do you suggest using?  Chengyuan has a 32-core machine 
and sees high CPU usage with 28 threads, much lower with 4 threads.
_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss