On Thu, Jul 21, 2016 at 04:43:25PM +0300, Ilya Maximets wrote:
>
> On 21.07.2016 16:35, Yuanhan Liu wrote:
> > On Thu, Jul 21, 2016 at 04:19:35PM +0300, Ilya Maximets wrote:
> >> If something abnormal happened to QEMU, 'connect()' can block calling
> >> thread (e.g. main thread of OVS) forever or for a really long time.
> >> This can break whole application or block the reconnection thread.
> >>
> >> Example with OVS:
> >>
> >> ovs_rcu(urcu2)|WARN|blocked 512000 ms waiting for main to quiesce
> >> (gdb) bt
> >> #0 connect () from /lib64/libpthread.so.0
> >> #1 vhost_user_create_client (vsocket=0xa816e0)
> >> #2 rte_vhost_driver_register
> >> #3 netdev_dpdk_vhost_user_construct
> >> #4 netdev_open (name=0xa664b0 "vhost1")
> >> [...]
> >> #11 main
> >>
> >> Fix that by setting non-blocking mode for client sockets for connection.
> >>
> >> Fixes: 64ab701c3d1e ("vhost: add vhost-user client mode")
> >>
> >> Signed-off-by: Ilya Maximets <i.maximets at samsung.com>
> >
> > Acked-by: Yuanhan Liu <yuanhan.liu at linux.intel.com>
> >
> > One help I'd like to ask is that I'd appreciate if you could do the test
> > to make sure that your 2 (latest) patches fix the two issues you reported.
> >
> > You might have already done that; I just want to make sure.
>
> I've performed the test with 'ofport_request' script before sending patches.
> And currently test still works. No leaks of descriptors, no hangs,
> no QEMU crashes observed.
> Sometimes network device breaks on QEMU side, but it's QEMU issue. In this
> case I'm receiving following message from DPDK's vhost:
>
> VHOST_CONFIG: vhost-user client: socket created, fd: 28
> VHOST_CONFIG: failed to connect to /vhost1: Resource temporarily unavailable
> VHOST_CONFIG: /vhost1: reconnecting...
>
> Before the 'hang' patch there was hang of main thread.
>
> After QEMU restart all works normally. OVS restart not required.

Good to know, and I appreciate that!

	--yliu
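The change described in the quoted commit message (switching the client socket to non-blocking mode around connect()) can be sketched in plain C as below. This is a minimal illustration under stated assumptions, not the code from the patch: try_connect_nonblock() and connect_unix_client() are hypothetical helper names, and the sketch assumes Linux semantics for UNIX-domain stream sockets, where a non-blocking connect() either completes at once or fails immediately, e.g. with EAGAIN ("Resource temporarily unavailable", as in the quoted log) when the listener's backlog is full.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: attempt one connect() on a UNIX stream socket
 * without letting the calling thread sleep inside connect().
 * Returns 0 on success, or -1 with errno set (e.g. EAGAIN if the
 * listener's backlog is full, ENOENT if the socket path does not exist). */
static int try_connect_nonblock(int fd, const char *path)
{
    struct sockaddr_un un;
    int flags, saved_errno;

    if (strlen(path) >= sizeof(un.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(&un, 0, sizeof(un));
    un.sun_family = AF_UNIX;
    strcpy(un.sun_path, path); /* safe: length checked above */

    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    /* Assumption (Linux): with O_NONBLOCK, connect() on a UNIX stream
     * socket returns EAGAIN instead of sleeping when the peer has stopped
     * accepting and its backlog is full. */
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    if (connect(fd, (struct sockaddr *)&un, sizeof(un)) < 0) {
        saved_errno = errno;
        (void)fcntl(fd, F_SETFL, flags); /* best-effort restore */
        errno = saved_errno;             /* report the connect() error */
        return -1;
    }
    /* Connected: restore the original (blocking) flags for later I/O. */
    return fcntl(fd, F_SETFL, flags) < 0 ? -1 : 0;
}

/* Hypothetical wrapper: create a client socket and try once.
 * Returns the connected fd, or -1 (errno set) after closing the socket. */
static int connect_unix_client(const char *path)
{
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (try_connect_nonblock(fd, path) < 0) {
        int saved_errno = errno;
        close(fd);
        errno = saved_errno;
        return -1;
    }
    return fd;
}
```

A caller that gets -1 with errno == EAGAIN can treat the attempt as "try again later", for example by handing the path to a separate reconnection thread, instead of leaving the calling thread stuck in connect(). Restoring the original flags after a successful connect keeps later reads and writes blocking, which is an assumption about what the caller wants.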