On 01/04/2016 09:52, "Jarno Rajahalme" <ja...@ovn.org> wrote:
> >> On Mar 30, 2016, at 8:08 PM, Daniele Di Proietto >><diproiet...@vmware.com> wrote: >> >> >> On 30/03/2016 16:01, "Ben Pfaff" <b...@ovn.org> wrote: >> >>> (I'm taking a look at this patch specifically because Daniele asked me; >>> I'm not planning to review the whole series.) >>> >>> On Mon, Mar 28, 2016 at 12:41:40PM -0700, Daniele Di Proietto wrote: >>>> The dpif-netdev datapath keeps ports in a cmap which is written only >>>>by >>>> the main thread (holding port_mutex), but which is read concurrently >>>>by >>>> many threads (most notably the pmd threads). >>>> >>>> When removing ports from the datapath we should postpone the deletion, >>>> otherwise another thread might access invalid memory while reading the >>>> cmap. >>>> >>>> This commit splits do_port_del() in do_port_remove() and >>>> do_port_destroy(): the former removes the port from the cmap, while >>>>the >>>> latter reclaims the memory and drops the reference to the underlying >>>> netdev. >>> >>> s/del_port/port_del/ here: >> >> Thanks, changed >> >>> >>>> dpif_netdev_del_port() now uses ovsrcu_synchronize() before calling >>>> do_port_destroy(), to avoid memory corruption in concurrent readers. >>> >>> ovsrcu_synchronize() requires that nothing in the thread that calls it >>> is relying on RCU to keep objects around. That means that no caller of >>> dfpi_port_del()--there are a few of them--can rely on it. This is >>> usually a risky assumption, especially because this assumption can >>> change later. Is there reason to believe that it isn't important in >>>all >>> of these cases? >> >> I agree that's risky, but I think it's the only way to keep the ports >>RCU >> protected, because a port needs to be effectively deleted before >> dpif_netdev_port_del() can return. >> > >If this is because otherwise a following port_add can fail, as the old >port is still around, maybe we could make the highest possible level of >port_add detect the failure and then rcu_synchronize and try again? Would >that work? > > Jarno That would work for deleting the port, but there are other reasons we need to synchronize. When a netdev has to be reconfigured (in the last patch of the series) and we remove it from the cmap, we need to synchronize to make sure that other threads have stopped using it. I'm trying to add some compile-time RCU checks using clang thread safety annotations, but for those to be effective we have to introduce ovsrcu_read_lock() and ovsrcu_read_unlock() on every block that keeps RCU references and I'm not sure we want to go down that path. I've also remembered that dpif_netdev_port_add() and dpif_netdev_port_del() might already quiesce, because they could call ovs_mutex_cond_wait(). I'll try to post a patch to fix that, if we believe it's an issue. If ovsrcu_synchronize() is not an acceptable solution, I guess we should just use an hmap for ports and have pmdthread-local copies. This means that every port_add or port_del (even for non DPDK ports) would need to stop every pmd thread, but I guess there's no way around it. Thanks, Daniele _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev