On Mon, Dec 17, 2018 at 7:39 AM Petr Machata <[email protected]> wrote: > > Roopa Prabhu <[email protected]> writes: > > > On Mon, Dec 17, 2018 at 6:58 AM Petr Machata <[email protected]> wrote: > >> > >> When a failure occurs in rtnl_configure_link(), the current code > >> calls unregister_netdevice() to roll back the earlier call to > >> register_netdevice(), and jumps to errout, which calls > >> vxlan_fdb_destroy(). > >> > >> However unregister_netdevice() calls transitively ndo_uninit(), > >> which is vxlan_uninit(), and that already takes care of deleting > >> the default FDB entry by calling vxlan_fdb_delete_default(). > >> Since the entry added earlier in __vxlan_dev_create() is exactly > >> the default entry, the cleanup code in the errout block always > >> leads to double free and thus a panic. > >> > >> Fix by skipping the rollback code and just returning. > > > > I trust your test case that pointed to a potential issue..., but can > > we add a comment there on > > why there is an fdb add but no destroy in the roll back code ?. Also, > > This is at a place in the code where the clean-up would normally be: > > >> + /* the all_zeros_mac entry was deleted at vxlan_uninit */ > > Is that what you meant? > > > the destroy code from uninit path does > > look for an entry being present before attempting a destroy, its > > unclear to me yet why that ends up doing a double free ? > > That code is called first, finds the FDB entry, destroys it. After it > returns, another destroy call is invoked from __vxlan_dev_create(). >
ok, I am wondering if we can just avoid calling the fdb destroy in error path if unregister has already happened on the device with a flag local to that function or add extra check for the fdb being present before calling destroy in the error path. This keeps this fdb destroy checks local to this function.
