On 03/24, Jakub Kicinski wrote: > On Tue, 24 Mar 2026 15:49:27 -0700 Stanislav Fomichev wrote: > > > > Not sure why cancel+release, maybe you're thinking about the unregister > > > > path? This is rtnl_unlock -> netdev_run_todo -> __rtnl_unlock + some > > > > extras. > > > > > > > > And the flush is here to plumb the addresses to the real devices > > > > before we return to the callers. Mostly because of the following > > > > things we have in the tests: > > > > > > > > # TEST: team cleanup mode lacp > > > > [FAIL] > > > > # macvlan unicast address not found on a slave > > > > > > > > Can you explain a bit more on the suggestion? > > > > > > Oh, I thought it's here for unregister! Feels like it'd be cleaner to > > > add the flush in dev_*c_add() and friends? How hard would it be to > > > identify the callers in atomic context? > > > > Not sure we can do it in dev_xc_add because it runs under rtnl :-( > > I currently do flush in netdev_run_todo because that's the place that > > doesn't hold rtnl. Otherwise flush will get stuck because the work > > handler grabs it... > > I was thinking of something a'la linkwatch. We can "steal" / "flush" > the pending work inline. I guess linkwatch is a major source of races > over the years... > > Does the macvlan + team problem still happens with the current > implementation minus the flush? We are only flushing once so only > pushing the addresses thru one layer of async callbacks.
Yes, it does happen consistently when I remove the flush. It also happens with my internal v4, so I need to look again at what's going on. Not sure whether it's my internal regression or I was just sloppy/lucky (since you're correct in pointing out that we flush only once). Before I went down the workqueue route, I had a simple net_todo_list-like approach: `list_add_tail` on enqueue and `while(!list_empty) run_work()` on rtnl_unlock. This had a nice properly of tracking re-submissions (by checking whether the device's list_head is linked into the list or not) and it was relatively easy to do the recursive flush. Let me try get back to this approach and see whether it solves the flush? Not sure what wq buys us at this point.
