Tue, Nov 01, 2016 at 04:36:50PM CET, da...@davemloft.net wrote: >From: Roopa Prabhu <ro...@cumulusnetworks.com> >Date: Tue, 01 Nov 2016 08:14:14 -0700 > >> On 11/1/16, 7:19 AM, Eric Dumazet wrote: >>> On Tue, 2016-11-01 at 00:57 +0200, Ido Schimmel wrote: >>>> On Mon, Oct 31, 2016 at 02:24:06PM -0700, Eric Dumazet wrote: >>>>> How well will this work for large FIB tables ? >>>>> >>>>> Holding rtnl while sending thousands of skb will prevent consumers to >>>>> make progress ? >>>> Can you please clarify what do you mean by "while sending thousands of >>>> skb"? This patch doesn't generate notifications to user space, but >>>> instead invokes notification routines inside the kernel. I probably >>>> misunderstood you. >>>> >>>> Are you suggesting this be done using RCU instead? Well, there are a >>>> couple of reasons why I took RTNL here: >>>> >>> No, I do not believe RCU is wanted here, in control path where we might >>> sleep anyway. >>> >>>> 1) The FIB notification chain is blocking, so listeners are expected to >>>> be able to sleep. This isn't possible if we use RCU. Note that this >>>> chain is mainly useful for drivers that reflect the FIB table into a >>>> capable device and hardware operations usually involve sleeping. >>>> >>>> 2) The insertion of a single route is done with RTNL held. I didn't want >>>> to differentiate between both cases. This property is really useful for >>>> listeners, as they don't need to worry about locking in writer-side. >>>> Access to data structs is serialized by RTNL. >>> My concern was that for large iterations, you might hold RTNL and/or >>> current cpu for hundred of ms or even seconds... >>> >> I have the same concern as Eric here. >> >> I understand why you need it, but can the driver request for an initial dump >> and that >> dump be made more efficient somehow ie not hold rtnl for the whole dump ?. >> instead of making the fib notifier registration code doing it. >> >> these routing table sizes can be huge and an analogy for this in user-space: >> We do request a netlink dump of routing tables at initialization (on driver >> starts or resets)... >> but, existing netlink routing table dumps for that scale don't hold rtnl for >> the whole dump. >> The dump is split into multiple responses to the user and hence it does not >> starve other rtnl users. >> >> In-fact I don't think netlink routing table dumps from user-space hold >> rtnl_lock for the whole dump. >> IIRC this was done to allow route add/dels to be allowed in parallel for >> performance reasons. >> (I will need to double check to confirm this). > >I've always had some reservations about using notifiers for getting >the FIB entries down to the offloaded device.
Yeah, me too. But there is no really other way to do it. I thought about it quite long. But maybe I missed something. > >And this problem is just another symptom that it is the wrong >mechanism for propagating this information. > >As suggested by Roopa here, perhaps we're looking at the problem from >the wrong direction. We tend to design NDO ops and notifiers, to >"push" things to the driver, but maybe something more like a push+pull >model is better. How do you imagine this mode should looks like? Could you draw me some example? Thanks!