> On 30 May 2020, at 09:40, David Gwynne <da...@gwynne.id.au> wrote:
> 
> On Mon, May 25, 2020 at 09:44:22AM +0200, Martin Pieuchot wrote:
>> On 23/05/20(Sat) 15:38, Vitaliy Makkoveev wrote:
>>>> On 23 May 2020, at 12:54, Martin Pieuchot <m...@openbsd.org> wrote:
>>>> On 22/05/20(Fri) 13:25, Vitaliy Makkoveev wrote:
>>>>> On Fri, May 22, 2020 at 07:57:13AM +1000, David Gwynne wrote:
>>>>>> [...]
>>>>>> can you try the following diff?
>>>>>
>>>>> I tested this diff and it works for me. But the problem I pointed
>>>>> out is about pipex(4) locking.
>>>>>
>>>>> pipex(4) requires NET_LOCK() to be grabbed not only for the
>>>>> underlying ip{,6}_output() but for itself too. But since
>>>>> pppac_start() has unpredictable behavior I suggested making it
>>>>> predictable [1].
>>>>
>>>> What needs the NET_LOCK() in there? We're talking about
>>>> pipex_ppp_output(), right? Does it really need the NET_LOCK(), or
>>>> is the KERNEL_LOCK() what protects those data structures?
>>>
>>> Yes, about pipex_ppp_output() and pipex_output(). Except for
>>> ip{,6}_output(), nothing requires NET_LOCK(). As David Gwynne
>>> pointed out, those calls can be replaced by ip{,6}_send().
>>
>> Locks protect data structures, but you're talking about functions.
>> Which data structures are serialized by this lock? I'm questioning
>> whether there are any.
>>
>>> [...]
>>>> In the case of pipex(4) it isn't clear that the NET_LOCK() is
>>>> necessary.
>>>
>>> I guess pipex(4) was wrapped in NET_LOCK() to protect it while it's
>>> accessed through `pr_input'. Is NET_LOCK() required for this case?
>>
>> pipex(4), like all of the network stack, has been wrapped in the
>> NET_LOCK() because it was easy to do. That means it wasn't a
>> conscious decision or design. The fact that pipex(4) code runs under
>> the NET_LOCK() is a side effect of how the rest of the stack
>> evolved. I'm questioning whether this lock is required there. In
>> theory it shouldn't be. What is the reality?
> 
> pipex and pppx pre-date the NET_LOCK, which means you can assume
> that any implicit locking was and is done by the KERNEL_LOCK. mpi is
> asking the right questions here.
> 
> As for the ifq maxlen difference between pppx and pppac, that's more
> about when and how quickly they were written than anything else. The
> IFQ_SET_MAXLEN(&ifp->if_snd, 1) in pppx is there because that was
> historically the only way to bypass transmit mitigation for
> pseudo/virtual interfaces. It is not an elegant hack to keep hold of
> the NET_LOCK over a call to a start routine.
> 
> As a rule of thumb, network interface drivers should not (maybe
> cannot) rely on the NET_LOCK in their if_start handlers. To be
> clear, they should not rely on it being held by the network stack
> when if_start is called, because sometimes the stack calls it
> without holding NET_LOCK, and they should not take it themselves,
> because they might be called by the stack while it is being held.
> 
> Also, be aware that the ifq machinery makes sure that the start
> routine is not called concurrently or recursively. You can queue
> packets for transmission on an ifq from anywhere in the kernel at
> any time, but only one cpu will run the start routine. Other cpus
> can queue packets while another one is running if_start, but the
> first one ends up responsible for trying to transmit them.
> 
> ifqs also take the KERNEL_LOCK before calling if_start if the
> interface is not marked as IFXF_MPSAFE.
> 
> The summary is that pppx and pppac are not marked as mpsafe, so
> their start routines are called with the KERNEL_LOCK held. Currently
> pppx accidentally gets the NET_LOCK because of the IFQ_SET_MAXLEN,
> but it shouldn't rely on that.
> 
> Cheers,
> dlg
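For readers following along, here is a minimal sketch of how a pseudo
interface would opt in to the mpsafe behaviour dlg describes. The
ifnet/ifq names (IFXF_MPSAFE, if_qstart, IFQ_SET_MAXLEN, if_attach)
are the real OpenBSD kernel API as of this thread; the "foo" driver,
its softc, and foo_start are hypothetical stand-ins, not the actual
pppx or pppac code.

#include <sys/param.h>
#include <sys/mbuf.h>
#include <net/if.h>
#include <net/if_var.h>

struct foo_softc {
	struct ifnet	sc_if;
};

void	foo_start(struct ifqueue *);

void
foo_attach(struct foo_softc *sc)
{
	struct ifnet *ifp = &sc->sc_if;

	/* name, if_softc, flags, mtu, etc. setup elided */

	/*
	 * IFXF_MPSAFE tells the ifq machinery not to take the
	 * KERNEL_LOCK around the start routine, so foo_start() must
	 * not assume the KERNEL_LOCK (or the NET_LOCK) is held.
	 * mpsafe drivers provide if_qstart rather than if_start.
	 */
	ifp->if_xflags = IFXF_MPSAFE;
	ifp->if_qstart = foo_start;

	/*
	 * A maxlen of 1 is the historical trick to bypass transmit
	 * mitigation on pseudo/virtual interfaces: with such a short
	 * queue the start routine runs immediately from the
	 * enqueueing context instead of being deferred to a task.
	 */
	IFQ_SET_MAXLEN(&ifp->if_snd, 1);

	if_attach(ifp);
}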
Thanks for the explanation. Will you commit the diff you posted in this thread?
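To round out the picture, a sketch of the matching if_qstart handler,
illustrating the serialization guarantee dlg describes above.
ifq_dequeue() and the ifq_if backpointer are the real API;
foo_start/foo_transmit remain hypothetical.

/*
 * The ifq machinery never calls this concurrently or recursively:
 * other cpus may keep enqueueing packets while one cpu is in here,
 * but this cpu picks those packets up on its next ifq_dequeue().
 * No driver lock is needed around the loop, and because the
 * interface is marked IFXF_MPSAFE, neither the KERNEL_LOCK nor the
 * NET_LOCK can be assumed to be held.
 */
void
foo_start(struct ifqueue *ifq)
{
	struct ifnet *ifp = ifq->ifq_if;
	struct foo_softc *sc = ifp->if_softc;
	struct mbuf *m;

	while ((m = ifq_dequeue(ifq)) != NULL)
		foo_transmit(sc, m);	/* must consume or free m */
}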