> On 30 May 2020, at 09:40, David Gwynne <da...@gwynne.id.au> wrote:
> 
> On Mon, May 25, 2020 at 09:44:22AM +0200, Martin Pieuchot wrote:
>> On 23/05/20(Sat) 15:38, Vitaliy Makkoveev wrote:
>>>> On 23 May 2020, at 12:54, Martin Pieuchot <m...@openbsd.org> wrote:
>>>> On 22/05/20(Fri) 13:25, Vitaliy Makkoveev wrote:
>>>>> On Fri, May 22, 2020 at 07:57:13AM +1000, David Gwynne wrote:
>>>>>> [...] 
>>>>>> can you try the following diff?
>>>>>> 
>>>>> 
>>>>> I tested this diff and it works for me. But the problem I pointed
>>>>> out is about pipex(4) locking.
>>>>> 
>>>>> pipex(4) requires that NET_LOCK() be grabbed not only for the
>>>>> underlying ip{,6}_output() but for itself too. But since
>>>>> pppac_start() has unpredictable behavior I suggested making it
>>>>> predictable [1].
>>>> 
>>>> What needs the NET_LOCK() in there?  We're talking about
>>>> pipex_ppp_output(), right?  Does it really need the NET_LOCK(), or
>>>> is the KERNEL_LOCK() what protects those data structures?
>>> 
>>> Yes, about pipex_ppp_output() and pipex_output(). Apart from
>>> ip{,6}_output(), nothing requires NET_LOCK(). As David Gwynne
>>> pointed out, those calls can be replaced by ip{,6}_send().
>> 
>> Locks protect data structures, but you're talking about functions.
>> Which data structures are serialized by this lock?  I'm questioning
>> whether there is one.
>> 
>>> [...]
>>>> In the case of pipex(4) it isn't clear that the NET_LOCK() is
>>>> necessary.
>>> 
>>> I guess pipex(4) was wrapped in the NET_LOCK() to protect it while
>>> it's accessed through `pr_input'. Is the NET_LOCK() required for
>>> this case?
>> 
>> pipex(4), like all of the network stack, has been wrapped in the
>> NET_LOCK() because it was easy to do.  That means it isn't a conscious
>> decision or design.  The fact that pipex(4) code runs under the
>> NET_LOCK() is a side effect of how the rest of the stack evolved.  I'm
>> questioning whether this lock is required there.  In theory it
>> shouldn't be.  What is the reality?
> 
> pipex and pppx pre-date the NET_LOCK, which means you can assume
> that any implicit locking was and is done by the KERNEL_LOCK. mpi is
> asking the right questions here.
> 
> As for the ifq maxlen difference between pppx and pppac, that's about
> when and how quickly they were written more than anything else.
> The IFQ_SET_MAXLEN(&ifp->if_snd, 1) in pppx is there because that's a
> way to bypass transmit mitigation for pseudo/virtual interfaces, and
> historically it was the only way to do it. It is not meant as a hack
> to keep hold of the NET_LOCK across a call to a start routine.
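> Roughly, that sits in the pseudo interface's attach path like this (a
> simplified sketch, not the literal pppx code; names are illustrative):
> 
> 	ifp->if_softc = sc;
> 	ifp->if_start = pppx_if_start;	/* driver's start routine */
> 	/*
> 	 * A send queue length of 1 forces the stack to call if_start
> 	 * for every packet instead of batching them, which is how
> 	 * pseudo interfaces bypassed transmit mitigation.
> 	 */
> 	IFQ_SET_MAXLEN(&ifp->if_snd, 1);
> 	if_attach(ifp);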
> 
> As a rule of thumb, network interface drivers should not (maybe
> cannot) rely on the NET_LOCK in their if_start handlers. To be
> clear, they should not rely on it being held by the network stack
> when if_start is called because sometimes the stack calls it without
> holding NET_LOCK, and they should not take it because they might
> be called by the stack when it is being held.
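> In code form, that means a start handler just drains its own send
> queue and leaves the NET_LOCK alone entirely (a sketch with made-up
> driver names, not real code from pppx or pppac):
> 
> 	void
> 	example_start(struct ifnet *ifp)
> 	{
> 		struct example_softc *sc = ifp->if_softc;
> 		struct mbuf *m;
> 
> 		/*
> 		 * No NET_LOCK assumptions here: the stack may or may
> 		 * not hold it when it calls us, and we never take it.
> 		 */
> 		while ((m = ifq_dequeue(&ifp->if_snd)) != NULL)
> 			example_transmit(sc, m);	/* hand off to the tunnel */
> 	}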
> 
> Also, be aware that the ifq machinery makes sure that the start
> routine is not called concurrently or recursively. You can queue
> packets for transmission on an ifq from anywhere in the kernel at
> any time, but only one cpu will run the start routine. Other cpus
> can queue packets while another one is running if_start, but that
> first one ends up responsible for trying to transmit them.
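> So from anywhere else in the kernel the pattern is simply (sketch):
> 
> 	/*
> 	 * Queue an mbuf for transmission.  The ifq code arranges for
> 	 * exactly one cpu to end up running the driver's start routine;
> 	 * callers never invoke if_start themselves.
> 	 */
> 	error = if_enqueue(ifp, m);
> 	if (error != 0)
> 		return (error);		/* queue was full; m has been freed */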
> 
> ifqs also take the KERNEL_LOCK before calling if_start if the interface
> is not marked as IFXF_MPSAFE.
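> A driver opts out of that by marking itself mpsafe before if_attach(),
> i.e. something like:
> 
> 	ifp->if_xflags = IFXF_MPSAFE;	/* start routine runs without KERNEL_LOCK */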
> 
> The summary is that pppx and pppac are not marked as mpsafe so their
> start routines are called with KERNEL_LOCK held. Currently pppx
> accidentally gets NET_LOCK because of the IFQ_SET_MAXLEN, but shouldn't
> rely on it.
> 
> Cheers,
> dlg
> 

Thanks for the explanation.
Will you commit the diff you posted in this thread?
