Hi Slawa,

On 10/12/16 5:42 PM, Slawa Olhovchenkov wrote:
> On Wed, Oct 12, 2016 at 05:17:35PM +0200, Julien Charbon wrote:
>>>>>>>>>>  I see, thus just for the context:  The TCP stack in sys/dev/cxgb*
>>>>>>>>>> is a TOE (TCP Offload Engine) TCP stack for Chelsio NICs; it is a
>>>>>>>>>> separate/side TCP stack that is used only with the TCP_OFFLOAD option.
>>>>>>>>>>  This TOE TCP stack actually has its own set of detach()/input()
>>>>>>>>>> functions and seems to check the INP_DROPPED flag properly.  I guess @np
>>>>>>>>>> checks fixes in the socket TCP stack and decides which ones can also
>>>>>>>>>> impact the Chelsio TOE TCP stack.  Some bugs are only in the socket TCP
>>>>>>>>>> stack, some are only in the TOE TCP stack.
>>>>>>>>> My concern is the other direction: the Chelsio TOE TCP stack setting
>>>>>>>>> INP_TIMEWAIT, and how that affects the
>>>>>>>>> tcp_timer_2msl()/tcp_close()/sofree()/tcp_usr_detach() path.
>>>>>>>>  I see, I expect no problem on this side as tcp_timer_2msl() checks the
>>>>>>>> INP_TIMEWAIT flag and does not call tcp_close() if it is set.
>>>>>>> I mean the case where, at the time of the first INP_WUNLOCK(),
>>>>>>> tcp_timer_2msl() does not see INP_TIMEWAIT and calls tcp_close();
>>>>>>> tcp_close() does INP_WUNLOCK(), and now the Chelsio TOE takes INP_WLOCK,
>>>>>>> does tcp_twstart() and sets INP_TIMEWAIT. After this, tcp_timer_2msl()
>>>>>>> resumes and hits an unexpected INP_TIMEWAIT in tcp_usr_detach().
>>>>>>  Sure, basically the same bug as in the classic TCP stack.  If you think
>>>>>> it can happen, send an email describing that to np@ and he will check
>>>>>> and fix that.  He is a TOE TCP stack expert and I am not.  In all cases,
>>>>>> if this issue is possible in the TOE TCP stack context, the patch will be
>>>>>> straightforward:  If the INP_DROPPED flag is set, do not call
>>>>>> tcp_twstart().
> I am emailing np@.
>>>>>>  The current patch focuses only on the classic TCP stack.
>>>>> Maybe the current workaround (with logging) in tcp_usr_detach() is a good
>>>>> solution for preventing system lockouts caused by similar bugs?
>>>>  Good question, the quick workaround in tcp_usr_detach() does not handle
>>>> all the cases.  Even if it reduces the number of crashes, you can still find
>>>> scenarios where it has unexpected side effects.
>>> This is better than a guaranteed lockout.
>>>>  Long term solution is to enforce:  If the inp has the INP_DROPPED flag,
>>>> just stop processing it and return.  If you grep for the INP_DROPPED flag in
>>>> the kernel sources, you can see that this test is already done in almost all
>>>> tcp_*() processing functions except tcp_input().
>>>>  I would say that even without this issue tcp_input() should check the
>>>> INP_DROPPED flag after INP_WLOCK anyway.  Same for the TOE TCP stack,
>>>> you are simply not supposed to process an inp with the INP_DROPPED flag.
>>> Absolutely agreed!
>>> My point is: more checks, and good handling of the check results, are best
>>> for stability.
>>> I.e. check INP_DROPPED in tcp_input() AND work around INP_TIMEWAIT in
>>> tcp_usr_detach() (with logging) AND check the possible cases in XXX TOE.
>>> The current TCP stack is too complex and has many corner cases. It needs
>>> additional guards where possible (ones that do not cause a kernel panic).
>>  I see your point:  Even if this issue is caught by this assert:
>> KASSERT(tp == NULL, ("tcp_detach: INP_TIMEWAIT && "
>>     "INP_DROPPED && tp != NULL"));
>> https://github.com/freebsd/freebsd/blob/release/11.0.0/sys/netinet/tcp_usrreq.c#L213
>>  you might not have the INVARIANTS option, and then you will get a lockout
>> that is quite difficult to debug.  Thus what we can do is:
>>  - If INVARIANTS is set:  kernel panic to get all the details in the core.
>>  - If INVARIANTS is not set:  Log this error with an explicit kernel
>> log(LOG_ERR) describing the issue, and then use the workaround to avoid
>> the double-free and leave the system in a good enough state.
>>  Something like:
> Yes, thanks!

 Proposed changes added in the review:


 Tell me when you have gone three days without issues with this change.

>> tcp_detach() {
>>   ...
>>   if (inp->inp_flags & INP_TIMEWAIT) {
>>     ...
>>     if (inp->inp_flags & INP_DROPPED) {
>>       in_pcbdetach(inp);
>>       if (__predict_true(tp == NULL)) {
>>           in_pcbfree(inp);
>>       } else {
>> #ifdef INVARIANTS
>>         panic("tcp_detach: tp != NULL, That's not good because 'blah'\n");
>> #else
>>         log(LOG_ERR, "tcp_detach: tp != NULL, That's not good because 'blah'\n");
> Maybe some more info in the log can help to detect the root cause of the issue?
> I don't know what info, maybe flags or the number of references?

 For this kind of issue, the useful part is the stacktrace.  INVARIANTS
will give you that trace in the core, and without INVARIANTS it is
better to use dtrace:

$ cat tcp-twstart-dropped.d
fbt::tcp_twstart:entry
/args[0]->t_inpcb->inp_flags & 0x04000000/
{
  printf("INP_DROPPED in tcp_twstart: %x", args[0]->t_inpcb->inp_flags);
  stack();
}
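
 For reference, the rule quoted earlier in this thread (do not process an inp
that has the INP_DROPPED flag) can be modeled in userspace roughly as below.
This is only a sketch, not the actual patch: struct model_inpcb and
model_input() are hypothetical stand-ins, the lock is only mentioned in a
comment, and setting INP_TIMEWAIT stands in for tcp_twstart().  The flag
values are the ones from sys/netinet/in_pcb.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined in FreeBSD's sys/netinet/in_pcb.h. */
#define INP_TIMEWAIT	0x01000000
#define INP_DROPPED	0x04000000

/* Hypothetical stand-in for struct inpcb; only the flags matter here. */
struct model_inpcb {
	uint32_t inp_flags;
};

/*
 * Model of an input path that re-checks INP_DROPPED after taking the
 * inp lock.  Returns true if the inp was processed (time-wait started),
 * false if the inp was already dropped and has been left untouched.
 */
static bool
model_input(struct model_inpcb *inp)
{
	/* In the kernel, INP_WLOCK(inp) would be taken here. */
	if (inp->inp_flags & INP_DROPPED)
		return (false);		/* Dropped: stop processing. */

	inp->inp_flags |= INP_TIMEWAIT;	/* Stands in for tcp_twstart(). */
	return (true);
}
```

 With this guard, an inp that tcp_close() has already dropped never gets
INP_TIMEWAIT set behind its back, which is the state tcp_usr_detach() does
not expect.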

