On 1/22/2015 4:27 AM, Konstantin Belousov wrote:
> On Thu, Jan 22, 2015 at 11:16:41AM +0100, Hans Petter Selasky wrote:
>> On 01/20/15 11:47, Slawa Olhovchenkov wrote:
>>> On Tue, Jan 20, 2015 at 08:29:47AM +0100, Hans Petter Selasky wrote:
>>>> On 01/17/15 23:18, Hans Petter Selasky wrote:
>>>>> On 01/17/15 20:11, Jason Wolfe wrote:
>>>>>> HPS,
>>>>>> Just to give a quick status update, this patch has most certainly
>>>>>> resolved our spin lock held too long panics on stable/10.
>>>>>> Thank you to JHB for spending some time digging into the issue and
>>>>>> leading us to td_slpcallout as the culprit, and HPS for your rewrite.
>>>>>> I had heard rumors of others being affected by similar issues, so this
>>>>>> seems like a fine candidate for an MFC if possible.
>>>>>> Jason
>>>>> Hi Jason,
>>>>> I'm glad to hear that my patch has resolved your issue and I'm happy we
>>>>> now have a more stable system.
>>>>> It was actually a co-worker who wrote some bad code; debugging it
>>>>> is what led me to look at the callout subsystem.
>>>>> One bug kills the other ;-)
>>>>> Yes, I'm planning an MFC to 10-stable, and as part of the MFC I will
>>>>> possibly add the _callout_stop_safe() function so that binary
>>>>> compatibility with existing drivers is not broken.
>>>>> --HPS
>>>> Hi,
>>>> Here is a followup patch for the TCP stack like I mentioned in the
>>>> beginning of the work done on the callout subsystem:
>>>> https://reviews.freebsd.org/D1563
>>>> If someone has a setup for massive TCP testing please give it a spin.
>>> I have one on 10.1 (with r261906 applied).
>> FYI:
>> r277213 will be reverted from -current within a few hours at most,
>> because developers need more time to review patches in surrounding
>> areas, such as the TCP stack, that restore the distribution of
>> callouts across multiple CPUs when using MPSAFE callouts, in order to
>> avoid congestion in the TCP stack.
> No, r277213 was not requested to be reverted because of TCP issues.
> The main complaint is that you left an unknown number of cases degraded,
> and there is no analysis of each such case, nor even a list of the cases
> that need to be fixed (or an argument for why a given consumer of the
> callout KPI could be left as is).
> Just providing a fix for one place is not enough.

I have a similar concern about out-of-tree work. A vendor or module
developer who missed accounting for this change would be surprised to
find their performance degraded. At a minimum, an UPDATING entry should
be added explaining the change and what consumers must do.

Bryan Drewery
