Any pointers on how to triage/investigate the problem further?

I'm willing to dig deeper into the driver, but I won't be able to do it without 
some tips on where to start looking and what to look for.

Thanks,
Anatoli

On 17/7/20 18:49, Anatoli wrote:
> Hi Stuart,
> 
> Thanks for your suggestion.
> 
> Unfortunately, it had no effect. I made this change:
> 
> index 5cc8bd6862b..47e639fdc6c 100644
> --- sys/dev/pci/if_re_pci.c
> +++ sys/dev/pci/if_re_pci.c
> @@ -156,9 +156,10 @@ re_pci_attach(struct device *parent, struct device *self, void *aux)
>         }
>  
>         /* Allocate interrupt */
> -       if (pci_intr_map_msi(pa, &ih) == 0)
> -               sc->rl_flags |= RL_FLAG_MSI;
> -       else if (pci_intr_map(pa, &ih) != 0) {
> +//     if (pci_intr_map_msi(pa, &ih) == 0)
> +//             sc->rl_flags |= RL_FLAG_MSI;
> +//     else 
> +       if (pci_intr_map(pa, &ih) != 0) {
>                 printf(": couldn't map interrupt\n");
>                 return;
>         }
> 
> Then I recompiled the kernel and booted it.
> 
> After unplugging and reconnecting the cable, the re0 NIC entered the
> "hibernate" state again and, as before, only came back when there were
> outgoing pings. "re0: watchdog timeout" also appeared in dmesg.
> 
> My guess is that on inactivity (or on detecting "no carrier") the NIC
> somehow disables interrupts, or they time out and become disabled. That
> would explain why it doesn't see any incoming packets, and why only
> outgoing packets somehow reactivate the interrupts. Does that make sense?
> 
> What else could I try?
> 
> Thanks,
> Anatoli
> 
> 
> On 17/7/20 07:12, Stuart Henderson wrote:
>> As hinted in the Reddit post, try disabling MSI. Unlikely to be the 
>> permanent fix but it will give more information.
>>
>> In if_re_pci.c:
>>
>> /* Allocate interrupt */
>> if (pci_intr_map_msi(pa, &ih) == 0)
>>         sc->rl_flags |= RL_FLAG_MSI;
>> else if (pci_intr_map(pa, &ih) != 0) {
>>         printf(": couldn't map interrupt\n");
>>         return;
>> }
>>
>> Remove "if (pci_intr_map_msi ... else" and keep "if (pci_intr_map(..."
>>
>> There are plenty of systems with re(4) that don't have this problem; it
>> definitely doesn't affect every machine and/or every re(4).
>>
> 
