Thanks Vitaly,

On Tue, Mar 4, 2025, at 5:48 AM, Lifshits, Vitaly wrote:
> On 3/3/2025 5:34 AM, Mark Pearson wrote:
>> Hi Andrew,
>> 
>> On Sun, Mar 2, 2025, at 11:13 AM, Andrew Lunn wrote:
>>> On Sun, Mar 02, 2025 at 03:09:35PM +0200, Lifshits, Vitaly wrote:
>>>>
>>>>
>>>> Hi Mark,
>>>>
>>>>> Hi Andrew
>>>>>
>>>>> On Thu, Feb 27, 2025, at 11:07 AM, Andrew Lunn wrote:
>>>>>>>>> +                     e1e_rphy(hw, PHY_REG(772, 26), &phy_data);
>>>>>>>>
>>>>>>>> Please add some #define for these magic numbers, so we have some idea
>>>>>>>> what PHY register you are actually reading. That in itself might help
>>>>>>>> explain how the workaround actually works.
>>>>>>>>
>>>>>>>
>>>>>>> I don't know what this register does I'm afraid - that's Intel 
>>>>>>> knowledge and has not been shared.
>>>>>>
>>>>>> What PHY is it? Often it is just a COTS PHY, and the datasheet might
>>>>>> be available.
>>>>>>
>>>>>> Given your setup description, pause seems like the obvious thing to
>>>>>> check. When trying to debug this, did you look at pause settings?
>>>>>> Knowing what this register is might also point towards pause, or
>>>>>> something totally different.
>>>>>>
>>>>>>  Andrew
>>>>>
>>>>> For the PHY - do you know a way of determining this easily? I can reach 
>>>>> out to the platform team but that will take some time. I'm not seeing 
>>>>> anything in the kernel logs, but if there's a recommended way of 
>>>>> confirming that would be appreciated.
>>>>
>>>> The PHY is I219 PHY.
>>>> The datasheet is indeed accessible to the public:
>>>> https://cdrdv2-public.intel.com/612523/ethernet-connection-i219-datasheet.pdf
>>>
>>> Thanks for the link.
>>>
>>> So it is reading page 772, register 26. Page 772 is all about LPI. So
>>> we can have a #define for that. Register 26 is Memories Power. So we
>>> can also have a #define for that.
>> 
>> Yep - I'll look to add this.
>> 
>>>
>>> However, that does not really help explain how this helps prevent an
>>> interrupt. I assume EEE settings were also played with. Not that
>>> this register appears to have anything to do with EEE!
>>>
>> I don't think we tried those - it was never suggested that I can recall 
>> (the original debug started 6 months+ ago). I don't know fully what testing 
>> Intel did in their lab once the issue was reproduced there.
>> 
>> If you have any particular recommendations we can try that - with a note 
>> that we have to run a soak for ~1 week to have confidence if a change made a 
>> difference (the issue can reproduce between 1 to 2 days).
>
> Personally I doubt that it is related to EEE since there was no real 
> link flap.
>
> I suggest trying to replace the register read with a short delay, or 
> reading the PHY STATUS register instead.
>

Ack - we'll try that, and collect some other debug registers in the process.
Will update with findings - this may take a while :)
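
For reference, a rough sketch of how the #defines Andrew asked for and the
test variants Vitaly suggested might be laid out. The I219_* names, the
wa_variant enum and wa_phy_offset() are placeholders made up for
illustration, not existing driver symbols. The PHY_REG() packing is
reproduced here on the assumption that it matches the e1000e macro (5-bit
register field, page number above it), and 0x01 is the standard MII basic
status register.

```c
#include <assert.h>
#include <stdint.h>

/* Page/register packing, assumed to match e1000e's PHY_REG(). */
#define PHY_PAGE_SHIFT		5
#define MAX_PHY_REG_ADDRESS	0x1F
#define PHY_REG(page, reg)	(((page) << PHY_PAGE_SHIFT) | \
				 ((reg) & MAX_PHY_REG_ADDRESS))

/* Placeholder names: page 772 is the LPI page and register 26 is
 * "Memories Power", per the datasheet discussion in this thread. */
#define I219_PHY_PAGE_LPI	772
#define I219_PHY_REG_MEM_PWR	26
#define I219_LPI_MEM_PWR	PHY_REG(I219_PHY_PAGE_LPI, I219_PHY_REG_MEM_PWR)

/* Standard MII basic status register, for the "read PHY STATUS" variant. */
#define PHY_STATUS_REG		0x01

/* The named constant must encode the same offset as PHY_REG(772, 26). */
static_assert(I219_LPI_MEM_PWR == ((772 << 5) | 26),
	      "named register must match the original magic numbers");

/* Hypothetical test variants for the soak runs. */
enum wa_variant {
	WA_READ_MEM_PWR,	/* current workaround: read page 772, reg 26 */
	WA_READ_PHY_STATUS,	/* read the PHY status register instead */
	WA_DELAY_ONLY,		/* no PHY read, short delay only */
};

/* Return the PHY offset to read for a variant, or -1 if no read is done. */
static int32_t wa_phy_offset(enum wa_variant v)
{
	switch (v) {
	case WA_READ_MEM_PWR:
		return I219_LPI_MEM_PWR;
	case WA_READ_PHY_STATUS:
		return PHY_STATUS_REG;
	default:
		return -1;
	}
}
```

In the driver the returned offset would be passed to e1e_rphy() as in the
patch hunk quoted above, and the delay-only variant would skip the read
altogether. The length of that delay is not decided yet.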

Thanks
Mark
