Thanks Vitaly,

On Tue, Mar 4, 2025, at 5:48 AM, Lifshits, Vitaly wrote:
> On 3/3/2025 5:34 AM, Mark Pearson wrote:
>> Hi Andrew,
>>
>> On Sun, Mar 2, 2025, at 11:13 AM, Andrew Lunn wrote:
>>> On Sun, Mar 02, 2025 at 03:09:35PM +0200, Lifshits, Vitaly wrote:
>>>>
>>>>
>>>> Hi Mark,
>>>>
>>>>> Hi Andrew
>>>>>
>>>>> On Thu, Feb 27, 2025, at 11:07 AM, Andrew Lunn wrote:
>>>>>>>>> + e1e_rphy(hw, PHY_REG(772, 26), &phy_data);
>>>>>>>>
>>>>>>>> Please add some #define for these magic numbers, so we have some idea
>>>>>>>> what PHY register you are actually reading. That in itself might help
>>>>>>>> explain how the workaround actually works.
>>>>>>>>
>>>>>>>
>>>>>>> I don't know what this register does I'm afraid - that's Intel
>>>>>>> knowledge and has not been shared.
>>>>>>
>>>>>> What PHY is it? Often it is just a COTS PHY, and the datasheet might
>>>>>> be available.
>>>>>>
>>>>>> Given your setup description, pause seems like the obvious thing to
>>>>>> check. When trying to debug this, did you look at pause settings?
>>>>>> Knowing what this register is might also point towards pause, or
>>>>>> something totally different.
>>>>>>
>>>>>> Andrew
>>>>>
>>>>> For the PHY - do you know a way of determining this easily? I can reach
>>>>> out to the platform team but that will take some time. I'm not seeing
>>>>> anything in the kernel logs, but if there's a recommended way of
>>>>> confirming that would be appreciated.
>>>>
>>>> The PHY is I219 PHY.
>>>> The datasheet is indeed accessible to the public:
>>>> https://cdrdv2-public.intel.com/612523/ethernet-connection-i219-datasheet.pdf
>>>
>>> Thanks for the link.
>>>
>>> So it is reading page 772, register 26. Page 772 is all about LPI. So
>>> we can have a #define for that. Register 26 is Memories Power. So we
>>> can also have an #define for that.
>>
>> Yep - I'll look to add this.
>>
>>>
>>> However, that does not really help explain how this helps prevent an
>>> interrupt. I assume playing with EEE settings was also played
>>> with. Not that is register appears to have anything to do with EEE!
>>>
>> I don't think we did tried those - it was never suggested that I can recall
>> (the original debug started 6 months+ ago). I don't know fully what testing
>> Intel did in their lab once the issue was reproduced there.
>>
>> If you have any particular recommendations we can try that - with a note
>> that we have to run a soak for ~1 week to have confidence if a change made a
>> difference (the issue can reproduce between 1 to 2 days).
>
> Personally I doubt that it is related to EEE since there was no real
> link flap.
>
> I suggest to try replacing the register read for a short delay or
> reading the PHY STATUS register instead.
>
Ack - we'll try that, and collect some other debug registers in the process.
Will update with findings - this may take a while :)

Thanks
Mark
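
For reference, the #define cleanup discussed in the thread could look
roughly like the sketch below. It is only an illustration, not the actual
patch: the I219_* names are hypothetical placeholders, the page and register
meanings (LPI, Memories Power) come from the datasheet reading quoted above,
and the PHY_REG() packing is reproduced locally on the assumption that it
matches the e1000e driver's macro, so that the snippet builds on its own.

```c
#include <stdint.h>

/*
 * Local stand-ins for the driver's page/register packing (assumed to match
 * e1000e's PHY_REG(): page in the upper bits, 5-bit register in the low bits).
 */
#define PHY_PAGE_SHIFT		5
#define MAX_PHY_REG_ADDRESS	0x1F
#define PHY_REG(page, reg) \
	(((page) << PHY_PAGE_SHIFT) | ((reg) & MAX_PHY_REG_ADDRESS))

/* Hypothetical names for the magic numbers, per the I219 datasheet */
#define I219_LPI_PAGE		772	/* page 772: LPI registers */
#define I219_MEM_PWR_REG	26	/* register 26: Memories Power */
#define I219_LPI_MEM_PWR	PHY_REG(I219_LPI_PAGE, I219_MEM_PWR_REG)

/* Helper that returns the packed address, handy for debug prints */
static inline uint32_t i219_lpi_mem_pwr_addr(void)
{
	return I219_LPI_MEM_PWR;
}
```

With names along these lines, the read in the patch would become
e1e_rphy(hw, I219_LPI_MEM_PWR, &phy_data), which at least tells the reader
which page and register are being touched, even if it does not yet explain
why the read helps.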
