On 5/21/2024 9:11 PM, Paul Menzel wrote:
> Dear Jacob,
>
>
> Am 21.05.24 um 21:27 schrieb Jacob Keller:
>
>> On 5/20/2024 10:55 PM, Paul Menzel wrote:
>
>>> Am 20.05.24 um 23:39 schrieb Jacob Keller:
>>>> The ice driver reads data from the Shadow RAM portion of the NVM during
>>>> initialization, including data used to identify the NVM image and device,
>>>> such as the ETRACK ID used to populate devlink dev info fw.bundle.
>>>>
>>>> Currently it is using a fixed offset defined by ICE_CSS_HEADER_LENGTH to
>>>> compute the appropriate offset. This worked fine for E810 and E822 devices
>>>> which both have CSS header length of 330 words.
>>>>
>>>> Other devices, including both E825-C and E830 devices have different sizes
>>>> for their CSS header. The use of a hard coded value results in the driver
>>>> reading from the wrong block in the NVM when attempting to access the
>>>> Shadow RAM copy. This results in the driver reporting the fw.bundle as 0x0
>>>> in both the devlink dev info and ethtool -i output.
>>>>
>>>> The first E830 support was introduced by commit ba20ecb1d1bb ("ice: Hook up
>>>> 4 E830 devices by adding their IDs") and the first E825-C support was
>>>> introducted by commit f64e18944233 ("ice: introduce new E825C devices
>>>
>>> introduced
>>>
>>>> family")
>>>>
>>>> The NVM actually contains the CSS header length embedded in it. Remove the
>>>> hard coded value and replace it with logic to read the length from the NVM
>>>> directly. This is more resilient against all existing and future hardware,
>>>> vs looking up the expected values from a table. It ensures the driver will
>>>> read from the appropriate place when determining the ETRACK ID value used
>>>> for populating the fw.bundle_id and for reporting in ethtool -i.
>>>>
>>>> The CSS header length for both the active and inactive flash bank is stored
>>>> in the ice_bank_info structure to avoid unnecessary duplicate work when
>>>> accessing multiple words of the Shadow RAM. Both banks are read in the
>>>> unlikely event that the header length is different for the NVM in the
>>>> inactive bank, rather than being different only by the overall device
>>>> family.
>>>>
>>>> Fixes: ba20ecb1d1bb ("ice: Hook up 4 E830 devices by adding their IDs")
>>>> Co-developed-by: Paul Greenwalt <[email protected]>
>>>> Signed-off-by: Paul Greenwalt <[email protected]>
>>>> Signed-off-by: Jacob Keller <[email protected]>
>>>> Reviewed-by: Przemek Kitszel <[email protected]>
>>>> ---
>>>> drivers/net/ethernet/intel/ice/ice_nvm.c | 88
>>>> ++++++++++++++++++++++++++++++-
>>>> drivers/net/ethernet/intel/ice/ice_type.h | 14 +++--
>>>> 2 files changed, 93 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.c
>>>> b/drivers/net/ethernet/intel/ice/ice_nvm.c
>>>> index 84eab92dc03c..5968011e8c7e 100644
>>>> --- a/drivers/net/ethernet/intel/ice/ice_nvm.c
>>>> +++ b/drivers/net/ethernet/intel/ice/ice_nvm.c
>>>> @@ -374,11 +374,25 @@ ice_read_nvm_module(struct ice_hw *hw, enum
>>>> ice_bank_select bank, u32 offset, u1
>>>> *
>>>> * Read the specified word from the copy of the Shadow RAM found in the
>>>> * specified NVM module.
>>>> + *
>>>> + * Note that the Shadow RAM copy is always located after the CSS header,
>>>> and
>>>> + * is aligned to 64-byte (32-word) offsets.
>>>> */
>>>> static int
>>>> ice_read_nvm_sr_copy(struct ice_hw *hw, enum ice_bank_select bank, u32
>>>> offset, u16 *data)
>>>> {
>>>> - return ice_read_nvm_module(hw, bank, ICE_NVM_SR_COPY_WORD_OFFSET +
>>>> offset, data);
>>>> + u32 sr_copy;
>>>> +
>>>> + switch (bank) {
>>>> + case ICE_ACTIVE_FLASH_BANK:
>>>> + sr_copy = roundup(hw->flash.banks.active_css_hdr_len, 32);
>>>> + break;
>>>> + case ICE_INACTIVE_FLASH_BANK:
>>>> + sr_copy = roundup(hw->flash.banks.inactive_css_hdr_len, 32);
>>>> + break;
>>>> + }
>>>> +
>>>> + return ice_read_nvm_module(hw, bank, sr_copy + offset, data);
>>>> }
>>>>
>>>> /**
>>>> @@ -1009,6 +1023,72 @@ static int ice_determine_active_flash_banks(struct
>>>> ice_hw *hw)
>>>> return 0;
>>>> }
>>>>
>>>> +/**
>>>> + * ice_get_nvm_css_hdr_len - Read the CSS header length from the NVM CSS
>>>> header
>>>> + * @hw: pointer to the HW struct
>>>> + * @bank: whether to read from the active or inactive flash bank
>>>> + * @hdr_len: storage for header length in words
>>>> + *
>>>> + * Read the CSS header length from the NVM CSS header and add the
>>>> Authentication
>>>> + * header size, and then convert to words.
>>>> + *
>>>> + * Return: zero on success, or a negative error code on failure.
>>>> + */
>>>> +static int
>>>> +ice_get_nvm_css_hdr_len(struct ice_hw *hw, enum ice_bank_select bank,
>>>> + u32 *hdr_len)
>>>> +{
>>>> + u16 hdr_len_l, hdr_len_h;
>>>> + u32 hdr_len_dword;
>>>> + int status;
>>>> +
>>>> + status = ice_read_nvm_module(hw, bank, ICE_NVM_CSS_HDR_LEN_L,
>>>> + &hdr_len_l);
>>>> + if (status)
>>>> + return status;
>>>> +
>>>> + status = ice_read_nvm_module(hw, bank, ICE_NVM_CSS_HDR_LEN_H,
>>>> + &hdr_len_h);
>>>> + if (status)
>>>> + return status;
>>>> +
>>>> + /* CSS header length is in DWORD, so convert to words and add
>>>> + * authentication header size
>>>> + */
>>>> + hdr_len_dword = hdr_len_h << 16 | hdr_len_l;
>>>> + *hdr_len = (hdr_len_dword * 2) + ICE_NVM_AUTH_HEADER_LEN;
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +/**
>>>> + * ice_determine_css_hdr_len - Discover CSS header length for the device
>>>> + * @hw: pointer to the HW struct
>>>> + *
>>>> + * Determine the size of the CSS header at the start of the NVM module.
>>>> This
>>>> + * is useful for locating the Shadow RAM copy in the NVM, as the Shadow
>>>> RAM is
>>>> + * always located just after the CSS header.
>>>> + *
>>>> + * Return: zero on success, or a negative error code on failure.
>>>> + */
>>>> +static int ice_determine_css_hdr_len(struct ice_hw *hw)
>>>> +{
>>>> + struct ice_bank_info *banks = &hw->flash.banks;
>>>> + int status;
>>>> +
>>>> + status = ice_get_nvm_css_hdr_len(hw, ICE_ACTIVE_FLASH_BANK,
>>>> + &banks->active_css_hdr_len);
>>>> + if (status)
>>>> + return status;
>>>> +
>>>> + status = ice_get_nvm_css_hdr_len(hw, ICE_INACTIVE_FLASH_BANK,
>>>> + &banks->inactive_css_hdr_len);
>>>> + if (status)
>>>> + return status;
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> /**
>>>> * ice_init_nvm - initializes NVM setting
>>>> * @hw: pointer to the HW struct
>>>> @@ -1055,6 +1135,12 @@ int ice_init_nvm(struct ice_hw *hw)
>>>> return status;
>>>> }
>>>>
>>>> + status = ice_determine_css_hdr_len(hw);
>>>> + if (status) {
>>>> + ice_debug(hw, ICE_DBG_NVM, "Failed to determine Shadow RAM copy
>>>> offsets.\n");
>>>
>>> As this is a new failure path, should the user be warned about this, if
>>> it cannot be determined, and NVM possibly be broken?
>
>> Possibly. I'm also trying to avoid spamming the log with failure
>> messages which are more useful for developers who can enable them vs end
>> users who may not understand.
>
> I agree that logging too much is also confusing. Excuse my ignorance,
> but what happens if NVM is broken and the offset cannot be determined.
> Will the user get any error message and know what to do (replace the
> device or call support)? Or another view point, is the bug report with
> the Linux log messages included going to have the information, so the
> support folks or developers can pinpoint the problem without replying to
> the user to enable debug messages?
>
I'm not sure. In this case it probably does make sense to warn, since
this is unexpected and is unlikely to result in spamming the log
multiple times.
>
> Kind regards,
>
> Paul