On 12/7/2023 6:53 PM, Tian, Kevin wrote:
>> From: Jason Gunthorpe <[email protected]>
>> Sent: Thursday, December 7, 2023 10:46 PM
>>
>> On Thu, Dec 07, 2023 at 07:55:17AM +0000, Tian, Kevin wrote:
>>>> From: Cao, Yahui <[email protected]>
>>>> Sent: Tuesday, November 21, 2023 10:51 AM
>>>>
>>>> +
>>>> + /* Once RX Queue is enabled, network traffic may come in at any
>>>> + * time. As a result, RX Queue head needs to be loaded before
>>>> + * RX Queue is enabled.
>>>> + * For simplicity and integration, overwrite RX head just after
>>>> + * RX ring context is configured.
>>>> + */
>>>> + if (msg_slot->opcode == VIRTCHNL_OP_CONFIG_VSI_QUEUES) {
>>>> + ret = ice_migration_load_rx_head(vf, devstate);
>>>> + if (ret) {
>>>> + dev_err(dev, "VF %d failed to load rx head\n",
>>>> + vf->vf_id);
>>>> + goto out_clear_replay;
>>>> + }
>>>> + }
>>>> +
>>>
>>> Don't we have the same problem here as for TX head restore that the
>>> vfio migration protocol doesn't carry a way to tell whether the IOAS
>>> associated with the device has been restored then allowing RX DMA
>>> at this point might cause device error?
>>
>> Does this trigger a DMA?
>
> looks yes from the comment
>
>>
>>> @Jason, is it a common gap applying to all devices which include a
>>> receiving path from link? How is it handled in mlx migration
>>> driver?
>>
>> There should be no DMA until the device is placed in RUNNING. All
>> devices may instantly trigger DMA once placed in RUNNING.
>>
>> The VMM must ensure the entire environment is ready to go before
>> putting anything in RUNNING, including having setup the IOMMU.
>>
>
> ah, yes. that is the right behavior.
>
> so if there is no other way to block DMA before RUNNING is reached,
> here the RX queue should be left disabled until when transitioning
> to RUNNING.
>
> Yahui, can you double check?

I think this will require us to ensure that the Rx queue restoration
happens later. I'm looking at refactoring the approach to use an
internal representation of VF state instead of a series of virtchnl
messages, and this should allow us to avoid enabling the Rx queue until
after we're ready for DMA.
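
To make the intended ordering concrete, here is a minimal standalone
sketch. It is not ice driver code: the types, the state enum, and the
helpers vf_load_rxq() and vf_set_state() are all hypothetical and only
model the idea discussed above, namely that the load path programs the
Rx head but leaves the queue disabled, and the enable happens only on
the transition to RUNNING.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT the real ice or VFIO definitions. */
enum mig_state { MIG_STOP, MIG_RESUMING, MIG_RUNNING };

/* Internal representation of one Rx queue's saved state (assumed layout). */
struct vf_rxq_state {
	uint32_t head;     /* Rx head captured on the source */
	bool want_ena;     /* queue was enabled on the source */
};

struct vf_model {
	enum mig_state state;
	uint32_t hw_rx_head;        /* models the head in the ring context */
	bool hw_rx_ena;             /* models the queue enable bit */
	struct vf_rxq_state saved;  /* state to apply when RUNNING is reached */
};

/*
 * Load phase: program the ring context and head, and remember whether
 * the queue should be enabled, but never enable it here. With the
 * queue disabled, incoming traffic cannot trigger Rx DMA.
 */
static int vf_load_rxq(struct vf_model *vf, const struct vf_rxq_state *s)
{
	assert(vf != NULL && s != NULL);

	if (vf->state != MIG_RESUMING)
		return -1;

	vf->saved = *s;
	vf->hw_rx_head = s->head;
	vf->hw_rx_ena = false;
	return 0;
}

/*
 * State change: the queue is enabled only on the transition to
 * RUNNING, by which point the VMM is expected to have set up the
 * IOMMU. Any other state keeps the queue disabled.
 */
static void vf_set_state(struct vf_model *vf, enum mig_state new_state)
{
	assert(vf != NULL);

	vf->state = new_state;
	vf->hw_rx_ena = (new_state == MIG_RUNNING) && vf->saved.want_ena;
}
```

In this model a load followed by a transition to STOP leaves the queue
disabled with the head already in place; only vf_set_state(vf,
MIG_RUNNING) turns it on.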