On 07/09/15 18:47, Olivier Martin wrote:
> .. in case the platform does not received the used buffer from the
> device, VirtioFlush() returns EFI_TIMEOUT.
> 
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Olivier Martin <olivier.mar...@arm.com>
> ---
>  OvmfPkg/Include/Library/VirtioLib.h   |  3 +++
>  OvmfPkg/Library/VirtioLib/VirtioLib.c | 16 ++++++++++++++--
>  2 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/OvmfPkg/Include/Library/VirtioLib.h 
> b/OvmfPkg/Include/Library/VirtioLib.h
> index 36527a5..62f6811 100644
> --- a/OvmfPkg/Include/Library/VirtioLib.h
> +++ b/OvmfPkg/Include/Library/VirtioLib.h
> @@ -170,6 +170,9 @@ VirtioAppendDesc (
>  
>    @return              Error code from VirtIo->SetQueueNotify() if it fails.
>  
> +  @retval EFI_TIMEOUT  If it did not received the used buffer from the device
> +                       in approximatively less than 10ms
> +
>    @retval EFI_SUCCESS  Otherwise, the host processed all descriptors.
>  
>  **/
> diff --git a/OvmfPkg/Library/VirtioLib/VirtioLib.c 
> b/OvmfPkg/Library/VirtioLib/VirtioLib.c
> index 54cf225..566f596 100644
> --- a/OvmfPkg/Library/VirtioLib/VirtioLib.c
> +++ b/OvmfPkg/Library/VirtioLib/VirtioLib.c
> @@ -3,7 +3,7 @@
>    Utility functions used by virtio device drivers.
>  
>    Copyright (C) 2012, Red Hat, Inc.
> -  Portion of Copyright (C) 2013, ARM Ltd.
> +  Portion of Copyright (C) 2013-2014, ARM Ltd.
>  
>    This program and the accompanying materials are licensed and made available
>    under the terms and conditions of the BSD License which accompanies this
> @@ -252,6 +252,9 @@ VirtioAppendDesc (
>  
>    @return              Error code from VirtIo->SetQueueNotify() if it fails.
>  
> +  @retval EFI_TIMEOUT  If it did not received the used buffer from the device
> +                       in approximatively less than 10ms
> +
>    @retval EFI_SUCCESS  Otherwise, the host processed all descriptors.
>  
>  **/
> @@ -267,6 +270,7 @@ VirtioFlush (
>    UINT16     NextAvailIdx;
>    EFI_STATUS Status;
>    UINTN      PollPeriodUsecs;
> +  UINTN      Timeout;
>  
>    //
>    // virtio-0.9.5, 2.4.1.2 Updating the Available Ring
> @@ -304,16 +308,24 @@ VirtioFlush (
>    // Keep slowing down until we reach a poll period of slightly above 1 ms.
>    //
>    PollPeriodUsecs = 1;
> +  Timeout = 0;
>    MemoryFence();
> -  while (*Ring->Used.Idx != NextAvailIdx) {
> +  while ((*Ring->Used.Idx != NextAvailIdx) && (Timeout < 10)) {
>      gBS->Stall (PollPeriodUsecs); // calls AcpiTimerLib::MicroSecondDelay
>  
>      if (PollPeriodUsecs < 1024) {
>        PollPeriodUsecs *= 2;
> +    } else {
> +      Timeout++;
>      }
>      MemoryFence();
>    }
>  
>    MemoryFence();
> +
> +  if (Timeout == 10) {
> +    return EFI_TIMEOUT;
> +  }
> +
>    return EFI_SUCCESS;
>  }
> 

I think if this ever happens, that's a bug in the host / hypervisor.
Under what conditions are you seeing this as necessary?

VirtioFlush() blocks infinitely on purpose in this loop. The protocol
does not define any circumstances for timeout. The communicating peers
are not connected by a fragile network link, they are connected by host
memory. Whenever you time out, that's an arbitrary decision; it might
even misfire dependent on host load.

If you think this is really necessary, then please introduce an integer
PCD. The default value of the PCD (probably 0) should preserve existing
behavior. The PCD should be then overridden by downstreams (by directly
editing the appropriate DSC file). Alternatively, if you have a specific
host platform in mind that necessitates this change (and for which the
value 10 also seems right), then please introduce a new build flag as
well (in the affected DSC(s)). People who build for that platform can
then hardcode that flag in their build scripts.

There are two users of VirtioFlush(): the virtio-blk and virtio-scsi
drivers. Both check for (and propagate) any errors returned by
VirtioFlush() correctly. Any timeout error would reach the outermost
callers (the respective protocol users) just fine. So those parts of the
tree need not be fixed up, I agree. Still, this patch would work around
a host bug, as far as I understand, and the actual timeout looks
arbitrary, so it should be off by default, and controllable without
modifying the C source.

The commit message (and the PCD's comment) should also identify the host
platform that requires this as precisely as possible.

... Obviously, it is *also* possible that the guest code has some other
bug, elsewhere. (We can never exclude that, unless we prove the code
correct with formal tools.) It seems quite unlikely, but if that's the
case, then I'd prefer to track it down and fix it, instead of papering
it over.

Thanks!
Laszlo

------------------------------------------------------------------------------
Don't Limit Your Business. Reach for the Cloud.
GigeNET's Cloud Solutions provide you with the tools and support that
you need to offload your IT needs and focus on growing your business.
Configured For All Businesses. Start Your Cloud Today.
https://www.gigenetcloud.com/
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/edk2-devel

Reply via email to