On 7 November 2018 at 16:00, Zeng, Star <[email protected]> wrote:
> On 2018/11/6 22:37, Zeng, Star wrote:
>>
>> On 2018/11/6 17:49, Ard Biesheuvel wrote:
>>>
>>> On 31 October 2018 at 05:38, Zeng, Star <[email protected]> wrote:
>>>>
>>>> Good feedback.
>>>>
>>>> On 2018/10/30 20:50, Leif Lindholm wrote:
>>>>>
>>>>> On Tue, Oct 30, 2018 at 09:39:24AM -0300, Ard Biesheuvel wrote:
>>>>>>
>>>>>> (add back the list)
>>>>>
>>>>> Oi! Go back on holiday!
>>>>>
>>>>>> On 30 October 2018 at 09:07, Cohen, Eugene <[email protected]> wrote:
>>>>>>>
>>>>>>> Has this patch been tested on a system that does not have coherent DMA?
>>>>>>>
>>>>>>> It's not clear that this change would actually be faster on a system of
>>>>>>> that type, since using common buffers implies access to uncached memory.
>>>>>>> Depending on the access patterns, the uncached memory access could be
>>>>>>> more time consuming than cache maintenance operations.
>>>>
>>>> The change/idea was based on the statement below.
>>>>
>>>>   ///
>>>>   /// Provides both read and write access to system memory by both the
>>>>   /// processor and a bus master. The buffer is coherent from both the
>>>>   /// processor's and the bus master's point of view.
>>>>   ///
>>>>   EfiPciIoOperationBusMasterCommonBuffer,
>>>>
>>>> Thanks for raising the case about uncached memory access. But after
>>>> checking the code, for the Intel VTd case
>>>> https://github.com/tianocore/edk2/blob/master/IntelSiliconPkg/Feature/VTd/IntelVTdDxe/BmDma.c#L460
>>>> (or the no-IOMMU case
>>>> https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c#L1567),
>>>> the common buffer is just a normal memory buffer.
>>>> If someone can help run some tests/collect some data on a system where
>>>> common buffers imply access to uncached memory, that would be great.
>>>
>>> OK, so first of all, can anyone explain to me under which circumstances
>>> interrupt transfers are a bottleneck? I'd assume that anything
>>> throughput bound would use bulk endpoints.
>>>
>>> Also, since the Map/Unmap calls are only costly when using an IOMMU,
>>> could we simply revert to the old behavior if mIoMmu == NULL?
>>>
>>>>>> I haven't had time to look at these patches yet.
>>>>>>
>>>>>> I agree with Eugene's concern: the directional DMA routines are much
>>>>>> more performant on implementations with non-coherent DMA, and so
>>>>>> common buffers should be avoided unless we are dealing with data
>>>>>> structures that are truly shared between the CPU and the device.
>>>>>>
>>>>>> Since this is obviously not the case here, could we please have some
>>>>>> numbers about the performance improvement we are talking about here?
>>>>>> Would it be possible to improve the IOMMU handling code instead?
>>>>
>>>> We collected the data below on a platform with a release image and
>>>> Intel VTd enabled.
>>>>
>>>> The image size of EhciDxe or XhciDxe is reduced by about 120 bytes.
>>>>
>>>> EHCI without the patch:
>>>> ==[ Cumulative ]========
>>>> (Times in microsec.)   Cumulative   Average   Shortest   Longest
>>>>  Name          Count    Duration    Duration   Duration   Duration
>>>> -------------------------------------------------------------------------------
>>>>  S0000B00D1DF0   446        2150          4          2        963
>>>>
>>>> EHCI with the patch:
>>>> ==[ Cumulative ]========
>>>> (Times in microsec.)   Cumulative   Average   Shortest   Longest
>>>>  Name          Count    Duration    Duration   Duration   Duration
>>>> -------------------------------------------------------------------------------
>>>>  S0000B00D1DF0   270         742          2          2         41
>>>>
>>>> XHCI without the patch:
>>>> ==[ Cumulative ]========
>>>> (Times in microsec.)   Cumulative   Average   Shortest   Longest
>>>>  Name          Count    Duration    Duration   Duration   Duration
>>>> -------------------------------------------------------------------------------
>>>>  S0000B00D14F0   215         603          2          2         52
>>>>
>>>> XHCI with the patch:
>>>> ==[ Cumulative ]========
>>>> (Times in microsec.)   Cumulative   Average   Shortest   Longest
>>>>  Name          Count    Duration    Duration   Duration   Duration
>>>> -------------------------------------------------------------------------------
>>>>  S0000B00D14F0    95         294          3          2         52
>>>>
>>>> I believe the performance data really depends on
>>>> 1. How many AsyncInterruptTransfer handlers there are (the number of
>>>>    USB keyboards and/or USB Bluetooth keyboards?)
>>>> 2. The data size (for flushing data from the PCI controller-specific
>>>>    address to the mapped system memory address *in the original code*)
>>>> 3. The performance of IoMmu->SetAttribute (for example, the SetAttribute
>>>>    operation on the Intel VTd engine, caused by the unmap and map for
>>>>    flushing data *in the original code*, involves FlushPageTableMemory,
>>>>    InvalidatePageEntry, etc.)
>>>
>>> OK, so there is room for improvement here: there is no reason the
>>> IOMMU driver couldn't cache mappings, or do some other optimizations
>>> that would make mapping the same memory repeatedly less costly.
>>
>> With an IOMMU, unmap/map goes through SetAttribute, which disallows/allows
>> DMA memory access. It is hard for the IOMMU driver to predict the sequence
>> of unmap/map operations. Do you have more detail about the optimizations?
>>
>> Could you try the patch on a platform that covers the case you and Eugene
>> mentioned?
>>
>> Anyway, I am going to revert the patches (3/4 and 4/4; 1/4 and 2/4 have
>> no functionality impact) since the timing is a little sensitive, being so
>> close to edk2-stable201811.
>
> I have reverted patches 3/4 and 4/4 at
> https://github.com/tianocore/edk2/compare/1ed6498...d98fc9a, and we can
> continue the discussion.
>
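To make the trade-off being debated above concrete, here is a minimal sketch of
the two ways the async interrupt transfer's data buffer can be set up through
the standard EFI_PCI_IO_PROTOCOL DMA calls. This is not the actual
EhciDxe/XhciDxe code; the function name and the UseCommonBuffer flag (which a
driver might derive from whether an IOMMU is installed, per the mIoMmu == NULL
suggestion above) are invented for illustration only.

  //
  // Sketch only: long-lived common buffer (patch-style) vs. directional
  // mapping per transfer (original-style). Error-path cleanup omitted.
  //
  #include <Uefi.h>
  #include <Library/UefiBootServicesTableLib.h>
  #include <Protocol/PciIo.h>

  EFI_STATUS
  SketchSetupInterruptDataBuffer (
    IN  EFI_PCI_IO_PROTOCOL   *PciIo,
    IN  BOOLEAN               UseCommonBuffer,  // e.g. TRUE when no IOMMU is present (assumption)
    IN  UINTN                 DataLength,
    OUT VOID                  **HostAddress,    // CPU-visible buffer
    OUT EFI_PHYSICAL_ADDRESS  *DeviceAddress,   // address programmed into the QH/TD
    OUT VOID                  **Mapping         // token for a later Unmap()
    )
  {
    EFI_STATUS  Status;
    UINTN       Bytes;

    Bytes = DataLength;

    if (UseCommonBuffer) {
      //
      // Patch-style path: allocate a buffer that stays mapped for the whole
      // lifetime of the async transfer, so the periodic poll never calls
      // Map/Unmap (and therefore never reaches IoMmu->SetAttribute).
      //
      Status = PciIo->AllocateBuffer (
                        PciIo,
                        AllocateAnyPages,
                        EfiBootServicesData,
                        EFI_SIZE_TO_PAGES (DataLength),
                        HostAddress,
                        0
                        );
      if (EFI_ERROR (Status)) {
        return Status;
      }

      Status = PciIo->Map (
                        PciIo,
                        EfiPciIoOperationBusMasterCommonBuffer,
                        *HostAddress,
                        &Bytes,
                        DeviceAddress,
                        Mapping
                        );
    } else {
      //
      // Original-style path: map an ordinary buffer for directional DMA each
      // time; on non-coherent systems this lets the PciIo/IoMmu layer do
      // cache maintenance instead of forcing uncached common-buffer access.
      //
      Status = gBS->AllocatePool (EfiBootServicesData, DataLength, HostAddress);
      if (EFI_ERROR (Status)) {
        return Status;
      }

      Status = PciIo->Map (
                        PciIo,
                        EfiPciIoOperationBusMasterWrite,  // bus master writes, CPU reads
                        *HostAddress,
                        &Bytes,
                        DeviceAddress,
                        Mapping
                        );
    }

    return Status;
  }

Whether the common-buffer path helps or hurts on a given platform comes down to
whether AllocateBuffer hands back cached, coherent memory there, which is
exactly the open question in this thread.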
Thanks Star
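On the "more detail about the optimizations" question: what I had in mind is
roughly a small cache of recent translations kept inside the IOMMU DMA code, so
that unmapping and re-mapping the same host buffer does not pay the full
SetAttribute/page-table-flush cost every time. Nothing like this exists in
IntelVTdDxe today; the names and structure below are invented purely to
illustrate the idea.

  #include <Uefi.h>

  //
  // Hypothetical translation cache for an IOMMU DMA layer. Unmap of a cached
  // buffer could defer the SetAttribute() revoke, and a later Map of the same
  // HostAddress/length could reuse the entry instead of rebuilding and
  // flushing page-table entries.
  //
  #define MAP_CACHE_ENTRIES  8

  typedef struct {
    VOID                    *HostAddress;
    UINTN                   NumberOfBytes;
    EFI_PHYSICAL_ADDRESS    DeviceAddress;
    BOOLEAN                 Valid;
  } MAP_CACHE_ENTRY;

  STATIC MAP_CACHE_ENTRY  mMapCache[MAP_CACHE_ENTRIES];

  //
  // Look up a previously kept translation; returns TRUE on a hit and fills in
  // the device address so the caller can skip the expensive map path.
  //
  BOOLEAN
  MapCacheLookup (
    IN  VOID                  *HostAddress,
    IN  UINTN                 NumberOfBytes,
    OUT EFI_PHYSICAL_ADDRESS  *DeviceAddress
    )
  {
    UINTN  Index;

    for (Index = 0; Index < MAP_CACHE_ENTRIES; Index++) {
      if (mMapCache[Index].Valid &&
          (mMapCache[Index].HostAddress == HostAddress) &&
          (mMapCache[Index].NumberOfBytes == NumberOfBytes)) {
        *DeviceAddress = mMapCache[Index].DeviceAddress;
        return TRUE;
      }
    }

    return FALSE;
  }

  //
  // Remember a live translation at Unmap time instead of tearing it down
  // immediately (simple round-robin replacement).
  //
  VOID
  MapCacheInsert (
    IN VOID                  *HostAddress,
    IN UINTN                 NumberOfBytes,
    IN EFI_PHYSICAL_ADDRESS  DeviceAddress
    )
  {
    STATIC UINTN  NextSlot = 0;

    mMapCache[NextSlot].HostAddress   = HostAddress;
    mMapCache[NextSlot].NumberOfBytes = NumberOfBytes;
    mMapCache[NextSlot].DeviceAddress = DeviceAddress;
    mMapCache[NextSlot].Valid         = TRUE;
    NextSlot = (NextSlot + 1) % MAP_CACHE_ENTRIES;
  }

The obvious downside, as you point out, is that keeping a translation alive
leaves the buffer DMA-accessible between transfers, so the protection that the
VTd SetAttribute dance provides is weakened; that trade-off is exactly what
would need to be discussed.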

