On Aug 30, 2013, at 8:59 AM, "Kinney, Michael D" <michael.d.kin...@intel.com>
wrote:
> Eugene,
>
> It seems a bit odd to move an optimization like this into the higher levels
> of the I/O stacks. This looks like a system architecture issue that could
> affect I/O subsystems other than just networking.
>
> From the thread, it appears the performance issue is in CopyMem(). The EDK
> II was designed with the Lib Class/Lib Instance split so platforms could
> provide different lib instances for different types of optimization and
> different MODULE_TYPEs. Have you considered an alternate BaseMemoryLib
> instance that could be linked against the network drivers?
>
I think he is using the optimized CopyMem(), but maybe there is a different
optimization that can be made. Then again, maybe doing 16-bit copies instead of
8-bit copies is not much of a speedup? I'd guess you would have to ask an
ARM performance expert. It seems the fast instruction requires 32-bit
alignment on both sides.
It also seems to me that another option would be a PCD in the driver that
selects DMA Bus Master Read/Write instead of Common Buffer, which might be a
big performance win. On ARM the common buffer is uncached, so the copy is
expensive; the Unmap()/Flush() may be a faster operation.
Thanks,
Andrew Fish
> Thanks,
>
> Mike
>
> From: Cohen, Eugene [mailto:eug...@hp.com]
> Sent: Friday, August 30, 2013 4:22 AM
> To: edk2-devel@lists.sourceforge.net; Andrew Fish
> Subject: Re: [edk2] MNP PaddingSize Question
>
> Siyuan,
>
> I haven’t heard back -- I am considering submitting a patch for #1 -- a PCD
> that selects either “frame aligned” or “payload aligned”. Before I go to the
> effort of creating that patch, I wanted to check with you to see if this
> approach would be acceptable or if you had other ideas.
>
> Eugene
>
> From: Cohen, Eugene
> Sent: Thursday, August 22, 2013 6:59 AM
> To: Andrew Fish; edk2-devel@lists.sourceforge.net
> Cc: edk2-devel@lists.sourceforge.net
> Subject: Re: [edk2] MNP PaddingSize Question
>
> Thanks for the responses Siyuan and Andrew.
>
> I think I understand your explanation -- to get the payload aligned properly
> so higher layers can get the best performance, without necessarily aligning
> the start of the frame itself. Do you have any data you can share on how much
> aligning the payload improves performance? I would assume network performance
> in UEFI would be limited more by the latency of timer tick polling (since we
> don’t get real interrupts) than by payload alignment.
>
> DMA double-buffering is not happening. The UEFI network driver we’re using
> (from one of the big networking guys) uses common buffer mappings instead.
> Because of the maturity of the network driver I don’t think it’s reasonable
> to ask the vendor to change their driver’s DMA scheme to use BusMasterRead
> and BusMasterWrite instead of common buffers (it could even be impossible
> because of HW limitations). For our systems, which do not support
> cache-coherent DMA (ARM), the common buffers must be uncached. The common
> buffers themselves are accessed in an aligned manner, but the caller’s
> (cached) buffer is unaligned for the reasons we’re discussing. This forces a
> CopyMem from an aligned uncached location to an unaligned cached location.
> Because of this misalignment the memory copy code must downshift to a byte
> copy, and we get horrible performance (byte accesses to uncached memory
> regions are the worst possible workload). I experimented with changing the
> padding size from 6 to 8, and performance improved significantly since the
> CopyMem could operate efficiently.
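The slow path described above can be illustrated with a small C sketch. `SketchCopyMem` and `COPY_STATS` are hypothetical names, not the BaseMemoryLib implementation; the sketch assumes 4-byte words and a copy routine that uses word accesses only when source and destination share the same offset modulo 4, and it counts accesses instead of timing them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Access counts for one copy (hypothetical instrumentation, not an EDK II API). */
typedef struct {
  size_t WordOps;   /* 4-byte moves */
  size_t ByteOps;   /* 1-byte moves */
} COPY_STATS;

/*
 * Copy Length bytes from Src to Dst. Word moves are used only when Src and
 * Dst have the same offset modulo 4; otherwise every byte is moved on its
 * own, which is the slow path when one side is uncached memory.
 */
static COPY_STATS
SketchCopyMem (uint8_t *Dst, const uint8_t *Src, size_t Length)
{
  COPY_STATS  Stats = { 0, 0 };

  if ((((uintptr_t)Dst ^ (uintptr_t)Src) & 3) == 0) {
    /* Same alignment: move leading bytes until both pointers are aligned. */
    while (Length > 0 && ((uintptr_t)Dst & 3) != 0) {
      *Dst++ = *Src++;
      Length--;
      Stats.ByteOps++;
    }
    /* Bulk of the buffer moves as 32-bit words. */
    while (Length >= 4) {
      uint32_t  Word;
      memcpy (&Word, Src, 4);   /* 4-byte move; both pointers are aligned here */
      memcpy (Dst, &Word, 4);
      Dst    += 4;
      Src    += 4;
      Length -= 4;
      Stats.WordOps++;
    }
  }
  /* Mismatched alignment, or the tail of an aligned copy: byte by byte. */
  while (Length > 0) {
    *Dst++ = *Src++;
    Length--;
    Stats.ByteOps++;
  }
  return Stats;
}
```

Copying a 1514-byte frame from an aligned source to a destination at offset 6 takes the byte path for all 1514 bytes, while a destination at offset 8 needs 378 word moves and 2 byte moves. Real ARM CopyMem code may handle mismatched alignment more cleverly (for example with shifts), so this only shows why a simple word-copy fast path is lost, not the actual cost on any platform.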
>
> So it looks like we have two competing optimizations. As you can imagine, on
> my platform the slow down from the uncached byte copy is far worse than the
> misaligned accesses to the cached IP protocol fields. Is there some way we
> can address both concerns? Here are some options I can think of:
>
> 1. Add some parameter (PCD value) to configure MNP to either optimize
> for aligned payload or aligned frame
> 2. Add the option to double-buffer so the first CopyMem (from uncached
> to cached) is frame-aligned and then do a second CopyMem to a buffer that is
> payload-aligned.
> a. This is really no different than BusMasterRead/BusMasterWrite
> double-buffering; it would just need to be done somewhere above the
> driver, maybe in the SNP driver on top of UNDI. Unfortunately, in this
> common buffer case there is no DMA Unmap() call we can use to hook in the
> additional CopyMem, so it would have to be explicit.
> 3. Analyze the performance benefit of the aligned payload and if it’s
> not significant enough, abandon that approach and just use frame-aligned
> buffers (we need data)
> 4. Extend some protocol interfaces so that higher layers can ask lower
> layers what the required alignment is (like IoAlign in BLOCK_IO). So on our
> platform we would say that frame alignment on 4 bytes is required. Perhaps
> on X64 it would be payload alignment on 4 bytes instead.
>
> 1, 3, and 4 are the best-performing options since they avoid the need for an
> additional CopyMem, so those would be my preference. #1 has the downside that
> we’re tuning for a particular DMA and driver scheme with a PCD value for a
> hardware-independent service (not the greatest architectural approach). If
> we decide to pursue #4 in the long term, it would still be helpful to me to
> do #1 in the short term.
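Option #1 might look roughly like the following C sketch. The `PayloadAligned` flag stands in for a new PCD whose name and type have not been defined, and `SketchPaddingSize` is a hypothetical helper. It assumes the NET_BUF data block is 4-byte aligned, that NET_VLAN_TAG_LEN is 4 (consistent with the value 6 reported for a 14-byte Ethernet header), and, as the experiment with 8 suggests but nobody has confirmed, that a larger padding is safe.

```c
#include <stdbool.h>
#include <stdint.h>

#define NET_VLAN_TAG_LEN  4u   /* assumed value of the EDK II constant */

/*
 * Hypothetical PaddingSize selection for option #1. PayloadAligned would
 * come from a new PCD; no such PCD exists yet.
 */
static uint32_t
SketchPaddingSize (uint32_t MediaHeaderSize, bool PayloadAligned)
{
  /* Current MNP formula: align what follows the media header, plus VLAN room. */
  uint32_t  Padding = ((4 - MediaHeaderSize) & 0x3) + NET_VLAN_TAG_LEN;

  if (!PayloadAligned) {
    /* Frame-aligned: round up to a multiple of 4 so the frame itself starts
       aligned, keeping at least the original amount of head room. */
    Padding = (Padding + 3) & ~(uint32_t)0x3;
  }
  return Padding;
}
```

For a 14-byte Ethernet header this yields 6 in payload-aligned mode (today's behavior) and 8 in frame-aligned mode, the value used in the experiment above.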
>
> Do you have other options or preferences for which approach is used?
>
> Eugene
>
> From: Andrew Fish [mailto:af...@apple.com]
> Sent: Thursday, August 22, 2013 1:38 AM
> To: edk2-devel@lists.sourceforge.net
> Cc: Cohen, Eugene; edk2-devel@lists.sourceforge.net
> Subject: Re: [edk2] MNP PaddingSize Question
>
> On Aug 22, 2013, at 12:15 AM, "Fu, Siyuan" <siyuan...@intel.com> wrote:
>
> Hi, Eugene
>
> The purpose of PaddingSize is to make the packet data (excluding the media
> header) 4-byte aligned when we receive a packet.
> When the MNP driver calls the Snp.Receive() interface, both the media header
> and the data are placed in the *Buffer*. Take an IP packet over Ethernet as
> an example: the media header is 14 bytes long (2 * 6-byte MAC addresses +
> 2-byte protocol type), and the IP4 header immediately follows the media
> header. The EFI network stack is designed to minimize memory copies, so most
> of the upper-layer drivers operate on this buffer directly.
> Thus we have 2 choices:
> (1) If the *Buffer* passed to Snp.Receive() is 4-byte aligned, the packet
> data will start at a non-dword-aligned address. Since most network protocols
> are designed with alignment in mind, the data items of upper-layer protocols
> such as IP, UDP, and TCP will also start at non-dword-aligned addresses. I
> think parsing these data at unaligned addresses will also have a performance
> cost.
> (2) If we make the packet data aligned, the *Buffer* is unaligned, which
> causes the performance issue you described. Fortunately, this unaligned
> memory copy happens only once per packet (only in the SNP or UNDI driver).
> I think that’s why the MNP driver tries to align a later part of the
> Ethernet packet. I have tested PXE boot and TCP download on my side and do
> not see a clear difference between the two (maybe because my UNDI driver
> does not use DMA?).
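The two choices above can be checked with simple offset arithmetic. This C sketch is illustrative only: `FrameLayout` and `FRAME_LAYOUT` are hypothetical names, and it assumes a 4-byte-aligned receive block, an untagged 14-byte Ethernet header, and an IPv4 header immediately after it.

```c
#include <stdbool.h>
#include <stddef.h>

#define ETHER_HEADER_SIZE  14u   /* 2 * 6-byte MAC addresses + 2-byte type */

/* Offsets, relative to a 4-byte-aligned block, of the frame and its IP header. */
typedef struct {
  size_t  FrameOffset;      /* where Snp.Receive() writes the media header */
  size_t  IpHeaderOffset;   /* first byte after the media header           */
  bool    FrameAligned;     /* FrameOffset is a multiple of 4              */
  bool    IpHeaderAligned;  /* IpHeaderOffset is a multiple of 4           */
} FRAME_LAYOUT;

static FRAME_LAYOUT
FrameLayout (size_t PaddingSize)
{
  FRAME_LAYOUT  Layout;

  Layout.FrameOffset     = PaddingSize;
  Layout.IpHeaderOffset  = PaddingSize + ETHER_HEADER_SIZE;
  Layout.FrameAligned    = (Layout.FrameOffset % 4) == 0;
  Layout.IpHeaderAligned = (Layout.IpHeaderOffset % 4) == 0;
  return Layout;
}
```

With padding 8 the frame starts aligned but the IP header lands at offset 22; with MNP's current padding of 6 the IP header is at offset 20 but the frame starts at offset 6. Because 14 is 2 modulo 4, no single padding value aligns both.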
>
>
> ARM platforms have to do DMA into uncached buffers. This is why it is so
> important to follow the EFI DMA rules.
>
> Eugene, have you tried double-buffering the data into a cached buffer? I
> wonder whether you have a lot of small misaligned accesses to uncached
> memory, in which case a single copy to a cached buffer would have less
> overhead. Or maybe you could enable caching on the buffer after DMA
> completes?
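Double-buffering, as suggested here and as option #2 in Eugene's reply of August 22 describes, amounts to two copies. This C sketch uses a hypothetical helper, `DoubleBufferReceive`, and plain memcpy; it assumes the caller supplies a cached staging buffer and only checks that the staging buffer shares the common buffer's 4-byte alignment, which is what makes the first (uncached) copy cheap.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical two-step receive copy.
 * Step 1: aligned common buffer -> frame-aligned cached staging buffer.
 *         Source and destination share alignment, so a word copy is possible.
 * Step 2: staging buffer -> caller's payload-aligned buffer. This copy is
 *         still misaligned, but it touches only cached memory.
 * Returns false, without copying, if the staging buffer is not co-aligned.
 */
static bool
DoubleBufferReceive (
  uint8_t        *PayloadAlignedDst,
  uint8_t        *Staging,
  const uint8_t  *CommonBuffer,
  size_t         FrameLength
  )
{
  if ((((uintptr_t)Staging ^ (uintptr_t)CommonBuffer) & 3) != 0) {
    return false;
  }
  memcpy (Staging, CommonBuffer, FrameLength);        /* uncached -> cached */
  memcpy (PayloadAlignedDst, Staging, FrameLength);   /* cached -> cached   */
  return true;
}
```

Whether the extra cached-to-cached copy costs less than an uncached byte copy is platform dependent; the thread contains no measurements either way.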
>
>
> Hope my explanation is helpful.
>
> Fu, Siyuan
> From: Cohen, Eugene [mailto:eug...@hp.com]
> Sent: Thursday, August 22, 2013 11:46 AM
> To: edk2-devel@lists.sourceforge.net
> Subject: Re: [edk2] MNP PaddingSize Question
>
> Ruth,
>
> The performance impact is related to unaligned copies to uncached buffers.
> So I suppose any machine that must make use of uncached buffers for DMA
> coherency would have the same slowdown, although I have not had a reason to
> measure this on other platforms.
>
> The code seems strange, since for a normal driver (UNDI, SNP) the receive
> buffer address passed down is no longer 4-byte aligned. Apparently this code
> is trying to align a later part of the Ethernet packet (the payload, not the
> header), but I can’t think of a reason for this.
>
> Eugene
>
> From: Li, Ruth [mailto:ruth...@intel.com]
> Sent: Wednesday, August 21, 2013 7:55 PM
> To: edk2-devel@lists.sourceforge.net
> Subject: Re: [edk2] MNP PaddingSize Question
>
> Hi Eugene,
>
> The code below has been there for a long time. We need some time to
> evaluate it and see the possible impact.
>
> BTW, do you see the performance impact only on your machine, or on machines
> in general?
>
> Thanks,
> Ruth
> From: Cohen, Eugene [mailto:eug...@hp.com]
> Sent: Tuesday, August 20, 2013 3:56 AM
> To: edk2-devel@lists.sourceforge.net
> Subject: [edk2] MNP PaddingSize Question
>
> I’ve been tracking down a performance issue and have isolated it to this
> piece of MNP initialization code:
>
> //
> // Make sure the protocol headers immediately following the media header
> // 4-byte aligned, and also preserve additional space for VLAN tag
> //
> MnpDeviceData->PaddingSize = ((4 - SnpMode->MediaHeaderSize) & 0x3) +
> NET_VLAN_TAG_LEN;
>
> On my system this is coming up with ‘6’ (MediaHeaderSize = 0xE) which is
> causing performance issues since some of the memory copies to the resulting
> non-dword aligned addresses are slower. As an experiment I tried bumping
> this number to ‘8’ and things worked well.
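The PaddingSize expression relies on unsigned wrap-around. The following C sketch evaluates it as written, with a 32-bit unsigned MediaHeaderSize; `MnpPaddingSize` and `PayloadOffset` are illustrative helpers, and NET_VLAN_TAG_LEN is taken as 4, consistent with the result of 6 reported above for a 14-byte header.

```c
#include <stdint.h>

#define NET_VLAN_TAG_LEN  4u   /* assumed value of the EDK II constant */

/* The MNP expression quoted above, evaluated with 32-bit unsigned arithmetic. */
static uint32_t
MnpPaddingSize (uint32_t MediaHeaderSize)
{
  return ((4 - MediaHeaderSize) & 0x3) + NET_VLAN_TAG_LEN;
}

/* Offset of the first byte after the media header, relative to the
   (assumed 4-byte-aligned) start of the receive block. */
static uint32_t
PayloadOffset (uint32_t MediaHeaderSize)
{
  return MnpPaddingSize (MediaHeaderSize) + MediaHeaderSize;
}
```

For a 14-byte header, 4 - 14 wraps to 0xFFFFFFF6, the low two bits give 2, and adding the 4-byte VLAN allowance gives 6, so the media header ends at offset 20. The same reasoning shows that the byte after the media header is 4-byte aligned for every header size, which is what the code comment promises; it says nothing about the alignment of the frame start.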
>
> This value is used later when NET_BUFs are being allocated:
>
> if (MnpDeviceData->PaddingSize > 0) {
> //
> // Pad padding bytes before the media header
> //
> NetbufAllocSpace (Nbuf, MnpDeviceData->PaddingSize, NET_BUF_TAIL);
> NetbufTrim (Nbuf, MnpDeviceData->PaddingSize, NET_BUF_HEAD);
> }
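The net effect of the NetbufAllocSpace/NetbufTrim pair can be modeled with a toy buffer. `TOY_BUF`, `ToyAllocTail`, and `ToyTrimHead` are hypothetical stand-ins, not the NET_BUF structure or the NetLib functions, and the sketch assumes a single 4-byte-aligned block: space is reserved at the tail, then trimmed from the head, leaving PaddingSize bytes of head room so that the next tail allocation (where the frame is received) starts PaddingSize bytes into the block.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy single-block buffer standing in for NET_BUF (not its real layout). */
typedef struct {
  uint8_t  *Block;      /* start of the allocated storage     */
  size_t   BlockSize;   /* total bytes available              */
  size_t   Head;        /* offset of the first data byte      */
  size_t   Tail;        /* offset one past the last data byte */
} TOY_BUF;

/* Approximates NetbufAllocSpace (Nbuf, Len, NET_BUF_TAIL) for one block:
   extend the data area at the tail and return a pointer to the new space. */
static uint8_t *
ToyAllocTail (TOY_BUF *Buf, size_t Len)
{
  uint8_t  *Space;

  if (Len > Buf->BlockSize - Buf->Tail) {
    return NULL;
  }
  Space      = Buf->Block + Buf->Tail;
  Buf->Tail += Len;
  return Space;
}

/* Approximates NetbufTrim (Nbuf, Len, NET_BUF_HEAD): drop bytes from the
   front of the data area, turning them into head room. */
static void
ToyTrimHead (TOY_BUF *Buf, size_t Len)
{
  if (Len > Buf->Tail - Buf->Head) {
    Len = Buf->Tail - Buf->Head;
  }
  Buf->Head += Len;
}
```

With a PaddingSize of 6 the frame therefore starts 2 bytes past a 4-byte boundary, which is the misalignment that the CopyMem from the aligned common buffer runs into.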
>
> Can someone explain the purpose of PaddingSize and how it affects the later
> processing of packets? Is this number a minimum value, and is it OK for it to
> be larger?
>
> Thanks,
>
> Eugene
>
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/edk2-devel