On Aug 30, 2013, at 8:59 AM, "Kinney, Michael D" <michael.d.kin...@intel.com> 
wrote:

> Eugene,
>  
> It seems a bit odd to move an optimization like this into the higher levels 
> of the I/O stacks.  This looks like a system architecture issue that could 
> affect I/O subsystems other than just networking. 
>  
> From the thread, it appears the performance issue is in CopyMem().  The EDK 
> II was designed with the Lib Class/Lib Instance split so platforms could 
> provide different lib instances for different types of optimization and 
> different MODULE_TYPEs.  Have you considered an alternate BaseMemoryLib 
> instance that could be linked against the network drivers?
>  

I think he is using the optimized CopyMem(), but maybe there is a different 
optimization that can be made. But then again maybe doing 16-bit copies vs. 
8-bit copies is not that much of a speed up? I'd guess you would have to ask 
some ARM performance expert. It seems the fast instruction requires 32-bit 
alignment on both sides. 
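
To illustrate why alignment on both sides matters, here is a minimal sketch in 
portable C. It is not the BaseMemoryLib code; `widest_copy_unit` and 
`access_count` are hypothetical helpers, and the 4-byte cap is an assumption 
for a 32-bit load/store path. It picks the widest access that is naturally 
aligned for both addresses and counts the memory accesses that result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an EDK II API: widest access size (1, 2 or 4
   bytes) that is naturally aligned for both destination and source. */
static unsigned widest_copy_unit(uintptr_t dst, uintptr_t src)
{
    uintptr_t low_bits = (dst | src) & 3u;

    if (low_bits == 0) {
        return 4;               /* both addresses are 4-byte aligned */
    }
    if ((low_bits & 1u) == 0) {
        return 2;               /* both addresses are 2-byte aligned */
    }
    return 1;                   /* at least one address is odd */
}

/* Number of accesses needed per side to move len bytes with the given
   unit; any remainder is copied byte by byte. */
static size_t access_count(size_t len, unsigned unit)
{
    assert(unit == 1 || unit == 2 || unit == 4);
    return len / unit + len % unit;
}
```

With a 4-byte-aligned source, a destination at offset 6 allows 2-byte accesses 
at best, while a destination at offset 8 allows 4-byte accesses. For a 
1500-byte frame that is 750 versus 375 accesses, and 1500 if the copy routine 
falls back to bytes.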

It also seems to me that another option, a PCD in the driver to select DMA Bus 
Master Read/Write instead of Common Buffer, might be a big performance win. On 
ARM the common buffer is uncached, so the copy is expensive. The 
Unmap()/Flush() may be a faster operation. 

Thanks,

Andrew Fish

> Thanks,
>  
> Mike
>  
> From: Cohen, Eugene [mailto:eug...@hp.com] 
> Sent: Friday, August 30, 2013 4:22 AM
> To: edk2-devel@lists.sourceforge.net; Andrew Fish
> Subject: Re: [edk2] MNP PaddingSize Question
>  
> Siyuan,
>  
> I haven’t heard back -- I am considering submitting a patch for #1 -- a PCD 
> that selects either “frame aligned” or “payload aligned”.  Before I go to the 
> effort of creating that patch, I wanted to check with you to see if this 
> approach would be acceptable or if you had other ideas.
>  
> Eugene
>  
> From: Cohen, Eugene 
> Sent: Thursday, August 22, 2013 6:59 AM
> To: Andrew Fish; edk2-devel@lists.sourceforge.net
> Cc: edk2-devel@lists.sourceforge.net
> Subject: Re: [edk2] MNP PaddingSize Question
>  
> Thanks for the responses Siyuan and Andrew. 
>  
> I think I understand your explanation -- to get the payload aligned properly 
> so higher layers can get the best performance and not necessarily align the 
> start of the frame itself.  Do you have some data you can share on how much 
> improvement aligning the payload has?  I would assume network performance in 
> UEFI would be limited more by the latency of timer tick polling (since we 
> don’t get real interrupts) rather than payload alignment.
>  
> DMA double-buffering is not happening.  The UEFI network driver we’re using 
> (from one of the big networking guys) uses common buffer mappings instead.  
> Because of the maturity of the network driver I don’t think it’s reasonable 
> to ask the vendor to change their driver’s DMA scheme to use BusMasterRead 
> and BusMasterWrite instead of common buffers (it could even be impossible 
> because of HW limitations).  For our systems which do not support cache 
> coherent DMA (ARM) the common buffers must be uncached.  The common buffers 
> themselves are accessed in an aligned manner but the caller’s (cached) buffer 
> is unaligned for the reasons we’re discussing.  So this forces a CopyMem from 
> an aligned uncached location to an unaligned cached location.  The memory 
> copy code must downshift to a byte copy because of this misalignment and we 
> get horrible performance (byte accesses to uncached memory regions are the 
> worst possible workload).  I experimented changing the padding size from 6 to 
> 8 and then performance improved significantly since the CopyMem could operate 
> efficiently.
>  
> So it looks like we have two competing optimizations.  As you can imagine, on 
> my platform the slow down from the uncached byte copy is far worse than the 
> misaligned accesses to the cached IP protocol fields.  Is there some way we 
> can address both concerns?  Here are some options I can think of:
>  
> 1.      Add some parameter (PCD value) to configure MNP to either optimize 
> for aligned payload or aligned frame
> 2.      Add the option to double-buffer so the first CopyMem (from uncached 
> to cached) is frame-aligned and then do a second CopyMem to a buffer that is 
> payload-aligned.
> a.      This is really no different than if BusMasterRead/BusMasterWrite 
> double-buffering is used, it would just need to be done somewhere above the 
> driver, maybe in the SNP driver on top of UNDI.  Unfortunately there is no 
> DMA Unmap() call in this common buffer case that we can use to add the 
> additional CopyMem so it would have to be explicit.
> 3.      Analyze the performance benefit of the aligned payload and if it’s 
> not significant enough, abandon that approach and just use frame-aligned 
> buffers (we need data)
> 4.      Extend some protocol interfaces so that higher layers can ask lower 
> layers what the required alignment is (like IoAlign in BLOCK_IO).  So on our 
> platform we would say that frame alignment on 4 bytes is required.  Perhaps 
> on X64 it would be payload alignment on 4 bytes instead.
>  
> 1, 3, and 4 are the best performing options since they avoid the need for an 
> additional CopyMem so those would be my preference.  #1 has the downside that 
> we’re tuning for a particular DMA and driver scheme with a PCD value for a 
> hardware-independent service (not the greatest architectural approach).  If 
> we decide to pursue #4 in the long term it would be helpful to me to do #1 in 
> the short term still.
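
Option #1 could be sketched roughly as follows. This is an illustration under 
stated assumptions, not the proposed patch: `mnp_align_mode` is a hypothetical 
stand-in for the PCD value, `select_padding` is a hypothetical helper, and the 
NET_BUF block is assumed to start on a 4-byte boundary. Rounding the existing 
padding up to a multiple of 4 is one possible choice for the frame-aligned 
mode; it keeps at least the space that is reserved today.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the PCD value discussed in option #1. */
typedef enum {
    MNP_ALIGN_PAYLOAD,  /* current behaviour: protocol headers 4-byte aligned */
    MNP_ALIGN_FRAME     /* alternative: start of the frame 4-byte aligned */
} mnp_align_mode;

/* payload_aligned_padding is the value MNP computes today. In frame mode
   it is rounded up to the next multiple of 4, so the frame starts on a
   4-byte boundary without shrinking the reserved space. */
static uint32_t select_padding(uint32_t payload_aligned_padding,
                               mnp_align_mode mode)
{
    assert(mode == MNP_ALIGN_PAYLOAD || mode == MNP_ALIGN_FRAME);

    if (mode == MNP_ALIGN_FRAME) {
        return (payload_aligned_padding + 3u) & ~(uint32_t)3u;
    }
    return payload_aligned_padding;
}
```

For the padding of 6 reported in this thread, frame mode yields 8, which 
matches the value used in the experiment. The trade-off is that with a 14-byte 
media header the protocol headers then start 2 bytes off a 4-byte boundary.
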
>  
> Do you have other options or preferences for which approach is used?
>  
> Eugene
>  
> From: Andrew Fish [mailto:af...@apple.com] 
> Sent: Thursday, August 22, 2013 1:38 AM
> To: edk2-devel@lists.sourceforge.net
> Cc: Cohen, Eugene; edk2-devel@lists.sourceforge.net
> Subject: Re: [edk2] MNP PaddingSize Question
>  
> 
> 
> Sent from my iPhone
> 
> On Aug 22, 2013, at 12:15 AM, "Fu, Siyuan" <siyuan...@intel.com> wrote:
> 
> Hi, Eugene
>  
> The purpose of PaddingSize is to make the packet data (excluding the media 
> header) 4-byte aligned when we try to receive a packet.
> When the MNP driver calls the Snp.Receive() interface, both the media header 
> and the data are placed in the *Buffer*. Taking an IP packet over Ethernet as 
> an example, the media header is 14 bytes long (2 * 6 bytes MAC address + 2 
> bytes protocol type), and the IP4 header immediately follows the media 
> header. The EFI network stack is designed to minimize the number of memory 
> copies, so most of the upper layer drivers operate on this buffer directly.
> Thus we have 2 choices:
> (1)    If the *Buffer* passed to Snp.Receive() is 4-byte aligned, the packet 
> data will start at a non-dword aligned address. Since most network protocols 
> are designed with alignment in mind, the upper layer protocol data items, 
> like IP, UDP, and TCP, will also start at a non-dword aligned address. I 
> think parsing this data at an unaligned address will also have a performance 
> cost.
> (2)    If we make the packet data aligned, the *Buffer* is unaligned, which 
> brings the performance issue you described. Fortunately this unaligned memory 
> copy happens only once per packet (only in the SNP or UNDI driver).
> I think that’s why the MNP driver tries to align a later part of the Ethernet 
> packet. I have tested PXE boot and TCP download on my side and do not see a 
> clear difference between the two (maybe because my UNDI driver does not use 
> DMA?).
>  
>  
> ARM platforms have to do DMA into uncached buffers. This is why it is so 
> important to follow the EFI DMA rules.
>  
> Eugene, have you tried double buffering the data into a cached buffer? I 
> wonder if you have a lot of small misaligned accesses to uncached memory, and 
> a single copy to a cached buffer would be less overhead. Or maybe you could 
> enable caching on the buffer after DMA completes?
>  
> 
> Hope my explanation is helpful.
>  
> Fu, Siyuan
> From: Cohen, Eugene [mailto:eug...@hp.com] 
> Sent: Thursday, August 22, 2013 11:46 AM
> To: edk2-devel@lists.sourceforge.net
> Subject: Re: [edk2] MNP PaddingSize Question
>  
> Ruth,
>  
> The performance impact is related to unaligned copies to uncached buffers.  
> So I suppose any machine that must make use of uncached buffers for DMA 
> coherency would have the same slowdown, although I have not had a reason to 
> measure this on other platforms.
>  
> The code seems strange since for a normal driver (UNDI, SNP) the receive 
> buffer address passed down is no longer 4-byte aligned.  Apparently this code 
> is trying to align a later part of the ethernet packet (the payload, not the 
> header) but I can’t think of a reason for this.
>  
> Eugene
>  
> From: Li, Ruth [mailto:ruth...@intel.com] 
> Sent: Wednesday, August 21, 2013 7:55 PM
> To: edk2-devel@lists.sourceforge.net
> Subject: Re: [edk2] MNP PaddingSize Question
>  
> Hi Eugene,
>  
> The piece of code below has been there for a long time. We need some time to 
> evaluate it and see the possible impact.
>  
> BTW, can I ask whether you see the performance impact only on your machine, 
> or on all machines in general?
>  
> Thanks,
> Ruth
> From: Cohen, Eugene [mailto:eug...@hp.com] 
> Sent: Tuesday, August 20, 2013 3:56 AM
> To: edk2-devel@lists.sourceforge.net
> Subject: [edk2] MNP PaddingSize Question
>  
> I’ve been tracking down a performance issue and have isolated it to this 
> piece of MNP initialization code:
>  
>   //
>   // Make sure the protocol headers immediately following the media header
>   // 4-byte aligned, and also preserve additional space for VLAN tag
>   //
>   MnpDeviceData->PaddingSize = ((4 - SnpMode->MediaHeaderSize) & 0x3) +
>                                NET_VLAN_TAG_LEN;
>  
> On my system this is coming up with ‘6’ (MediaHeaderSize = 0xE) which is 
> causing performance issues since some of the memory copies to the resulting 
> non-dword aligned addresses are slower.  As an experiment I tried bumping 
> this number to ‘8’ and things worked well.
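
The arithmetic behind the '6' can be worked through in a small sketch. The 
helper names are hypothetical and the offsets assume the NET_BUF block starts 
on a 4-byte boundary; `NET_VLAN_TAG_LEN` is taken as 4, which is what the 
reported result implies for a 14-byte media header.

```c
#include <assert.h>
#include <stdint.h>

#define NET_VLAN_TAG_LEN 4u  /* implied by the reported value of 6 */

/* Same expression as the MNP initialization code quoted above. The
   unsigned subtraction wraps, and the mask keeps the low two bits. */
static uint32_t padding_size(uint32_t media_header_size)
{
    return ((4u - media_header_size) & 0x3u) + NET_VLAN_TAG_LEN;
}

/* Distance of the frame start from the previous 4-byte boundary,
   assuming the padding begins at a 4-byte-aligned address. */
static uint32_t frame_misalignment(uint32_t padding)
{
    assert(padding >= NET_VLAN_TAG_LEN);
    return padding & 0x3u;
}

/* Distance of the first protocol header (after the media header) from
   the previous 4-byte boundary, under the same assumption. */
static uint32_t payload_misalignment(uint32_t padding,
                                     uint32_t media_header_size)
{
    return (padding + media_header_size) & 0x3u;
}
```

For MediaHeaderSize = 0xE this gives (4 - 14) & 3 = 2, plus 4, so 6. With a 
padding of 6 the frame starts 2 bytes past a 4-byte boundary and the payload 
at offset 20 is aligned. With a padding of 8 the frame is aligned and the 
payload at offset 22 is 2 bytes past a boundary.
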
>  
> This value is used later when NET_BUFs are being allocated:
>  
>     if (MnpDeviceData->PaddingSize > 0) {
>       //
>       // Pad padding bytes before the media header
>       //
>       NetbufAllocSpace (Nbuf, MnpDeviceData->PaddingSize, NET_BUF_TAIL);
>       NetbufTrim (Nbuf, MnpDeviceData->PaddingSize, NET_BUF_HEAD);
>     }
>  
> Can someone explain the purpose of PaddingSize and how that affects the later 
> processing of packets?  Is this number a minimum value, and is it OK for it to 
> be larger?
>  
> Thanks,
>  
> Eugene
>  
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/edk2-devel

