Tx we use bcopy mode if the packet/fragment size is less than 512 bytes and direct DMA for other sizes. In the Rx we use bcopy if the packet size is less than 128 bytes and the preallocated, premapped driver buffer pool for other packet sizes.
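As a rough illustration of the copy-versus-DMA policy described above, here is a minimal sketch in C. Only the 512-byte and 128-byte thresholds come from the message; the macro, enum, and function names are hypothetical and are not taken from the actual driver.

```c
#include <assert.h>
#include <stddef.h>

/* Thresholds quoted in the message; the macro names are hypothetical. */
#define TX_BCOPY_THRESHOLD 512  /* bytes */
#define RX_BCOPY_THRESHOLD 128  /* bytes */

/* Hypothetical labels for the buffer-handling paths described above. */
typedef enum {
    BUF_BCOPY,       /* copy the data with bcopy */
    BUF_DIRECT_DMA,  /* Tx: bind the fragment for DMA directly */
    BUF_PREMAPPED    /* Rx: use a buffer from the premapped driver pool */
} buf_path_t;

/* Tx: fragments below the threshold are copied, the rest use direct DMA. */
buf_path_t tx_buf_path(size_t frag_len)
{
    assert(frag_len > 0);
    return (frag_len < TX_BCOPY_THRESHOLD) ? BUF_BCOPY : BUF_DIRECT_DMA;
}

/* Rx: packets below the threshold are copied, the rest use the pool. */
buf_path_t rx_buf_path(size_t pkt_len)
{
    assert(pkt_len > 0);
    return (pkt_len < RX_BCOPY_THRESHOLD) ? BUF_BCOPY : BUF_PREMAPPED;
}
```

Keeping the thresholds as named constants would make it easy to experiment with different cut-over points on SPARC, where the relative cost of a copy versus a DMA bind may differ from x86.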
There is a significant difference between Tx and Rx: Tx is normally 6+ Gbps, but Rx is only 2+ Gbps. Do you think using DVMA will help here?

-Mahesh

From: crossbow-discuss-bounces at opensolaris.org On Behalf Of Krishna Yenduri
Sent: Thursday, March 11, 2010 12:45 AM
To: crossbow-discuss at opensolaris.org
Subject: [crossbow-discuss] Fwd: Re: [osol-code] GLDv3 NIC driver Performance on sparc

-------- Original Message --------
Subject: Re: [osol-code] GLDv3 NIC driver Performance on sparc
Date: Wed, 10 Mar 2010 11:10:12 -0800
From: Garrett D'Amore <garrett at damore.org>
To: opensolaris-code at opensolaris.org

On 03/10/10 10:48 AM, Mahesh wrote:
> Hi all,
>
> I need some help in debugging a GLDv3 driver performance issue on SPARC.
> The driver has a single Tx queue and 4 Rx queues and performs at almost
> line rate (10G) on Sun Intel boxes, but the same driver performs very badly
> on SPARC. The code is identical for SPARC and Intel except for the swapping
> involved, since the hardware is little endian. Any idea how to debug this
> issue? The machine I tried to benchmark is a T5440, and I have tried setting
> ip_soft_rings_count = 16 on the T5440, but the result is the same.
>

What is "badly"?

Note that T5440 hardware uses individual cores which are probably quite a bit slower than an x86 core. Additionally, there could be resource contention (caches, etc.) due to the different Niagara architecture here.

Note also that I've been told that "bcopy" performs a bit slower on Niagara than on other SPARC or x86 architectures -- are you using bcopy to copy packet data, or are you using direct DMA?

(Also, unless you take care, DMA setup and teardown on SPARC systems -- which use an IOMMU -- is quite expensive. In order to get good performance with direct DMA you really have to use loan up or something like it. It's tricky to get this right.)

Some other questions: what size MTU are you using? Are you sure that you're hitting each of your 4 RX rings basically "equally" by using different streams and making sure that traffic from a single stream stays on the same h/w ring?

Is there a significant difference between TX and RX performance?

- Garrett

>
> Thanks
> Mahesh
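
On Garrett's question about whether the four RX rings are being hit "equally" while each stream stays on one h/w ring, the property can be sketched in C as below. This is only an illustration under stated assumptions: the `flow_t` struct, the FNV-1a hash, and the function names are hypothetical stand-ins, not the hash the NIC or the driver actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_RX_RINGS 4  /* the driver in this thread has 4 Rx queues */

/* Hypothetical flow key: IPv4 addresses and TCP/UDP ports. */
typedef struct {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
} flow_t;

/* Mix one 32-bit value into an FNV-1a hash, byte by byte. */
static uint32_t fnv1a_u32(uint32_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 16777619u;
    }
    return h;
}

/* Map a flow to a ring; the same flow always yields the same ring. */
unsigned flow_to_ring(const flow_t *f)
{
    uint32_t h = 2166136261u;  /* FNV-1a 32-bit offset basis */
    h = fnv1a_u32(h, f->src_ip);
    h = fnv1a_u32(h, f->dst_ip);
    h = fnv1a_u32(h, ((uint32_t)f->src_port << 16) | f->dst_port);
    return h % NUM_RX_RINGS;
}

/* Count how many of the n flows land on each ring. */
void ring_histogram(const flow_t *flows, size_t n,
                    unsigned counts[NUM_RX_RINGS])
{
    for (unsigned r = 0; r < NUM_RX_RINGS; r++)
        counts[r] = 0;
    for (size_t i = 0; i < n; i++)
        counts[flow_to_ring(&flows[i])]++;
}
```

Comparing a histogram like this (or the per-ring packet counters, if the driver exports them) across a multi-stream benchmark run is one way to see whether a single ring is absorbing most of the Rx load.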
