Garrett D'Amore wrote:
> Andrew Gallatin wrote:
>> Andrew Gallatin wrote:
>>> Garrett D'Amore wrote:
>>>  > I agree with Dana's sentiments here.
>>>  >
>>>  > I'll add one thing from my own experience:  the pain and suffering
>>>  > of dealing with scatter/gather is not justified on modern systems
>>>  > with ordinary sized ethernet frames.  In fact, experience
>>>
>>> You might want to amend this with "on slow networks."  For
>>> 10GbE or faster, (and maybe even 1GbE on old, slow hardware)
>>> s/g is definitely worth it.
>>>
>>> Just this morning I was doing some tests on a fairly recent 8-way AMD64
>>> with uperf.  Copying rather than mapping roughly doubles CPU
>>> utilization (20% -> 40%) as reported by vmstat for 8 threads sending
>>> on a 10GbE interface (bandwidth is 10GbE line rate regardless).
>> FWIW, on a T200 with 32 1GHz CPUs (8 cores, 4 SMT threads each) using
>> 8 tx/rx queues, and uperf with 8 threads and a 1500-byte MTU, I see:
>>
>> LSO off, tx copy off:    3.8Gb/s (30% CPU)
>> LSO off, tx copy on:    4.2Gb/s (32% CPU)
>> LSO on,  tx copy off:    9.1Gb/s (18% CPU)
>> LSO on,  tx copy on:    6.5Gb/s (23% CPU)
>>
>> So it seems that at 10GbE, copying is almost a wash without LSO,
>> and a pessimization with LSO, even on sparc.
> 
> Good information.  I confess I'm a bit surprised --  I expected bcopy 
> performance to outstrip the cost of dealing with DMA.

Single-threaded bcopy on N1 is pretty slow because memory
bandwidth has to be shared fairly among all the cores, and
each core only runs at 1GHz.

N2 is supposed to be faster, but I haven't seen by
how much.
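
One way to read the T200 numbers quoted above is as throughput per
percent of CPU.  A minimal sketch in Python, using only the four
figures from that table; the efficiency metric and the function names
are my own assumptions, not anything uperf or vmstat reports:

```python
# Figures quoted above: (LSO on?, tx copy on?, Gb/s, CPU %).
RESULTS = [
    (False, False, 3.8, 30),
    (False, True,  4.2, 32),
    (True,  False, 9.1, 18),
    (True,  True,  6.5, 23),
]


def gbps_per_cpu_percent(gbps, cpu_percent):
    """Hypothetical efficiency metric: throughput per percent of CPU."""
    return gbps / cpu_percent


def most_efficient(results):
    """Return the row with the highest throughput per percent of CPU."""
    return max(results, key=lambda r: gbps_per_cpu_percent(r[2], r[3]))


for lso, copy, gbps, cpu in RESULTS:
    eff = gbps_per_cpu_percent(gbps, cpu)
    print(f"LSO {'on' if lso else 'off':3}  tx copy {'on' if copy else 'off':3}"
          f"  {gbps:4.1f} Gb/s  {cpu:2d}% CPU  {eff:.3f} Gb/s per CPU%")
```

By that measure the two LSO-off rows come out nearly the same, while
with LSO on the mapped path moves a bit under twice as much data per
percent of CPU as the copy path.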



MRJ
