Re: [networking-discuss] LRO Implementation.

Francesco DiMambro Wed, 04 Jun 2008 11:22:12 -0700

Hi Garrett
Garrett D'Amore wrote:
> Kacheong Poon wrote:
>   
>> Erik Nordmark wrote:
>>
>>   
>>     
>>> I wasn't just concerned about the complexity in the driver - I am 
>>> concerned about the total system complexity caused by the MDT 
>>> implementation.  The amount of code that needs to know about M_MULTIDATA 
>>> is scary, and in many cases there are different code paths to deal with 
>>> those which makes understanding, supporting, and bug fixing much more 
>>> complex.
>>>     
>>>       
>> I think one reason for the above is that we must be
>> backward compatible, hence we need to keep the good
>> old path forever.  The sad truth is that we will
>> always be limited by the existing mblk construct if
>> we cannot accept different code paths.  Note that I
>> am not promoting multiple code paths.
>>   
>>     
>
> However, MDT is *not* a public API, so if we could ever get the Cassini 
> driver updated to something a bit more modern (like GLDv3!), then we 
> *could* eliminate MDT.  This is one of the reasons I tried to champion 
> the effort to port Cassini to GLDv3.  (And yes, I'm still bitter about 
> the fact that NSN was so 100% totally closed minded about even the 
> *possibility* of entertaining a GLDv3 port.   I still have about 80% of 
> the GLDv3 conversion work done -- it's probably another couple of man 
> weeks to finish it, but I don't think it will ever be picked up.)
> Cleaning up MDT is one of the *major* benefits that the effort would 
> bring, and the *entire* networking stack would benefit. (Recall at the time I 
> was really trying to improve the PPS numbers for Solaris.)
>   
>>> Architecturally it makes more sense to have everything about GLD just 
>>> view everything as TCP LSO. In the case the hardware doesn't handle LSO 
>>> it is quite efficient to convert the LSO format to an "MDT format". By 
>>> this I mean take LSO's 'one TCP/IP header, one large payload' into 
>>> 'multiple TCP/IP headers, separate payloads but on the same pages'. That 
>>> means you'd get the performance benefit of doing DMA/IOMMU setup for the 
>>> single large payload and page with N TCP/IP headers.
>>>     
>>>       
>> As Jim stated, the question is whether we want to do
>> the above given the already known problems.  For example,
>> suppose TCP wants to do better PMTUd and wants to change
>> the segment size on the fly.  In order to recover faster
>> in case PMTU has not changed, it decides to send alternate
>> small and big segments.  I think the above GLD LSO scheme
>> will not allow this easily.  TCP will need to do multiple
>> sends just like today.  And I guess the above GLD LSO
>> scheme still won't solve the issues I gave in my previous
>> email.  So maybe we can just do the simple thing and forget
>> about this GLD LSO thingy.  And just make the code path
>> simple and quick enough.
>>   
>>     
>
> The one thing I'll add, is based on my own analysis, far and away the 
> longest portion of the code paths are actually in the device drivers 
> (from the point the driver's send routine is called).  Simplifying the 
> code in the device drivers is likely to gain the best improvement.  That 
> said, if there were a way to amortize dma setup, teardown, and buffer 
> management (especially to get it outside of the locks used by the 
> driver), I think that it may be worth doing.  TCP could still be doing 
> the header creation, but if it premapped a large segment for the driver, 
> then at least for ordinary size packets, there would likely be a 
> significant win.
>   
This is true, and it is what makes the one-packet-at-a-time solution an
evolutionary dead end: the driver has to be able to service multiple
packets in one shot so that those costs are amortized.
With LSO, the people designing the hardware outside Sun realized this need
for multiple packets in one shot, so they trumped the software folks. They
also did it to the Sun network hardware folks, who had done Cassini and
decided to solve exactly the same problem, but on the Rx side (the modern
name for it is LRO), while leaving the Tx side as the simple
one-packet-at-a-time model.
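
The amortization argument can be illustrated with a minimal sketch in C.
Everything in it is hypothetical: the tx_stats counters and the tx_one,
tx_batch, send_one_at_a_time and send_batched functions are illustrative
names, not code from Cassini, MDT or any Solaris driver. It only counts how
often the fixed per-call costs (taking a lock, setting up DMA) are paid in
each model, assuming one DMA setup can cover a whole batch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters standing in for the fixed per-call costs. */
struct tx_stats {
    unsigned long lock_acquires; /* times the (imaginary) ring lock was taken */
    unsigned long dma_setups;    /* times a DMA mapping was set up */
    unsigned long pkts_sent;     /* packets placed on the ring */
};

/* One packet per call: every packet pays for the lock and the DMA setup. */
void tx_one(struct tx_stats *st)
{
    assert(st != NULL);
    st->lock_acquires++;
    st->dma_setups++;
    st->pkts_sent++;
}

/* Multiple packets per call: n packets share one lock and one DMA setup. */
void tx_batch(struct tx_stats *st, size_t n)
{
    assert(st != NULL);
    if (n == 0)
        return;
    st->lock_acquires++;
    st->dma_setups++;
    st->pkts_sent += n;
}

/* Send npkts packets through the one-packet-at-a-time path. */
void send_one_at_a_time(struct tx_stats *st, size_t npkts)
{
    for (size_t i = 0; i < npkts; i++)
        tx_one(st);
}

/* Send npkts packets in batches of at most `batch` packets each. */
void send_batched(struct tx_stats *st, size_t npkts, size_t batch)
{
    assert(batch > 0);
    while (npkts > 0) {
        size_t n = npkts < batch ? npkts : batch;
        tx_batch(st, n);
        npkts -= n;
    }
}
```

Under these assumptions, sending 64 packets one at a time pays the fixed
cost 64 times, while sending them in batches of 16 pays it 4 times.
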
    We had to do something to redress the balance and secure the future,
so we developed MDT, which was the Tx way of using the 'multidata_t' data
structure to send multiple packets in one shot. Its intent was not simply
to accelerate Tx and catch up to LSO (at the time not implemented in
Solaris but available in Windows and Linux). We succeeded, and I'm telling
you it's still meaningful today.
Now we didn't stop there: 'multidata_t' is a data structure that
facilitates multiple-packet data movement, so it is meaningful on the Rx
side as well. We started a project, MDR, which is equivalent to modern LRO.
(Ultimately unsuccessfully, because I had to drop the ball on it to go back
to Cassini bring-up activity; when I got back to it I had lost the support
I had, due to a company re-org.)
Oh well, I still saw a future for it and wrote up ways to implement an SGE
engine that would accelerate packets delivered from the stack with the
'multidata_t' data structure, and a hardware receive approach that would be
friendly to a stack that supported 'multidata_t' on the Rx path... see below.


7,379,453  Method and apparatus for transferring multiple packets from hardware
7,356,039  Method and apparatus for describing multiple packets to hardware

The Tx side is in Neptune, but it appears the software folks over there
missed their opportunity to use it.
By the way, I measured my driver last night against Linux, which uses TOE,
to get slow Rx out of the equation. I can get 1.2G from the Niagara system
with a single strand pegged. Nothing fancy, just
netperf -H ..... -- -S1M -s1M.
> The challenge is to find a way to do this, that is not so invasive to 
> the stack.
That's a little more open than "let's EOL it". The problems mentioned so
far are fixable if the energy is spent there.
> I'm not entirely sure how to achieve that.  But MDT as it is 
> implemented today is *not* that way.
>   
If you're made to spend your energy rewriting old drivers to fit in a
GLDv3 framework, then it's no wonder you're unsure how to achieve it. If
that energy were spent on solving some of the issues discussed in the
implementation of MDT, then you would have a chance. When I left, MDT was
still in Cassini, and so was MDR (as a compile-time option); it may still
be there. I know Cassini is not in ON, but it was *NEVER* hidden from
anyone either.

    Frank
>     -- Garrett
> _______________________________________________
> networking-discuss mailing list
> [email protected]
>   
