Hi Eddie,

This email is confused and angry, so before even starting on the facts, can I just
apologize for having asked you not to send any offline emails. That was probably
a bad thing to do, sorry.

With that out of the way, can we please take a cooler look at the facts.


|  Gerrit.  I know the implementation is broken for high rates.  But you are
|  saying that it is impossible to implement CCID3 congestion control at high
|  rates.  I am not convinced.  Among other things, CCID3's t_gran section gives
|  the implementation EXACTLY the flexibility required to smoothly transition
|  from a purely rate-based, packet-at-a-time sending algorithm to a hybrid
|  algorithm where periodic bursts provide a rate that is on average X.
|  
|  Your examples repeatedly demonstrate that the current implementation is
|  broken.  Cool.
Unfortunately it is, and I say this without any glee. It was broken before I
started work on it, and for that matter probably even before Ian converted the
code.

The problem is that, due to the slow-start mechanism, the sender will always try
to ramp up to link speed, and thus invariably to packet spacings so small that
it cannot control them.
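
To make the ceiling concrete, here is a sketch (not the actual ccid3 code, all
names invented) of what jiffy-based pacing boils down to: the inter-packet
interval is computed in scheduler ticks, so it silently rounds to zero as soon
as X exceeds s * HZ.

    #define HZ 1000   /* scheduler tick rate assumed for this sketch */

    /* t_ipi = s/X seconds, expressed in jiffies. With s = 1460 bytes and
     * HZ = 1000 this is non-zero only while X stays below s * HZ, i.e.
     * about 1.46 MByte/sec or roughly 12 Mbits/sec. */
    unsigned long ipi_in_jiffies(unsigned int s, unsigned long long X)
    {
            return (unsigned long)(((unsigned long long)s * HZ) / X);
            /* result is 0 for every X > s * HZ: no pacing at all */
    }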

I didn't say CCID3 is impossible, and I didn't say that your specification was
bad.

What I mean to say is that trying to implement the algorithm "exactly and
explicitly" out of the book does not work: on the one hand it ignores the
realities of the operating system (scheduling granularity, processing costs,
inaccuracies, delays), and on the other hand it ignores the realities of
networking, as per David's and Ian's answers.

So the point is merely that the goals of TFRC need to somehow be rephrased in
terms of what can be done sensibly.

I think there are a lot of very valuable points to be learned from David's
input, and that listening carefully to such hints, or failing to, can make the
key difference as to whether or not CCID3 works in the real world, too.

I really hope that the points raised at the end of last week will somehow be
linked with the TFRC/CCID3 specification.


|  If you were to just say this was an interim fix it would be easier, but I'd
|  still be confused, since fixing this issue does not seem hard.  Just limit the
|  accumulated send credit to something greater than 0, such as the RTT.  But you
|  hate that for some reason that you are not articulating.
No, sorry, it is not a quick fix. I think it requires some rethinking.
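
If I read your suggestion correctly, it amounts to something like this (a
sketch with invented names, not a patch):

    /* Accumulate send credit as time passes, but cap it at one RTT so
     * that an idle sender cannot later burst out an unbounded backlog. */
    static long credit_us;   /* accumulated send credit, in microseconds */

    static void update_send_credit(long elapsed_us, long rtt_us)
    {
            credit_us += elapsed_us;
            if (credit_us > rtt_us)
                    credit_us = rtt_us;   /* never burst more than one RTT */
    }

The clamp itself is easy; the rethinking is needed because, above s * HZ, the
scheduler cannot produce the sub-tick packet spacings that would make any such
credit accounting meaningful.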

|  It's here that you go off the rails:
|  
|   > Seriously, I think that Linux or any other scheduler-based OS is simply the
|   > wrong platform for CCID3, this here can not give you the precision and the
|   > controllability that your specification assumes and requires.
|  
|  The specification neither assumes nor requires this and in fact has an 
|  EXPLICIT section that EXACTLY addresses this problem, 4.6 of 3448.
I think that this is `exactly' the problem: we cannot meet the goals of that
specification by implementing it `explicitly'. The explicit requirements
constrain the implementation, which itself is constrained by the realities of
what can be implemented, and what works in a real network.

By relaxing that explicitness, you would give implementers the freedom to meet
the goals of your specification. And you would win tremendously from that,
especially when using the input from David or Arnaldo.
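
To illustrate the kind of freedom I mean, here is one possible reading of the
t_gran idea in RFC 3448, 4.6 (a sketch with invented names, not a proposed
patch): once t_ipi = s/X drops below the scheduler granularity, send a small
burst per tick so that the average over t_gran still comes out at X.

    /* Number of packets to send per scheduler tick so that the average
     * rate over the granularity t_gran is still X (bytes per second). */
    unsigned int packets_per_tick(unsigned long long X, unsigned int s,
                                  unsigned long t_gran_us)
    {
            unsigned long long bytes = (X * t_gran_us) / 1000000ULL;
            return bytes < s ? 1 : (unsigned int)(bytes / s);
    }

At HZ = 1000 and s = 1460, the 454 Mbits/sec from the iperf run quoted below
would mean bursts of roughly 38 packets per tick; whether that still deserves
the name "rate-based" is exactly the kind of question the specification should
answer explicitly.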


 
|   > CCID2 works nicely since it does not have all these precision requirements.
|  
|  To put it mildly, you have not provided evidence that CCID3 does either.
Oh I did several months ago, all posted to the list and mentioned several times.
The links are
        http://www.erg.abdn.ac.uk/users/gerrit/dccp/docs/packet_scheduling/
        http://www.erg.abdn.ac.uk/users/gerrit/dccp/docs/impact_of_tx_queue_lenghts/

  
|  Ian: do you want to collaborate on a patch for this?
A patch for a conceptual problem? Please do.


Thanks.


  
|  Gerrit Renker wrote:
|  > Your arguments consider only the specification. What you don't see, and Ian
|  > also doesn't seem to see, is that this implementation conforms to the ideas
|  > of TFRC only up to a maximum speed of s * HZ bytes per second; under benign
|  > conditions this is about 12..15 Mbits/sec.
|  > 
|  > Once you are past that speed you effectively have a `raw socket' module
|  > whose only resemblance to TFRC/DCCP is the packet format, without even a
|  > hint of congestion control.
|  > 
|  > Here for instance is typical output, copied & pasted just a minute ago:
|  > 
|  > $ iperf -sd -t20
|  > ------------------------------------------------------------
|  > Server listening on DCCP port 5001
|  > DCCP datagram buffer size:   106 KByte (default)
|  > ------------------------------------------------------------
|  > [  4] local 192.235.214.65 port 5001 connected with 192.235.214.75 port 40524
|  > [  4]  0.0-20.4 sec  1.08 GBytes    454 Mbits/sec
|  > 
|  > If you ask the above sender to reduce its speed to 200 Mbits/sec in response
|  > to network congestion reported via ECN or receiver feedback, it will _not_
|  > do that, simply because it is unable to control those speeds. It will
|  > continue to send at maximum speed (up to 80% of link bandwidth is possible).
|  > 
|  > Only when you ask it to reduce below s * HZ will it be able to slow down,
|  > which here would mean to reduce from 454 Mbits/sec to 12 Mbits/sec.
|  > 
|  > That said, without this patch you will get a stampede of packets for the
|  > other reason that the scheduler is not as precise as required; it will
|  > always add up the lag arising from interpreting e.g. 1.7 as 1 and 0.9 as 0
|  > milliseconds. I still would like this patch in for exactly these reasons.
|  > 
|  > Seriously, I think that Linux or any other scheduler-based OS is simply the
|  > wrong platform for CCID3, this here can not give you the precision and the
|  > controllability that your specification assumes and requires.
|  > 
|  > You are aware of Ian's aversion (and I doubt whether he is the only one) to
|  > high-res timers. Using them would remove these silly accumulations and
|  > remove the need for patches such as this one.
|  > 
|  > The other case is the use of interface timestamps. With interface
|  > timestamps, I was able to accurately sample the link RTT as it is reported
|  > e.g. by ping. With the present layer-4 timestamps, this goes back up to
|  > very high values, simply because the inaccuracies all add up.
|  > 
|  > Conversely, it very much seems that the specification needs some revision
|  > before it becomes implementable on a non-realtime OS. Can you give us
|  > something which we can implement with the constraints we have (i.e. no
|  > interface timestamps, no high-res timers, accumulation of inaccuracies)?
|  > 
|  > CCID2 works nicely since it does not have all these precision requirements.
|  > 
|  > Quoting Eddie Kohler:
|  > |  > That is one of the problems here - in the RFC such problems do not
|  > |  > arise, but the implementation needs to address these correctly.
|  > |  
|  > |  The RFC's solution to this problem, which involves t_gran, EXACTLY
|  > |  addresses this.
|  > |  
|  > |  > |  Your token bucket math, incidentally, is wrong.  The easiest way to
|  > |  > |  see this is to note that, according to your math, ANY token bucket
|  > |  > |  filter attempting to limit the average output rate would have to
|  > |  > |  have n = 0, making TBFs useless.  The critical error is in assuming
|  > |  > |  that a TBF allows "s * (n + R * rho)" bytes to be sent in a period
|  > |  > |  R.  This is not right; a TBF allows a maximum of s * R * rho per
|  > |  > |  longer-term period R; that's the point.  A token bucket filter
|  > |  > |  allows only SHORT-term bursts to compensate for earlier slow
|  > |  > |  periods.  Which is exactly what we need.
|  > |  > Please take another look. The formula is correct (you will find the
|  > |  > same one e.g. in Andrew Tanenbaum's book).
|  > |  
|  > |  So I assume what you are referring to is the clause "average rate OVER
|  > |  ONE RTT"?  Sorry I missed that.  I missed it because it is not TFRC's
|  > |  goal.  Can you point to the section in RFC3448 or RFC4342 that prohibits
|  > |  a TFRC sender from ever sending a (transient) rate more than X over one
|  > |  RTT?  RFC3448 4.6 allows burstiness much more than a single packet, and
|  > |  the intro allows fluctuations of up to a factor of 2 relative to the
|  > |  fair rate.
|  > |  
|  > |  > I think (with regard to the paragraph below) that your perspective is
|  > |  > an entirely different one, namely to solve the question "which kind of
|  > |  > token bucket do we need to obtain a rate which is on average
|  > |  > consistent with X".
|  > |  
|  > |  That is *TFRC's* perspective: finding packet sends that on average are 
|  > |  consistent with X.  As demonstrated by 4.6 and elsewhere.
|  > |  
|  > |  How much above X may an application transiently send?  The intro would
|  > |  argue 2x.
|  > |  
|  > |  > But until this is truly resolved I want this patch in.
|  > |  
|  > |  Fine, I disagree, Ian disagrees (as far as I read his messages).  You
|  > |  are fixing one problem and creating another: artificially low send
|  > |  rates.
|  > |  
|  > |  Eddie
