When addressing buffer size in hardware design, it becomes an economic
discussion, one that could be illuminated by data from the last model of my
device that had a good AQM solution.  With AQM in place, buffers would only
overflow when things got seriously bursty, and some data* from the devices
in the field would tell me whether I wanted to go to the next larger memory
chip.
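
A sketch of the decision rule I have in mind, using anonymized overflow
reports from the field (all field names and thresholds here are hypothetical):

```python
# Sketch: decide whether the next hardware rev needs a larger buffer chip,
# based on anonymized overflow reports from fielded devices with AQM enabled.
# The report format and the threshold are illustrative assumptions.

def needs_larger_buffer(reports, overflow_rate_threshold=0.01):
    """reports: list of dicts like {"hours": float, "overflows": int}.

    With AQM in place, overflows should be rare; if the fleet-wide
    overflow rate per hour exceeds the threshold, step up to the
    next larger memory part.
    """
    total_hours = sum(r["hours"] for r in reports)
    total_overflows = sum(r["overflows"] for r in reports)
    if total_hours == 0:
        return False  # no data, no decision
    return total_overflows / total_hours > overflow_rate_threshold

# Example: a small fleet where bursts rarely overflow the buffer.
fleet = [
    {"hours": 1000.0, "overflows": 2},
    {"hours": 500.0, "overflows": 1},
]
print(needs_larger_buffer(fleet))  # rate = 3/1500 = 0.002 -> False
```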

--dave
[* suitably anonymized]

On 01/23/2014 08:59 AM, David Collier-Brown wrote:
> [Inline]
>
> On 01/23/2014 01:18 AM, Dave Taht wrote:
>> On Thu, Jan 23, 2014 at 1:15 AM, Dave Taht <[email protected]> wrote:
>>> On Thu, Jan 23, 2014 at 1:10 AM, Fred Baker (fred) <[email protected]> wrote:
>>>> No, you're not blowing smoke. I'm not sure I would compare the behavior to
>>>> PMTUD, as in that the endpoint is given a magic number and manages to it,
>>>> where in this case, it is given the results of its behavior, and it manages
>>>> to improve that.
>>>>
>>>> But this is what I have rambled on about in threads relating to the size of
>>>> a buffer. Folks would really like to have a magic way to calculate the
>>>> buffer size (an amount of memory) they need to install in a router or
>>>> switch, and it isn't that easy, because it has a lot to do with where in the
>>>> network a system is located and how it is used by the applications that use
>>>> it. But AQM, in the end, isn't about buffer size. It is about buffer
>>>> occupancy. In the ideal case, if there are N sessions active on the
>>>> bottleneck link in a path, we would like each to obtain 1/N of the
>>>> bottleneck's capacity, which is to say that it should be able to maximize
>>>> its throughput, while keeping an average of zero packets standing in the
>>>> queue (minimizing both latency and variation in latency). If you know your
>>>> math, you know that the ideal goal isn't actually achievable. But that
>>>> doesn't stop us from trying to asymptotically approach it.
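
As an aside, Fred's 1/N fair-share goal can be illustrated with a toy
deficit-round-robin scheduler (a sketch only; the quantum and packet sizes
below are arbitrary, and real schedulers are considerably more subtle):

```python
# Toy deficit round robin: N backlogged flows sharing one bottleneck.
# Each backlogged flow should end up with ~1/N of the link's bytes.
from collections import deque

def drr(flows, quantum=1500, rounds=1000):
    """flows: dict name -> deque of packet sizes. Returns bytes sent per flow."""
    sent = {name: 0 for name in flows}
    deficit = {name: 0 for name in flows}
    for _ in range(rounds):
        for name, q in flows.items():
            deficit[name] += quantum
            # Send packets while the flow has credit for them.
            while q and q[0] <= deficit[name]:
                size = q.popleft()
                deficit[name] -= size
                sent[name] += size
        if not any(flows.values()):
            break  # all queues drained
    return sent

# Three flows, each with plenty of 1000-byte packets queued.
flows = {f"flow{i}": deque([1000] * 500) for i in range(3)}
result = drr(flows)
# All three flows receive equal service: 500,000 bytes each.
```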
>>> I prefer to think of the goal as to keep a minimum of 1 packet in the queue,
>>> not as an average of 0.
>> And that's not strictly true either. In the case of wifi, and other bundling
>> technologies like those used in cable, you want to keep a minimum of a "good"
>> aggregate of packets in the queue for that technology.
>
> From my pointy queue-head's perspective, occupancy is something that
> averages over time, and a good value is something less than 1.  However,
> that's not necessarily one packet.
>
> In cases where packets are spaced out in time, an average of
> 0.<something> packets is what I expect.  Let's say 0.7, arbitrarily.
> When they come in trains, "bullet trains", or more likely, "thundering
> herds", then 0.7 trains are what I expect to see.
>
> The size of trains can be measured, but I expect the real thing we
> care about is whether the queue drains back to zero during the time
> period you're using for your measurements.  If it doesn't, you're
> about to be in a world of hurt (:-))  In bad algorithms, it's only the
> physical size limitation of the buffer that forces an eventual drop,
> and with it congestion control.  In the queuing-theory world, we'd
> expect the queue to grow "without bound", and the latency to head for
> infinity.
>
> <start going off-topic>
> Sun used to measure queue occupancy as a Riemann sum, which allowed
> one to graph it like this:
>
> 1.0              _
>                 | |
>      _______    | |
>     |       |   | |
>     |       |   | |
> 0.0_|       |___| |___
>
> The first one is 0.66 for 5 time units, the second 1.0 for one unit.
> Both fall to zero, so the system catches up. They have different
> effects on other traffic, which is why I find them interesting: as CPU
> queues, the first hurts interactivity more than the second.
>
> I *think* this will be of interest to us, but probably later.  I've
> pasted in a chunk of the kstat.h comment at the very bottom for later
> reference...
> </off-topic>
>
> Because occupancy can be measured in units of time, we don't have to know
> buffer and packet *sizes*:
>
>   * A well-performing queue/buffer system will have an average queue
>     occupancy of < 1.
>   * A buffer is of sufficient size if the queue occupancy regularly
>     falls to zero during the time period; another way of saying the
>     same thing is that the queue empties.
>   * An operable queue management system sends congestion notifications
>     if the queue/buffer does not empty.
>   * A good queue management system sends congestion notifications in
>     time to keep the queue/buffer emptying regularly.
>
> The parameter here is the time period, which has to do with the maximum
> rate at which packets (work) can arrive.  In other words, it's a
> parameter based on the speed of the link.
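
To make the four criteria above concrete, here's a sketch that, given
queue-length samples covering one measurement period, computes the busy
fraction and decides whether to signal congestion (the function and its
names are mine, not from any implementation):

```python
# Sketch: evaluate one measurement period of queue-length samples.
# "Occupancy" here is the fraction of the period the queue was non-empty;
# a sufficiently sized buffer should empty (hit zero) within the period.

def evaluate_period(samples):
    """samples: list of (duration, queue_length) pairs covering the period.

    Returns (busy_fraction, emptied, signal_congestion).
    """
    total = sum(d for d, _ in samples)
    busy = sum(d for d, qlen in samples if qlen > 0)
    emptied = any(qlen == 0 for _, qlen in samples)
    busy_fraction = busy / total if total else 0.0
    # Criterion from the list above: notify if the queue never drained.
    return busy_fraction, emptied, not emptied

# A period where the queue drains: occupancy 0.7, no notification needed.
good = [(7, 3), (3, 0)]
print(evaluate_period(good))      # (0.7, True, False)

# A period where it never empties: time to mark/drop.
bad = [(5, 4), (5, 9)]
print(evaluate_period(bad))       # (1.0, False, True)
```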
>
> --dave
> [By the way, there are lots of better queue-heads than I.  The serious
> mathies have way better intuition about what you're seeing than I do]
>
>>>> On Jan 17, 2014, at 3:51 PM, David Collier-Brown <[email protected]> 
>>>> wrote:
>>>>
>>>> I've been reading through the internet-drafts, and one paragraph struck me
>>>> as very illuminating.  This is therefore a sanity check before I go
>>>> whole-hog down a particular path...
>>>>
>>>> The comment is from Baker and Fairhurst,
>>>> https://datatracker.ietf.org/doc/draft-ietf-aqm-recommendation/ and the
>>>> paragraph is [emphases added]
>>>>
>>>> The point of buffering in the network is to absorb data bursts and to
>>>> transmit them during the (hopefully) ensuing bursts of silence. This is
>>>> essential to permit the transmission of bursty data. Normally small queues
>>>> are preferred in network devices, with sufficient queue capacity to absorb
>>>> the bursts. The counter-intuitive result is that maintaining normally-small
>>>> queues can result in higher throughput as well as lower end-to-end delay.
>>>> In summary, queue limits should not reflect the steady state queues we want
>>>> to be maintained in the network; instead, they should reflect the size of
>>>> bursts that a network device needs to absorb.
>>>>
>>>> All of a sudden we're talking about the kinds of queues I know a little
>>>> about (:-))
>>>> ---
>>>>
>>>> I'm going to suggest that these are queues and associated physical buffers
>>>> that do two things:
>>>>
>>>> hold packets that arrive at a bottleneck for as long as it takes to send
>>>> them out a slower link than the one they came in on, and
>>>> hold bursts of packets that arrive adjacent to each other until they can
>>>> be sent out with normal spacing, some small amount of time between them
>>>>
>>>>
>>>> In an illustration of Dave Taht's, the first looks something like this
>>>>
>>>> ---------------------------+
>>>>     |X|            |X|     +-------------------
>>>>     |X|            |X|        |XXX|    |XXX|
>>>>     |X|            |X|     +-------------------
>>>> ---------------------------+
>>>>
>>>> At the choke-point there is a buffer at least big enough to give the packet
>>>> a chance to wheel from line into column (:-)) and start down the smaller
>>>> pipe.
>>>>
>>>> The speed at which the acks come back, the frequency of drops, and any
>>>> explicit congestion notifications slow the sender until it no longer
>>>> overloads the skinnier pipe, thus spacing out the packets in the fatter
>>>> pipe.
>>>>
>>>> Various causes [Leland] can slow or speed the packets in the fat pipe,
>>>> making it possible for several to arrive adjacent to each other,
>>>> followed by a gap.  The second purpose of a buffer is to hold these
>>>> bursts while things space themselves back out.
>>>>
>>>> They need to be at minimum big enough to do the speed matching, and at
>>>> maximum big enough to spread a burst back into a normal progression,
>>>> always assuming that acks, drops and explicit congestion notifications
>>>> are slowing the sender to the speed of the slowest part of the network.
>>>> ---
>>>>
>>>> If I'm right about this, we can draw some helpful conclusions:
>>>>
>>>>   * buffer sizes can be set based on measurements:
>>>>       - speed differences, which are pretty static, plus
>>>>       - observed burstiness
>>>>   * drops and ECN can be done to match the slowest speed in the path
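
A sketch of what sizing from those two measurements might look like; the
decomposition into a speed-matching term plus a burst term follows the
reasoning above, but the specific formula is mine, not from any draft:

```python
# Sketch: size a buffer from two measurements, per the reasoning above:
# (1) the speed mismatch between the incoming and outgoing links, and
# (2) the largest burst observed arriving back-to-back.
# The exact formula here is illustrative, not authoritative.

def buffer_bytes(in_rate_bps, out_rate_bps, burst_interval_s, max_burst_bytes):
    # While a burst lasts, the queue grows at the rate difference;
    # the buffer must hold that growth plus the burst itself.
    rate_mismatch = max(in_rate_bps - out_rate_bps, 0) / 8  # bytes/s
    speed_matching = rate_mismatch * burst_interval_s
    return int(speed_matching + max_burst_bytes)

# 100 Mbit/s in, 10 Mbit/s out, bursts lasting up to 5 ms,
# largest observed burst 15 kB:
size = buffer_bytes(100e6, 10e6, 0.005, 15000)
print(size)  # 71250
```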
>>>>
>>>> The latter suddenly sounds a bit like path MTU discovery, except that
>>>> it's more dynamic, and varies with both the path and what's happening
>>>> in various parts of it.
>>>>
>>>> To me, as a capacity/performance nerd, this sounds a lot more familiar and
>>>> manageable. My question to you, before I start madly scribbling on the
>>>> internet drafts is:
>>>>
>>>>     Am I blowing smoke?
>>>>
>>>> --dave
>>>>
>>>> --
>>>> David Collier-Brown,         | Always do right. This will gratify
>>>> System Programmer and Author | some people and astonish the rest
>>>> [email protected]           |                      -- Mark Twain
>>>> (416) 223-8968
>>>>
>>>> _______________________________________________
>>>>
>>>> aqm mailing list
>>>> [email protected]
>>>> https://www.ietf.org/mailman/listinfo/aqm
>>>>
>>>>
>>>>
>>>
>>> --
>>> Dave Täht
>>>
>>> Fixing bufferbloat with cerowrt: 
>>> http://www.teklibre.com/cerowrt/subscribe.html
>>
>
>        /*
>          * Accumulated time and queue length statistics.
>          *
>          * Accumulated time statistics are kept as a running sum
>          * of "active" time.  Queue length statistics are kept as a
>          * running sum of the product of queue length and elapsed time
>          * at that length -- i.e., a Riemann sum for queue length
>          * integrated against time.  (You can also think of the active time
>          * as a Riemann sum, for the boolean function (queue_length > 0)
>          * integrated against time, or you can think of it as the
>          * Lebesgue measure of the set on which queue_length > 0.)
>          *
>          *              ^
>          *              |                       _________
>          *              8                       | i4    |
>          *              |                       |       |
>          *      Queue   6                       |       |
>          *      Length  |       _________       |       |
>          *              4       | i2    |_______|       |
>          *              |       |           i3          |
>          *              2_______|                       |
>          *              |    i1                         |
>          *              |_______________________________|
>          *              Time->  t1      t2      t3      t4
>          * At each change of state (entry or exit from the queue),
>          * we add the elapsed time (since the previous state change)
>          * to the active time if the queue length was non-zero during
>          * that interval; and we add the product of the elapsed time
>          * times the queue length to the running length*time sum.
>          *
>          * This method is generalizable to measuring residency
>          * in any defined system: instead of queue lengths, think
>          * of "outstanding RPC calls to server X".
>          *
>          * A large number of I/O subsystems have at least two basic
>          * "lists" of transactions they manage: one for transactions
>          * that have been accepted for processing but for which processing
>          * has yet to begin, and one for transactions which are actively
>          * being processed (but not done). For this reason, two cumulative
>          * time statistics are defined here: wait (pre-service) time,
>          * and run (service) time.
>          *
>          * All times are 64-bit nanoseconds (hrtime_t), as returned by
>          * gethrtime().
>          *
>          * The units of cumulative busy time are accumulated nanoseconds.
>          * The units of cumulative length*time products are elapsed time
>          * times queue length.        
>          */
>
>
> The ASCII art is an excerpt from /usr/include/sys/kstat.h on Solaris 10.
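
The kstat accounting described above translates directly into code.  A
sketch of the accumulation at each state change (the "active time" and
"length*time sum" names mirror the kstat comment; the Python is mine):

```python
# Sketch of the kstat.h accounting above: at each change in queue length,
# accumulate (a) active time, while length > 0, and (b) the Riemann sum
# of length * elapsed time. Average occupancy and average queue length
# fall out at the end.

def accumulate(events):
    """events: list of (timestamp, new_queue_length), sorted by time.

    Returns (busy_time, length_time_sum) over the observed span.
    """
    busy_time = 0.0        # kstat "active" time
    length_time = 0.0      # kstat running length*time sum
    prev_t, prev_len = events[0]
    for t, qlen in events[1:]:
        dt = t - prev_t
        if prev_len > 0:
            busy_time += dt
        length_time += prev_len * dt
        prev_t, prev_len = t, qlen
    return busy_time, length_time

# Queue holds 2 packets for 3s, drains to 0 for 1s, then 4 packets for 2s.
events = [(0, 2), (3, 0), (4, 4), (6, 0)]
busy, lsum = accumulate(events)
span = events[-1][0] - events[0][0]
print(busy / span)   # fraction of time the queue was busy
print(lsum / span)   # time-averaged queue length
```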
>
>



