Addressing buffer size when designing hardware, it would be an economic discussion illuminated by data from the last model of my device that had a good AQM solution. With AQM in place, buffers would only overflow when things got seriously bursty, and some data* from the devices in the field would tell me if I wanted to go to the next larger memory chip.
--dave [* suitably anonymized] On 01/23/2014 08:59 AM, David Collier-Brown wrote: > [Inline] > > On 01/23/2014 01:18 AM, Dave Taht wrote: >> On Thu, Jan 23, 2014 at 1:15 AM, Dave Taht <[email protected]> wrote: >>> On Thu, Jan 23, 2014 at 1:10 AM, Fred Baker (fred) <[email protected]> wrote: >>>> No, you're not blowing smoke. I'm not sure I would compare the behavior to >>>> PMTUD, as in that the endpoint is given a magic number and manages to it, >>>> where in this case, it is given the results of its behavior, and it manages >>>> to improve that. >>>> >>>> But this is what I have rambled on about in threads relating to the size of >>>> a buffer. Folks would really like to have a magic way to calculate the >>>> buffer size (an amount of memory) they need to install in a router or >>>> switch, and it isn't that easy, because it has a lot to do with where in >>>> the >>>> network a system is located and how it is used by the applications that use >>>> it. But AQM, in the end, isn't about buffer size. It is about buffer >>>> occupancy. In the ideal case, if there are N sessions active on the >>>> bottleneck link in a path, we would like each to obtain 1/N of the >>>> bottleneck's capacity, which is to say that it should be able to maximize >>>> its throughput, while keeping an average of zero packets standing in the >>>> queue (minimizing both latency and variation in latency). If you know your >>>> math, you know that the ideal goal isn't actually achievable. But that >>>> doesn't stop us from trying to asymptotically approach it. >>> I prefer to think of the goal as to keep a minimum of 1 packet in the queue, >>> not as an average of 0. >> And that's not strictly true either. In the case of wifi, and other bundling >> technologies like those used in cable, you want to keep a minimum of a "good" >> aggregate of packets in the queue for that technology. > > From my pointy queue-head, occupancy is something that averages over > time, and has a good value if something less than 1. However, that's > not necessarily one packet. > > In cases where packets are spaced out in time, an average of > 0.<something> packets is what I expect. Let's say 0.7 , arbitrarily. > When they come in trains, "bullet trains", or more likely, "thundering > herds", then 0.7 trains are what I expect to see. > > The size of trains can be measured, but I expect the real thing we > care about is whether the queue drains back to zero during the time > period you're using for your measurements. If it doesn't, you're > about to be in a world of hurt (:-)) It's only the physical size > limitations of a buffer that forces an eventual drop and congestion > control to happen in bad algorithms. In the queuing world, we'd > expect the queue to grow "without bound", and the latency to try for > infinity. > > <start going off-topic> > Sun used to measure queue occupancy as a Riemann sum, which allowed > one to graph it like this: > > 1.0 > || > ___ || > | | || > | | || > | | || > _| |__||__ > 0.0 > > The first one is 0.66 for 5 time units, the second 1.0 for one unit. > Both fall to zero, so the system catches up. They have different > effects on other traffic, which is why I find them interesting: as CPU > queues, the first hurts interactivity more than the second. > > I *think* this will be of interest to us, but probably later. I've > pasted in a chunk of the kstat.h comment at the very bottom for later > reference... > </off-topic> > > Because occupancy can be done in units of time, we don't have to know > buffer and packet *sizes* > > * A well-performing queue/buffer system will have an average queue > occupancy of < 1 > * A buffer is of sufficient size if the queue occupancy regularly > falls to zero repeatedly during the time period. Another way of > saying the same thing is that the queue empties. > * An operable queue management system sends congestion notifications > if the queue/buffer does not empty > * A good queue management system sends congestion notifications in > time to keep the queue/buffer emptying regularly. > > The parameter here is the time period, and has to do with the maximum > rate at which packets (work) can arrive. In other words, it's a > parameter based on the speed of the link. > > --dave > [By the way, there are lots of better queue-heads than I. The serious > mathies have way better intuition about what you're seeing that I do] > >>>> On Jan 17, 2014, at 3:51 PM, David Collier-Brown <[email protected]> >>>> wrote: >>>> >>>> I've been reading through the internet-drafts, and one paragraph struck me >>>> as very illuminating. This is therefor a sanity-check before I go full-hog >>>> down a particular path... >>>> >>>> The comment is from Baker and Fairhurst, >>>> https://datatracker.ietf.org/doc/draft-ietf-aqm-recommendation/ and the >>>> paragraph is [emphases added] >>>> >>>> The point of buffering in the network is to absorb data bursts and to >>>> transmit them during the (hopefully) ensuing bursts of silence. This is >>>> essential to permit the transmission of bursty data. Normally small queues >>>> are preferred in network devices, with sufficient queue capacity to absorb >>>> the bursts. The counter-intuitive result is that maintaining normally-small >>>> queues can result in higher throughput as well as lower end-to- end delay. >>>> In summary, queue limits should not reflect the steady state queues we want >>>> to be maintained in the network; instead, they should reflect the size of >>>> bursts that a network device needs to absorb. >>>> >>>> All of a sudden we're talking about the kinds of queues I know a little >>>> about (:-)) >>>> --- >>>> >>>> I'm going to suggest that these are queues and associated physical buffers >>>> that do two things: >>>> >>>> hold packets that arrive at a bottleneck for a long as it takes to send >>>> them >>>> out a slower link that they came in on, and >>>> hold bursts of packets that arrive adjacent to each other until they can be >>>> sent out in a normal spacing, with some small amount of time between them >>>> >>>> >>>> In an illustration of Dave Taht's, the first looks something like this >>>> >>>> -------------------------+ >>>> |X| |X| +------------------- >>>> |X| |X| |XXX| |XXX| >>>> |X| |X| +------------------- >>>> -------------------------+ >>>> >>>> At the choke-point there is a buffer at least big enough to give the packet >>>> a chance to wheel from line into column (:-)) and start down the smaller >>>> pipe. >>>> >>>> The speed at which the acks come back, the frequency of drops, and any >>>> explicit congestion notifications slows the sender until they don't >>>> overload >>>> the skinnier pipe, thus spacing the packets in the fatter pipe out. >>>> >>>> Various causes [Leland] can slow or speed the packets in the fat pipe, >>>> making it possible for several to arrive adjacent to each other, followed >>>> by >>>> a gap. The second purpose of a buffer is to hold these bursts while >>>> things >>>> space themselves back out. >>>> >>>> They need to be big enough at minimum to do the speed matching, and at >>>> maximum, big enough to spread a burst back into a normal progression, >>>> always assuming that acks, drops and explicit congestion notifications are >>>> slowing the sender to the speed of the slowest part of the network. >>>> --- >>>> >>>> If I'm right about this, we can draw some helpful conclusions >>>> >>>> buffer sizes can be set based on measurements: >>>> >>>> speed differences, which are pretty static, plus >>>> observed burstyness >>>> >>>> drops and ECN can be done to match the slowest speed in the path >>>> >>>> The latter suddenly sounds a bit like path MTU discovery, except it's a bit >>>> more dynamic, and varies with both path and what's happening in various >>>> parts of it. >>>> >>>> To me, as a capacity/performance nerd, this sounds a lot more familiar and >>>> manageable. My question to you, before I start madly scribbling on the >>>> internet drafts is: >>>> >>>> Am I blowing smoke? >>>> >>>> --dave >>>> >>>> -- >>>> David Collier-Brown, | Always do right. This will gratify >>>> System Programmer and Author | some people and astonish the rest >>>> [email protected] | -- Mark Twain >>>> (416) 223-8968 >>>> >>>> _______________________________________________ >>>> >>>> aqm mailing list >>>> [email protected] >>>> https://www.ietf.org/mailman/listinfo/aqm >>>> >>>> >>>> >>> >>> -- >>> Dave Täht >>> >>> Fixing bufferbloat with cerowrt: >>> http://www.teklibre.com/cerowrt/subscribe.html >> > > /* > * Accumulated time and queue length statistics. > * > * Accumulated time statistics are kept as a running sum > * of "active" time. Queue length statistics are kept as a > * running sum of the product of queue length and elapsed time > * at that length -- i.e., a Riemann sum for queue length > * integrated against time. (You can also think of the active time > * as a Riemann sum, for the boolean function (queue_length > 0) > * integrated against time, or you can think of it as the > * Lebesgue measure of the set on which queue_length > 0.) > * > * ^ > * | _________ > * 8 | i4 | > * | | | > * Queue 6 | | > * Length | _________ | | > * 4 | i2 |_______| | > * | | i3 | > * 2_______| | > * | i1 | > * |_______________________________| > * Time-> t1 t2 t3 t4 > * At each change of state (entry or exit from the queue), > * we add the elapsed time (since the previous state change) > * to the active time if the queue length was non-zero during > * that interval; and we add the product of the elapsed time > * times the queue length to the running length*time sum. > * > * This method is generalizable to measuring residency > * in any defined system: instead of queue lengths, think > * of "outstanding RPC calls to server X". > * > * A large number of I/O subsystems have at least two basic > * "lists" of transactions they manage: one for transactions > * that have been accepted for processing but for which processing > * has yet to begin, and one for transactions which are actively > * being processed (but not done). For this reason, two cumulative > * time statistics are defined here: wait (pre-service) time, > * and run (service) time. > * > * All times are 64-bit nanoseconds (hrtime_t), as returned by > * gethrtime(). > * > * The units of cumulative busy time are accumulated nanoseconds. > * The units of cumulative length*time products are elapsed time > * times queue length. > */ > > > The ascii art is an excerpt from /usr/include/sys/kstat.h on Solaris 10 > > > -- > David Collier-Brown, | Always do right. This will gratify > System Programmer and Author | some people and astonish the rest > [email protected] | -- Mark Twain > (416) 223-8968 > > > _______________________________________________ > aqm mailing list > [email protected] > https://www.ietf.org/mailman/listinfo/aqm -- David Collier-Brown, | Always do right. This will gratify System Programmer and Author | some people and astonish the rest [email protected] | -- Mark Twain (416) 223-8968
_______________________________________________ aqm mailing list [email protected] https://www.ietf.org/mailman/listinfo/aqm
