On Mon, Oct 15, 2001 at 11:35:51 -0700, Terry Lambert wrote:
> "Kenneth D. Merry" wrote:
[ ... ]
> > > This is actually very bad: you want to drop packets before you
> > > insert them into the queue, rather than after they are in the
> > > queue.  This is because you want the probability of the drop
> > > (assuming the queue is not maxed out: otherwise, the probability
> > > should be 100%) to be proportional to the exponential moving
> > > average of the queue depth, after that depth exceeds a drop
> > > threshold.  In other words, you want to use RED.
> > 
> > Which queue?  The packets are dropped before they get to ether_input().
>
> The easy answer is "any queue", since what you are becoming
> concerned with is pool retention time: you want to throw
> away packets before a queue overflow condition makes it so
> you are getting more than you can actually process.
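
For reference, a minimal sketch of the RED drop decision described in the
quoted text above: no drops below a threshold, a drop probability that grows
with the exponential moving average of the queue depth, and a certain drop
once the queue is full.  The names (drop_threshold, queue_limit, max_p,
weight) and the linear ramp are assumptions taken from the usual textbook
formulation of RED, not from any code discussed in this thread.

```python
import random


def update_average(avg_depth, current_depth, weight):
    """Exponential moving average of the queue depth (weight is hypothetical)."""
    return (1.0 - weight) * avg_depth + weight * current_depth


def red_drop_probability(avg_depth, drop_threshold, queue_limit, max_p):
    """Drop probability for an arriving packet, given the averaged depth."""
    if avg_depth < drop_threshold:
        return 0.0                      # below the threshold: never drop
    if avg_depth >= queue_limit:
        return 1.0                      # queue maxed out: always drop
    # In between, ramp linearly from 0 up to max_p (assumed tuning knob).
    span = queue_limit - drop_threshold
    return max_p * (avg_depth - drop_threshold) / span


def should_drop(avg_depth, drop_threshold, queue_limit, max_p, rng=random.random):
    """Decide *before* enqueueing whether the arriving packet is dropped."""
    return rng() < red_drop_probability(avg_depth, drop_threshold, queue_limit, max_p)
```

The decision is made on arrival, so a dropped packet never occupies a
queue slot or a buffer.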


[ ...RED... ]

> > > Maybe I'm being harsh in calling it "spam'ming".  It does the
> > > wrong thing, by dropping the oldest unprocessed packets first.
> > > A FIFO drop is absolutely the wrong thing to do in an attack
> > > or overload case, when you want to shed load.  I consider the
> > > packet that is being dropped to have been "spam'med" by the
> > > card replacing it with another packet, rather than dropping
> > > the replacement packet instead.
> > >
> > > The real place for this drop is "before it gets to card memory",
> > > not "after it is in host memory"; Floyd, Jacobson, Mogul, etc.,
> > > all agree on that.
> > 
> > As I mentioned above, how would you do that without some sort of traffic
> > shaper on the wire?
>
> The easiest answer is to RED queue in the card firmware.

Ahh, but then the packets are likely going to be in card memory already.
A card with a reasonable amount of cache (e.g. the Tigon II) and onboard
firmware will probably dump the packet into an already-set-up spot in

You'd probably need a special mode of interaction between the firmware and
the packet receiving hardware to tell it when to drop packets.

> > My focus with gigabit ethernet was to get maximal throughput out of a small
> > number of connections.  Dealing with a large number of connections is a
> > different problem, and I'm sure it uncovers lots of nifty bugs.
> 8-).  I guess that you are more interested in intermediate hops
> and site to site VPN, while I'm more interested in connection
> termination (big servers, SSL termination, and single client VPN).

Actually, the specific application I worked on for my former employer was
moving large amounts (many gigabytes) of video at high speed via FTP
between FreeBSD-based video servers.  (And between the video servers and
video editor PCs, and data backup servers.)

It actually worked fairly well, and is in production at a number of TV
stations now. :)

> > > I'd actually prefer to avoid the other DMA; I'd also like
> > > to avoid the packet receipt order change that results from
> > > DMA'ing over the previous contents, in the case that an mbuf
> > > can't be allocated.  I'd rather just let good packets in with
> > > a low (but non-zero) statistical probability, relative to a
> > > slew of bad packets, rather than letting a lot of bad packets
> > > from a persistent attacker push my good data out with the bad.
> > 
> > Easier said than done -- dumping random packets would be difficult with a
> > ring-based structure.  Probably what you'd have to do is have an extra pool
> > of mbufs lying around that would get thrown in at random times when mbufs
> > run out to allow some packets to get through.
> > 
> > The problem is, once you exhaust that pool, you're back to the same old
> > problem if you're completely out of mbufs.
> > 
> > You could probably also start shrinking the number of buffers in the ring,
> > but as I said before, you'd need a method for the kernel to notify the
> > driver that more mbufs are available.
>
> You'd be better off shrinking the window size across all
> the connections, I think.
>
> As for it being difficult to do, I actually have RED queue code, which
> I adapted from the formula in a paper.  I have no problem
> giving that code out.
>
> The real issue is that the BSD queue macros involved in the
> queues really need to be modified to include an "elements on
> queue" count for the calculation of the moving average.
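
To illustrate the point about an "elements on queue" count: a rough sketch
of a queue that keeps its own element count and feeds it into the moving
average on every arrival.  CountedQueue and its fields are hypothetical
names for illustration only; this is not the BSD queue macros and not the
RED code mentioned above.

```python
from collections import deque


class CountedQueue:
    """FIFO that tracks its depth so the average never needs a list walk."""

    def __init__(self, weight=0.002):
        self._items = deque()
        self.count = 0        # explicit "elements on queue" counter
        self.avg = 0.0        # exponential moving average of the depth
        self.weight = weight  # averaging weight (assumed value)

    def enqueue(self, item):
        # Sample the depth seen by the arriving item, then insert it.
        self.avg = (1.0 - self.weight) * self.avg + self.weight * self.count
        self._items.append(item)
        self.count += 1

    def dequeue(self):
        item = self._items.popleft()  # raises IndexError if empty
        self.count -= 1
        return item
```

Python's deque already knows its own length; the explicit count field is
kept only to mirror what a C queue head would have to carry.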

[ ....  ]

> > > OK, I will rediff and generate context diffs; expect them to
> > > be sent in 24 hours or so from now.
> > 
> > It's been longer than that...
> Sorry; I've been doing a lot this weekend.  I will redo them
> at work today, and resend them tonight... definitely.
>
> > > > Generally the ICMP response tells you how big the maximum MTU is, so you
> > > > don't have to guess.
> > >
> > > Maybe it's the ICMP response; I still haven't had a chance to
> > > hold Michael down and drag the information out of him.  8-).
> > 
> > Maybe what's the ICMP response?
>
> The difference between working and not working.

Yes, with TCP, the ICMP response is required in order for the path MTU (or
rather MSS) to be autonegotiated properly.  It'll work without the ICMP
response assuming that the minimum of the MTUs on either end is less than
or equal to the smallest MTU in between.
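
The condition above can be written down as a small check.  This is only a
sketch: the function names are made up for illustration, and the 40-byte
header figure assumes IPv4 and TCP headers without options.

```python
IP_TCP_HEADER_BYTES = 40  # assumed: 20-byte IPv4 + 20-byte TCP, no options


def negotiated_mss(mtu_a, mtu_b):
    """MSS both ends settle on from their own interface MTUs."""
    return min(mtu_a, mtu_b) - IP_TCP_HEADER_BYTES


def works_without_icmp(mtu_a, mtu_b, hop_mtus):
    """True if no intermediate hop is narrower than the smaller endpoint MTU."""
    if not hop_mtus:
        return True  # directly connected: nothing in between to get in the way
    return min(mtu_a, mtu_b) <= min(hop_mtus)
```

For example, a jumbo-frame host (9000) talking to a 1500-byte host over
1500-byte hops is fine, since the MSS is derived from the 1500-byte end;
two 9000-byte hosts with a 1500-byte hop in between depend on the ICMP
response getting back.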

Otherwise, you have to be able to receive ICMP in order to decrease your

With most other protocols (mostly UDP nowadays), the DF bit isn't set, so
intermediate routers can fragment if necessary.

> > > Cisco boxes detect "black hole" routes; I'd have to read the
> > > white paper, rather than just its abstract, to tell you how,
> > > though...
> > 
> > It depends on what they're trying to do with the information.  If they're
> > just trying to route around a problem, that's one thing.  If they're trying
> > to diagnose MTU problems, that's quite another.
> > 
> > In general, it's going to be pretty easy for routers to detect when a
> > packet exceeds the MTU for one of their interfaces and send back an ICMP
> > packet.
>
> A "black hole" route doesn't ICMP back, either because some
> idiot has blocked ICMP, or because it's just too dumb...

Kinda hard to figure out anything definitive from a black hole, though.

> > > Not for the user.
> > 
> > Configuring the MTU is a standard part of configuring IP networks.
> > If your users aren't smart enough to do it, you'll pretty much have
> > to default to 1500 bytes for ethernet.
>
> Or figure out how to negotiate higher...
>
> > You can let the more clueful users increase the MTU.
>
> That doesn't improve performance, and so "default configuration"
> benchmarks like "Polygraph" really suffer, as a result.

I suppose so.  But if you're running some standard benchmark that generates
a ton of connections, will a large MTU really matter in that situation

You're probably pulling down a whole lot of chunks of not very large data

In any case, if it's a realistic web benchmark or something similar, most
all of the connections will be from machines with 1500 byte MTUs, or
perhaps all if your "upstream" hardware can only do 1500 byte packets.

> > If you're supplying enough of the equipment, you can make some assumptions
> > about the equipment setup.  This was the case with my former employer -- in
> > many cases we supplied the switch as well as the machines to go onto the
> > network, so we knew ahead of time that jumbo frames would work.  Otherwise,
> > we'd work with the customer to set things up with standard or jumbo frames
> > depending on their network architecture.
> This approach only works if you're Cisco or another big iron
> vendor, in an established niche.


> [ ... more on MTU negotiation for jumbograms ... ]
> > > In any case, Intel cards appear to do it, and so do Tigon III's.
> > 
> > That's nice, but there's no reason a card has to accept packets with a
> > higher MTU than it is configured for.
> True, but then there's no reason I have to buy the cards
> that choose to not do this.  8-).


Kenneth Merry
