Hi Claudio,
Note: Any comment below straying from the technical is intended as
/friendly/ banter. Please don't read it any other way!
On Tue, 7 Mar 2006, Claudio Jeker wrote:
> I was notified about this rather long thread by Paul Jakma about
> OpenBGPD and I think I have to clarify a few things.
I figured you'd hear of it, and am very interested to hear your side.
> Paul is talking a lot about memory requirements and uses the numbers
> of a bug report to compare OpenBGPD with quagga.
I did apologise in advance for doing that. Those were the only numbers I
could find with Google. I only used the accounted-for usage figures
(which I presumed did not include leaked RAM, given they were so much
smaller, and reasonably consistent with the total usage reported prior
to soft-reconfig).
Also, the reporter mentioned elsewhere in that thread that their
/normal/ usage, prior to the soft-reconfig integration, was:
"Before the upgrade, I was running at something like 60-80 if I
remember it well."
That's not quite the "25MB" often mentioned in OpenBGPd presentations
for /two/ full feeds. ;) So either:
- the reporter's memory was wrong,
or
- the much-bandied "2 full feeds -> 25MB" figure either:
- doesn't tie in with operational reality,
or
- is derived with feeds which are 'best case'
(E.g.: Both from the same upstream, so the AS_PATHs are exact same)
or
- is simply out of date, e.g. maybe because:
- bigger full feeds these days
- the figure refers to older OpenBGPd and memory needs have changed.
?
FWIW, here's data from a Quagga 0.99-today bgpd (with a couple of small
changes to the definition of a structure, see below) with one full 175k
feed, and another ~25k partial-feed (from a different upstream), 65k
AS_PATH attributes, 100k general attributes total:
VmPeak: 86516 kB
VmSize: 85920 kB
VmRSS: 82084 kB
VmData: 80824 kB
VmStk: 88 kB
VmExe: 648 kB
VmLib: 3676 kB
85MB, which is quite comparable to the "60 to 80MB" the OpenBGPd user
I quoted reported for pre-soft-reconfig, presuming they weren't
mistaken.
The 'zebra' daemon is quite bloated, 61MB. Though, that should stay
constant, as explained in other emails. We'll try to fix it at some
stage (it stores nexthop information per-route, which probably accounts
for 40 to 50% of its RAM usage, and is mostly utterly redundant).
> If I count correctly I get 150MB memory usage and some sessions have
> been up for more than 8 weeks. Yes, this box has soft-reconfig-in
> disabled, but it would not matter anyway because there is no filtering
> done.
Very good indeed, but I still don't think it's /incomparable/ to Quagga.
Out of curiosity, what would that figure look like with soft-reconfig
enabled? (Even an estimate.) Quagga's soft-reconfig overhead is minimal:
no extra RIB table entries, just one 32-byte struct per BGP path. If no
modifications were made to attributes - i.e. filter only - there are no
additional attribute overheads. So about 5MB per 180k full-feed, I
think.
Here are figures for a Quagga bgpd that is roughly similar to your
config, they are from the RIPE NCC "RIS" routing-data collector project
(see http://www.ris.ripe.net):
- older Quagga, 0.96.5, but I believe it should still be representative
- just under 100 peers
- All but a few sessions are several weeks old
- More than 5 sessions are up for *many* months
- not forwarding, hence not running zebra
(explains previous line, given the version)
- 8 full ~180k feeds,
- slightly fewer distinct AS_PATH entries than yours, though: ~256k
Memory usage is:
VmSize: 187252 kB
VmLck: 0 kB
VmRSS: 183072 kB
VmData: 184488 kB
VmStk: 36 kB
VmExe: 596 kB
VmLib: 1972 kB
187MB, versus the 150MB for OpenBGPd: Quagga uses roughly 20-25% more,
depending on which figure you take as the baseline. That's certainly
still comparable at least.
Here's one with:
- order of 150ish peers
- most sessions are up for several weeks
- again, not forwarding, not running zebra
- 13 full ~180k IPv4 feeds
- also 0.96.5
- about 55% more AS_PATH attributes (~400k):
VmSize: 278992 kB
VmLck: 0 kB
VmRSS: 262996 kB
VmData: 276224 kB
VmStk: 40 kB
VmExe: 596 kB
VmLib: 1972 kB
Memory usage is roughly 50% greater, closely in line with the ~55%
difference in AS_PATH attributes.
Now, there's actually some really 'low-hanging fruit': junk that
shouldn't be in our core data structures, and wasted padding due to poor
layout. Fixing that appears to be good for roughly a 6% reduction in
size on ILP32 machines, possibly a bit more on LP64, and I'll commit the
changes soon.
That should get us down a /little/ bit further towards OpenBGPd
memory-usage levels, maybe to within 15%. But still, our current memory
usage is *not* that dreadful, not even compared to OpenBGPd, imho.
Despite this, the presentations ye (OpenBGPd developers) have been
giving at various forums paint a slightly different picture: "Quagga is
bloated" and "OpenBGPd is really memory efficient" "25MB for two full
feeds", "not even half of other implementations" to paraphrase some
things I've read/heard from presentations. E.g., see:
http://ezine.daemonnews.org/200603/openbgpd.html
from Henning's talk at NANOG recently.
I'm not sure those characterisations are /quite/ fair. They don't seem
representative even of OpenBGPd, other than possibly as a 'best case',
and I think you quite definitely misrepresent the cases of operational
reality (i.e. soft-reconfig) and of other implementations,
unintentionally or not.
It might be an idea in future to provide details about the
characteristics of the feeds used when you provide memory usage figures
to audiences (distinct AS_PATHs, number of prefixes, distinct
attributes, etc. anything which is a significant influence on memory
usage).
> Another urban legend is the coolness of "dynamic route refresh".
> OpenBGPD does announce the route refresh capability and supports
> requests from other peers, but we do not have the button to make a
> refresh request ourselves. There is a simple reason why: "route
> refresh" as in RFC 2918 is totally useless.
Route-refresh has some shortcomings, it's hardly useless though.
Yes, you have to trust your peer to resend its table, but there are many
other things you have to trust your peer to do correctly (including
sending you the original updates in the first place...).
Is there operational experience to back up this "unusable" claim? My
direct experience with Quagga is that RR is quite usable, and I know of
others who rely solely on RR. Looking at the OpenBGPd code, you seem
quite capable of reliably sending table dumps on RR, and presumably
could easily accept them too.
Is it maybe considered useless because other, probably early,
implementations of RR got it wrong?
> Now that's great: you don't know when the refresh is finished. It is
> even worse: you don't know if it worked at all in the first place.
So the long stream of UPDATEs for 175k+ prefixes isn't a clue? :)
You can actually tell whether routes you accepted previously have been
resent or not: you need to keep just one bit of information for each
route in the RIB, if you really had to know (not that I'd advocate this
as worthwhile). You can't detect other routes though, no.
> The End-of-RIB marker introduced in the BGP Graceful-Restart proposal
> should have been in RFC 2918 from the beginning.
ACK, it should be split out as a separate draft and a standalone
capability, not tied in with the GR capability. It could be done
relatively easily, and would be great (I have at least one other
prospective use for it).
I wonder if any other markers might be needed, e.g. whether a
Start-of-RIB would be an idea too (otherwise you possibly need a /very/
long timer to time out on not receiving EoR). Needs thought.
Fancy working on this? Figure out exactly what markers would be useful
for Quagga and OpenBGPd, and what semantics they should have? Could be
very useful.
> We consider the use of RFC 2918 as a substitute for real inbound soft
> reconfiguration unusable until the named issues have been solved.
> Adding a knob to bgpctl to issue a refresh request is not a big issue.
So add it, the users will figure out which form of reconfig best suits
their needs. :)
Users with tight memory-constraints may well prefer route-refresh over
soft-reconfig, despite the flaws.
Thanks for your reply!
regards,
--
Paul Jakma,
Network Approachability, KISS. http://quagga.ireland.sun.com/
Sun Microsystems, Dublin, Ireland. tel: EMEA x19190 / +353 1 819 9190
_______________________________________________
networking-discuss mailing list
[email protected]