Darren Reed wrote:
> On 26/07/10 12:57 PM, James Carlson wrote:
>> Sequentially, though, it's much easier to get there.
>>
>> And things like tunnels can get you there much faster. I'm *SURE* that
>> punchin has rolled over that 2^16 limit many times over.
>
> I'd hazard a guess that punchin gets rebooted for upgrading
> before that happens. To give that more context, Dan did not
> seem to think that fixing the IKE daemon's naive use of the
> index from the routing message was more important than a P4,
> which is a likely indication that the index has not passed 65535.

I think that misses the point I was making. Creating and destroying a
tunnel requires allocating a new ifIndex number every time (unless it's
the "same" tunnel and you can save all the interface counters in the
process). Even a short run of a punchin server sees a lot of plumbing
and unplumbing as users log in and out.

That's a real-world usage of the system that results in many
plumbs/unplumbs, and that thus consumes a lot of the ifIndex space, with
wrap-arounds being a likely result if the space is artificially
constrained.

I wasn't talking about whether the IKE daemon could or should be fixed
to do something different than what it currently does.

>>> Can you imagine how long "ifconfig -a" would take to run on a
>>> system with that many network interfaces?
>>
>> The numbers are intentionally like PIDs; they're not reused until the
>> worst happens. I believe that's the point you may be missing here.
>
> One of the ideas that I floated around before addressing this
> was to have DEBUG kernels start their IP interface index
> allocation at 100,000, since we do something similar for PIDs
> but nobody was interested in that.

I don't see how that'd be helpful, but, sure, you could do that.

> But to take that further, except for PIDs under 100, the system
> does reuse the PID number space, so why shouldn't it reuse
> network interface IDs?

It does. The important point I was making here (and that seems to have
been lost) is that it does _minimal_ reuse, and it does so *on purpose*.

In other words, you have to run through the whole sequence of available
numbers before we ever attempt to reassign an old one. It doesn't just
assign the "first available" index, but instead assigns the "next
available" one. Just as is true with PIDs, this is done on purpose.
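
To make the "first available" versus "next available" distinction
concrete, here is a minimal sketch in C. It is not the Solaris IP code:
`struct idx_alloc`, `IDX_MAX`, and the function names are all made up for
illustration, and the index space is shrunk to 8 so that wrap-around is
easy to see. It also shows the shortcut of skipping the in-use check
until the counter has rolled over once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define IDX_MAX 8u  /* hypothetical tiny index space: valid values 1..8 */

struct idx_alloc {
    bool     in_use[IDX_MAX + 1]; /* slot 0 is never handed out */
    uint32_t next;                /* next candidate for "next available" */
    bool     wrapped;             /* has 'next' rolled over at least once? */
};

void idx_init(struct idx_alloc *a)
{
    assert(a != NULL);
    memset(a, 0, sizeof (*a));
    a->next = 1;
}

void idx_free(struct idx_alloc *a, uint32_t i)
{
    assert(a != NULL);
    if (i >= 1 && i <= IDX_MAX)
        a->in_use[i] = false;
}

/* "First available": always hand out the lowest free index. */
uint32_t idx_alloc_first(struct idx_alloc *a)
{
    assert(a != NULL);
    for (uint32_t i = 1; i <= IDX_MAX; i++) {
        if (!a->in_use[i]) {
            a->in_use[i] = true;
            return i;
        }
    }
    return 0; /* space exhausted */
}

/*
 * "Next available": keep counting upward and only revisit old numbers
 * after wrapping. Use one policy per allocator; mixing the two breaks
 * the "cannot be in use before the first wrap" assumption.
 */
uint32_t idx_alloc_next(struct idx_alloc *a)
{
    assert(a != NULL);
    for (uint32_t tries = 0; tries < IDX_MAX; tries++) {
        uint32_t i = a->next;
        bool must_check = a->wrapped; /* false until the first roll-over */

        if (i == IDX_MAX) {
            a->next = 1;
            a->wrapped = true;
        } else {
            a->next = i + 1;
        }
        if (must_check && a->in_use[i])
            continue; /* still plumbed; try the following number */
        a->in_use[i] = true;
        return i;
    }
    return 0; /* space exhausted */
}
```

With the first policy, allocating 1 and 2, freeing 1, and allocating
again hands back 1 straight away, so a stale reference to index 1 now
names a different interface. With the second policy the same sequence
yields 3, and index 1 is not seen again until the counter has gone all
the way around.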

Minimal reuse means that even if you have a "stale" (long since
unplumbed) index number, it's very unlikely that it has been reassigned,
and thus your application's mistaken ioctl()s or table look-ups based on
it will just fail rather than returning the wrong interface.

Note that there's also a minor performance argument to be made here:
until the internal "next interface index" counter rolls over, there's no
need at all to check whether the number we're assigning is already used.
It can't be. So, as a performance enhancement, we don't check until
roll-over has occurred at least once.

Now, with the change that was just integrated, this point of lower
performance (where the next available index must be checked against the
list of in-use IDs) occurs earlier and more often.

>> Indeed. That's exactly why this interface has been left alone for
>> decades, and instead the fixes were put into the applications using the
>> interfaces.
>
> If I quickly look at current BSD source code, I find:
> - index allocation using the smallest available index
> - limited to USHORT_MAX

Right. That's why the Solaris code (up until this change) truncated the
index value down to 16 bits for those interfaces. The SIOCGLIFINDEX
interface, though, is Solaris-specific, so the full 32 bits is (or was
once) available there. Now it's broken.

> Linux uses a signed integer and has its own routing message
> protocol (that seems to not use the index...)

Correct, but we don't yet support the Linux interfaces, so that doesn't
matter.

>> I don't believe that the fix applied for CR 6965774 is really the right
>> idea. It perhaps makes some sense in a Windows-like environment where
>> you're encouraged (sometimes forcefully) to reboot every few hours or
>> so, but not so much for an OS that runs for a long period of time.
>> Tossing away the 32-bit counter that we put into IP decades ago seems
>> like a step backwards.
>
> I think that the correct solution is to fix SNMP to not assume
> or assert that an interface index should be unique for the
> entire "uptime" of a host.

How would that be "fixed?" It's a widely-implemented standard, and the
idea that ifIndex numbers are not recycled is baked into the way the
standard operates. It's not something that vendors get a choice on.

I don't understand what "fix" means in that context.

> It may also be that a new routing message format could be whipped
> up and spread around so that this and other limitations can be
> addressed. But that will take substantially longer and carries
> with it more peril.

True. That's why I didn't suggest doing that, and why I didn't do it
myself. Instead, I worked on fixing applications so that where they
were required to use the BSD interfaces, they were able to cope (as much
as possible) with the brokenness of those old interfaces in the few
places where the old interfaces were actually necessary.

It turns out to be fairly easy to do, so I'm pretty confused why CR
6965774 was implemented at all. What problem did it actually fix?

> Until one of those two happens, it is a mistake to not limit the
> index allocation to [1,65535] because the only way of "fixing"
> the problems that can occur once 65535 is passed is with a
> reboot. To that end, any minor SNMP inconvenience seems trivial.
> Or to put it differently, the potential for problems posed by
> passing 65535 vastly outweigh the SNMP side of the equation.

I strongly disagree. This change has broken an important feature in
Solaris -- the 32-bit ifIndex numbers -- and I'd very much like to see
it restored.

The fact that the old BSD interfaces had limitations doesn't mean that
the rest of the system needs to be hobbled as well.

-- 
James Carlson         42.703N 71.076W         <carls...@workingcode.com>
_______________________________________________
networking-discuss mailing list
networking-discuss@opensolaris.org