Darren Reed wrote:
> On 26/07/10 12:57 PM, James Carlson wrote:
>> Sequentially, though, it's much easier to get there.
>>
>> And things like tunnels can get you there much faster.  I'm *SURE* that
>> punchin has rolled over that 2^16 limit many times over.
> 
> I'd hazard a guess that punchin gets rebooted for upgrading
> before that happens. To give that more context, Dan did not
> seem to think that fixing the IKE daemon's naive use of the
> index from the routing message was more important than a P4,
> which is a likely indication that the index has not passed 65535.

I think that misses the point I was making.

Creating and destroying a tunnel requires allocating a new ifIndex
number every time (unless it's the "same" tunnel and you can save all
the interface counters in the process).  Even a short run of a punchin
server sees a lot of plumbing and unplumbing as users log in and out.
That's a real-world usage of the system that results in many
plumbs/unplumbs, and that thus consumes a lot of the ifIndex space, with
wrap-arounds being a likely result if the space is artificially constrained.

I wasn't talking about whether the IKE daemon could or should be fixed
to do something different than what it currently does.

>>> Can you imagine how long "ifconfig -a" would take to run on a
>>> system with that many network interfaces?
>>
>> The numbers are intentionally like PIDs; they're not reused until the
>> worst happens.  I believe that's the point you may be missing here.
> 
> One of the ideas that I floated around before addressing this
> was to have DEBUG kernels start their IP interface index
> allocation at 100,000, since we do something similar for PIDs
> but nobody was interested in that.

I don't see how that'd be helpful, but, sure, you could do that.

> But to take that further, except for PIDs under 100, the system
> does reuse the PID number space, so why shouldn't it reuse
> network interface IDs?

It does.  The important point I was making here (and that seems to have
been lost) is that it does _minimal_ reuse, and it does so *on purpose*.

In other words, you have to run through the whole sequence of available
numbers before we ever attempt to reassign an old one.  It doesn't just
assign the "first available" index, but instead assigns the "next
available" one.
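
As a rough illustration of the difference between the two policies (a
sketch only: the pool structure, the tiny 8-entry index space, and all
function names are hypothetical, not the actual IP code):

```c
#include <stdbool.h>
#include <stdint.h>

#define IDX_MAX 8u   /* tiny hypothetical index space: 1..IDX_MAX */

struct idx_pool {
    bool in_use[IDX_MAX + 1];  /* slot 0 is never handed out */
    uint32_t next;             /* where the "next available" search resumes */
};

static void pool_init(struct idx_pool *p)
{
    for (uint32_t i = 0; i <= IDX_MAX; i++)
        p->in_use[i] = false;
    p->next = 1;
}

/* "First available": always scan from 1, so a freed index is reused at once. */
static uint32_t alloc_first(struct idx_pool *p)
{
    for (uint32_t i = 1; i <= IDX_MAX; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = true;
            return i;
        }
    }
    return 0;  /* space exhausted */
}

/* "Next available": resume after the last index handed out, wrapping at
 * IDX_MAX, so an old index is reused only after the whole space is walked. */
static uint32_t alloc_next(struct idx_pool *p)
{
    for (uint32_t tries = 0; tries < IDX_MAX; tries++) {
        uint32_t i = p->next;
        p->next = (i == IDX_MAX) ? 1 : i + 1;
        if (!p->in_use[i]) {
            p->in_use[i] = true;
            return i;
        }
    }
    return 0;  /* space exhausted */
}

static void release(struct idx_pool *p, uint32_t i)
{
    p->in_use[i] = false;
}
```

Allocating twice, releasing index 1, and allocating again hands back 1
under alloc_first() but 3 under alloc_next(); the second behavior is the
one described above.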

Just as is true with PIDs, this is done on purpose.  Minimal reuse means
that even if you have a "stale" (long since unplumbed) index number,
it's very unlikely that it has been reassigned, and thus your
application's mistaken ioctl()s or table look-ups based on it will just
fail rather than returning the wrong interface.
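
For example, an application holding a possibly stale index can ask the
system whether anything currently owns it.  This is a minimal sketch
using the POSIX if_indextoname() call; the helper name index_is_live()
is made up for illustration:

```c
#include <stddef.h>
#include <net/if.h>

/* Returns 1 if some interface currently owns ifindex, 0 otherwise. */
static int index_is_live(unsigned int ifindex)
{
    char name[IF_NAMESIZE];

    /* if_indextoname() returns NULL when no interface has this index. */
    return if_indextoname(ifindex, name) != NULL;
}
```

With minimal reuse, a stale index is very likely to take the 0 path
here rather than silently resolving to some newer, unrelated interface.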

Note that there's also a minor performance argument to be made here:
until the internal "next interface index" counter rolls over, there's no
need at all to check whether the number we're assigning is already used.
It can't be.  So, as a performance enhancement, we don't check until
roll-over has occurred at least once.  Now, with the change that was
just integrated, this point of lower performance (where the next
available index must be checked against the list of in-use IDs) occurs
earlier and more often.
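
The shape of that optimization can be sketched as follows.  This is an
assumption-laden toy, not the kernel code: the structure, the 4-entry
index space, and the "checks" counter are all invented here just to
make the cost visible.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_MAX 4u   /* tiny hypothetical index space: 1..SK_MAX */

struct sk_alloc {
    bool in_use[SK_MAX + 1];
    uint32_t next;         /* next index to hand out */
    bool wrapped;          /* has the counter rolled over yet? */
    unsigned long checks;  /* number of in-use lookups performed */
};

static void sk_init(struct sk_alloc *a)
{
    for (uint32_t i = 0; i <= SK_MAX; i++)
        a->in_use[i] = false;
    a->next = 1;
    a->wrapped = false;
    a->checks = 0;
}

static uint32_t sk_alloc_index(struct sk_alloc *a)
{
    if (!a->wrapped) {
        /* Fast path: before the first roll-over the index cannot be in
         * use, so no lookup is needed. */
        uint32_t i = a->next;
        a->in_use[i] = true;
        if (i == SK_MAX) {
            a->next = 1;
            a->wrapped = true;
        } else {
            a->next = i + 1;
        }
        return i;
    }
    /* Slow path: after roll-over every candidate must be checked. */
    for (uint32_t tries = 0; tries < SK_MAX; tries++) {
        uint32_t i = a->next;
        a->next = (i == SK_MAX) ? 1 : i + 1;
        a->checks++;
        if (!a->in_use[i]) {
            a->in_use[i] = true;
            return i;
        }
    }
    return 0;  /* space exhausted */
}

static void sk_release(struct sk_alloc *a, uint32_t i)
{
    a->in_use[i] = false;
}
```

The smaller the index space, the sooner "wrapped" becomes true and the
sooner every allocation starts paying for the lookups.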

>> Indeed.  That's exactly why this interface has been left alone for
>> decades, and instead the fixes were put into the applications using the
>> interfaces.
> 
> If I quickly look at current BSD source code, I find:
> - index allocation using the smallest available index
> - limited to USHORT_MAX

Right.  That's why the Solaris code (up until this change) truncated the
index value down to 16 bits for those interfaces.
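
To make the consequence of that truncation concrete, here is a small
sketch (the function name legacy_index() is hypothetical; it only
models squeezing a 32-bit index into a 16-bit field):

```c
#include <stdint.h>

/* Model of reporting a 32-bit ifIndex through a 16-bit legacy field:
 * only the low 16 bits survive. */
static uint16_t legacy_index(uint32_t ifindex)
{
    return (uint16_t)(ifindex & 0xFFFFu);
}
```

Indexes 3 and 65539 both come out as 3 through such a field, which is
why an application using the 16-bit interfaces has to be written to
cope with the ambiguity once the counter passes 65535.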

The SIOCGLIFINDEX interface, though, is Solaris-specific, so the full 32
bits are (or once were) available there.  Now it's broken.

> Linux uses a signed integer and has its own routing message
> protocol (that seems to not use the index...)

Correct, but we don't yet support the Linux interfaces, so that doesn't
matter.

>> I don't believe that the fix applied for CR 6965774 is really the right
>> idea.  It perhaps makes some sense in a Windows-like environment where
>> you're encouraged (sometimes forcefully) to reboot every few hours or
>> so, but not so much for an OS that runs for a long period of time.
>> Tossing away the 32-bit counter that we put into IP decades ago seems
>> like a step backwards.
> 
> I think that the correct solution is to fix SNMP to not assume
> or assert that an interface index should be unique for the
> entire "uptime" of a host.

How would that be "fixed"?

It's a widely-implemented standard, and the idea that ifIndex numbers
are not recycled is baked into the way the standard operates.  It's not
something that vendors get a choice on.

I don't understand what "fix" means in that context.

> It may also be that a new routing message format could be whipped
> up and spread around so that this and other limitations can be
> addressed. But that will take substantially longer and carries
> with it more peril.

True.  That's why I didn't suggest doing that, and why I didn't do it
myself.

Instead, I worked on fixing applications so that where they were
required to use the BSD interfaces, they were able to cope (as much as
possible) with the brokenness of those old interfaces in the few places
where the old interfaces were actually necessary.

It turns out to be fairly easy to do, so I'm pretty confused why CR
6965774 was implemented at all.  What problem did it actually fix?

> Until one of those two happens, it is a mistake to not limit the
> index allocation to [1,65535] because the only way of "fixing"
> the problems that can occur once 65535 is passed is with a
> reboot. To that end, any minor SNMP inconvenience seems trivial.
> Or to put it differently, the potential for problems posed by
> passing 65535 vastly outweighs the SNMP side of the equation.

I strongly disagree.  This change has broken an important feature in
Solaris -- the 32-bit ifIndex numbers -- and I'd very much like to see
it restored.  The fact that the old BSD interfaces had limitations
doesn't mean that the rest of the system needs to be hobbled as well.

-- 
James Carlson         42.703N 71.076W         <carls...@workingcode.com>
_______________________________________________
networking-discuss mailing list
networking-discuss@opensolaris.org
