On 26/07/10 02:10 PM, James Carlson wrote:
Darren Reed wrote:
On 26/07/10 12:57 PM, James Carlson wrote:
Sequentially, though, it's much easier to get there.
And things like tunnels can get you there much faster. I'm *SURE* that
punchin has rolled over that 2^16 limit many times over.
I'd hazard a guess that punchin gets rebooted for upgrading
before that happens. To give that more context, Dan did not
seem to think that fixing the IKE daemon's naive use of the
index from the routing message was more important than a P4,
which is a likely indication that the index has not passed 65535.
I think that misses the point I was making.
Creating and destroying a tunnel requires allocating a new ifIndex
number every time (unless it's the "same" tunnel and you can save all
the interface counters in the process). Even a short run of a punchin
server sees a lot of plumbing and unplumbing as users log in and out.
That's a real-world usage of the system that results in many
plumbs/unplumbs, and that thus consumes a lot of the ifIndex space, with
wrap-arounds being a likely result if the space is artificially constrained.
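To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes every plumb consumes a fresh index that is never recycled, and the rate of 1000 plumbs per day is purely hypothetical, not a measurement from punchin.

```python
def usable_indexes(bits):
    """Number of non-zero index values in a counter of the given width."""
    return 2 ** bits - 1


def days_until_exhausted(bits, plumbs_per_day):
    """Whole days before a never-recycled counter runs out of fresh values."""
    return usable_indexes(bits) // plumbs_per_day


if __name__ == "__main__":
    rate = 1000  # hypothetical plumbs per day, for illustration only
    for bits in (16, 32):
        print(bits, "bits:", days_until_exhausted(bits, rate), "days")
```

Under those assumptions the 16-bit space lasts about two months, while the 32-bit space outlives any realistic uptime.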
And to carry this further, if the index did pass 65535 then the punchin
server would need to be rebooted, because the system would then start
to misbehave in unexpected ways.
I wasn't talking about whether the IKE daemon could or should be fixed
to do something different than what it currently does.
It's my understanding that punchin uses IPsec and thus IKE.
Indeed. That's exactly why this interface has been left alone for
decades, and instead the fixes were put into the applications using the
interfaces.
If I quickly look at current BSD source code, I find:
- index allocation using the smallest available index
- limited to USHORT_MAX
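For illustration only, the policy those two bullets describe can be modelled as below. This is a sketch of the idea, not the BSD code itself; the class and method names are made up, and USHORT_MAX is taken to be 65535.

```python
USHORT_MAX = 0xFFFF  # assumed value of the limit named above (65535)


class SmallestFreeIndexAllocator:
    """Toy model: always hand out the lowest index not currently in use."""

    def __init__(self, max_index=USHORT_MAX):
        self.max_index = max_index
        self.in_use = set()

    def allocate(self):
        # Linear scan from 1; index 0 is never handed out.
        for candidate in range(1, self.max_index + 1):
            if candidate not in self.in_use:
                self.in_use.add(candidate)
                return candidate
        raise RuntimeError("no free interface index")

    def release(self, index):
        # A released index becomes eligible for reuse immediately.
        self.in_use.discard(index)


if __name__ == "__main__":
    alloc = SmallestFreeIndexAllocator()
    first, second, third = alloc.allocate(), alloc.allocate(), alloc.allocate()
    alloc.release(second)
    print(alloc.allocate())  # the released index is reused
```

The trade-off is visible in the model: indexes never exceed the limit, but a released index is recycled right away, which is exactly the behaviour SNMP does not expect.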
Right. That's why the Solaris code (up until this change) truncated the
index value down to 16 bits for those interfaces.
And by truncating the index value down to 16 bits, the old code
broke once the index number passed 65535.
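The failure mode is easy to show in miniature. The sketch below assumes the truncation is a plain mask to the low 16 bits; the function name is hypothetical.

```python
def truncate_to_16_bits(index):
    """Keep only the low 16 bits, as a 16-bit field would."""
    return index & 0xFFFF


if __name__ == "__main__":
    # Indexes up to 65535 survive; anything larger aliases a smaller value.
    for index in (1, 65535, 65536, 65537):
        print(index, "->", truncate_to_16_bits(index))
```

Once the counter passes 65535, a truncated index can no longer be told apart from a lower one, and 65536 collapses to 0.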
The SIOCGLIFINDEX interface, though, is Solaris-specific, so the full 32
bits is (or was once) available there. Now it's broken.
I disagree.
The interface works perfectly fine, reporting an interface index
that maps to an existing IP interface. Furthermore, there's now
a guarantee that the number returned from SIOCGLIFINDEX will
match up with those received in routing protocol messages.
I don't believe that the fix applied for CR 6965774 is really the right
idea. It perhaps makes some sense in a Windows-like environment where
you're encouraged (sometimes forcefully) to reboot every few hours or
so, but not so much for an OS that runs for a long period of time.
Tossing away the 32-bit counter that we put into IP decades ago seems
like a step backwards.
I think that the correct solution is to fix SNMP to not assume
or assert that an interface index should be unique for the
entire "uptime" of a host.
How would that be "fixed?"
It's a widely-implemented standard, and the idea that ifIndex numbers
are not recycled is baked into the way the standard operates. It's not
something that vendors get a choice on.
I don't understand what "fix" means in that context.
Aside from Linux and derived products, which vendors of a
multitasking operating system operate in accordance with
the RFC in question?
Until one of those two happens, it is a mistake to not limit the
index allocation to [1,65535] because the only way of "fixing"
the problems that can occur once 65535 is passed is with a
reboot. To that end, any minor SNMP inconvenience seems trivial.
Or to put it differently, the potential for problems posed by
passing 65535 vastly outweighs the SNMP side of the equation.
I strongly disagree. This change has broken an important feature in
Solaris -- the 32-bit ifIndex numbers -- and I'd very much like to see
it restored. The fact that the old BSD interfaces had limitations
doesn't mean that the rest of the system needs to be hobbled as well.
How many times have you taken advantage of this and used this feature?
Faced with the choice of either potentially causing problems for
SNMP or forcing users to reboot their systems in order to avoid
further networking behaviour issues, which do you choose?
Maybe a way of putting this is that the new behaviour is a lesser
evil than the old behaviour and that nirvana is yet to be reached.
Finally, given that you feel so strongly about this and given that
this is OpenSolaris, feel free to file a bug in bugzilla along with
a new design and code that fixes this issue. Nothing speaks louder
in an open source community than contributions of new working code.
Darren