On Wed, 9 Jan 2008, John Watlington wrote:

> On Jan 9, 2008, at 6:50 PM, Mikus Grinbergs wrote:
>
>>> Just switch off the Legacy IP, as we should have
>>> done months ago, and get on with making things work properly.
>>> Anything else is a distraction.
>>
>> I sympathize with how overworked OLPC developers are.  But a number
>> of G1G1 systems are getting into the hands of articulate net-aware
>> people.  If they become disenchanted by the Legacy IP performance of
>> the OLPC, what they say might result in hurting the whole project.
>
> You misunderstood our local IPv6 evangelist, he wasn't proposing to
> disable IPv4 on the laptop, just not to support it on the school
> server mesh.  Given that all mesh capable devices will support IPv6,
> he's probably got a point.
>
> Here is my take-home summary of this thread:
>
> Short term solution is to turn off IPv6 on the mesh, and tell kids
> that if their network performance degrades, they should "click on the
> circle again" which will trigger an IPv4 DHCP discovery of the
> nearest MPP.
>
> Long term solution is probably to move to IPv6 only, using a user
> space agent to decide which RAs to listen to.  This user space agent
> can implement Javier's suggestion to avoid flapping between MPPs.
> Mobile IPv6 would be frosting on the cake, but doesn't help with the
> primary problem of MPP selection.

I'm trying to make sure I fully understand the problem.

It sounds as if you have a good mechanism in the mesh for the laptops
to send packets to the nearest MPP. The problem is that if a laptop
gets an IP address from an MPP that is a long way away (either
initially due to a problem, or over time as the laptop moves more hops
away from the MPP), reply packets will always go to the MPP that gave
out the IP address (due to normal IP routing). That results in slow
replies, as these packets take longer and longer to get from the MPP
to the laptop.

Is this correct so far?

This problem is further complicated by the IPv6 equivalent of DHCP
making it more likely that the initial 'registration' with an MPP is
less than optimal.

And to top things off, since the replies are typically larger than the
requests (which is why people live with DSL that is only 512Kb
outbound but 1.5Mb inbound), the additional delays on the inbound leg
hurt significantly more.

I am making the assumption that the MPP machines know the wireless
topology from each of their points of view, i.e. not only do they know
how to get to the wireless nodes from themselves (and how many hops
away they are), but they also know this information for each of the
other MPP nodes. If this assumption is not currently true, a daemon
would need to be run to keep the MPP boxes in agreement over who is
the best gateway to the laptop.

If I am on track so far, let me see if I can divide the resulting
problem into three cases:

1. The MPP boxes involved are 'owned' by separate entities and may not
   know about each other over the wired network.

2. The MPP boxes involved are associated with each other, but may be
   two or more network hops away from each other. They are all managed
   as part of the same set (with egress filters configured so that
   outbound traffic could come from any of the MPP boxes).

3. The MPP boxes involved with the mesh network are tightly coupled:
   all connected with a high-speed wired network on the same subnet
   (no routing between them, all on the same broadcast domain).

Addressing these one at a time:

For #1, I can't think of any reasonable way to move a machine from
talking to one MPP to another short of true mobile IP solutions.

For #2, the basic approach is the same as LVS uses in tunneling mode.
See http://www.linuxvirtualserver.org/VS-IPTunneling.html for a
diagram and explanation.

This is basically what I was suggesting earlier: don't worry about the
outbound traffic, just bounce the inbound traffic to the closest node
(via a tunnel) before sending it over the air.

This could be a matter of using the existing LVS code and replacing
the server selection logic with something that is aware of the
wireless topology.

To avoid a routing loop where the packet gets bounced back and forth
between MPP boxes, you should be able to set things up so that the
load balancing is only done on packets coming in from the outside. (I
don't know if iptables can do this stock, but it should be a simple,
if ugly, hack to make packets arriving through a tunnel bypass the LVS
code and get inserted just past it in the IP stack.)

The worst case with this model should be that some inbound packets get
relayed to the wrong MPP and make more hops than they need to over the
air.

For #3, I am looking at other server load balancing options,
specifically the CLUSTERIP target available in iptables:
http://flaviostechnotalk.com/wordpress/index.php/2005/06/12/loadbalancer-less-clusters-on-linux

What this does is define an IP address that exists on all machines and
uses a multicast MAC address, which forces the switch to send the
packet to every port of the switch. The systems then run a match on
the incoming packet to decide if they should deal with it or ignore it
(in the existing code, a hash on sourceip, sourceip-sourceport, or
sourceip-sourceport-destport).
If this match instead did a lookup in the mesh information to decide
whether this MPP is the closest node or not, it could route the packet
over to the wireless if it is, and drop it if not.

Note that there is a race condition where one node may decide it's not
the closest before another decides that it is. If node moves are
infrequent this may not require further attention (TCP packets will
get resent if they get dropped). If they are too frequent, the retries
will cause too many delays and the race would need to be narrowed or
eliminated.

The race condition where two nodes both decide that they are closest
(one adding its entry before the other removes its own) is not nearly
the same problem, as all it would result in is an extra copy of the
packet being sent over the air (which does eat up bandwidth, but
should not cause other problems).

Both #2 and #3 above only work really well if there is no NAT taking
place on the MPP boxes. (NAT can take place between the MPP boxes and
the Internet, as long as the MPP boxes can pretend it's not there.)

If NAT is running on the MPP box, then when a node migrates from one
MPP to another the state of any connections would need to migrate as
well. This is the same problem that is faced by an HA pair of
firewalls that don't want to lose connections when they fail over, and
there are tools to deal with this; see
http://people.netfilter.org/pablo/conntrack-tools/

This is not a nice approach, and with the dependency on userspace to
replicate the data it is prone to gaps in coverage as data is migrated
around, but if moves are infrequent enough this could be acceptable.
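
To make the selection logic for #2 and #3 a bit more concrete, here is
a rough Python sketch of the decision each MPP would make per inbound
packet. It is only an illustration, not kernel code: the HOPS table,
the MPP names, the addresses, and all three function names are
hypothetical stand-ins, since where the real mesh data lives is not
known here. How an unknown destination is handled is also an
assumption.

```python
from typing import Dict, Optional

# Hypothetical mesh table: laptop IP -> {MPP name -> wireless hop count}.
# Stand-in for the real mesh data, which every MPP is assumed to share.
HOPS: Dict[str, Dict[str, int]] = {
    "10.0.0.7": {"mpp-a": 4, "mpp-b": 1},
    "10.0.0.9": {"mpp-a": 2, "mpp-b": 2},
}


def closest_mpp(dest_ip: str) -> Optional[str]:
    """Return the MPP with the fewest hops to dest_ip, or None if unknown.

    Ties break on the MPP name so that every MPP computes the same answer.
    """
    hops = HOPS.get(dest_ip)
    if not hops:
        return None
    return min(hops, key=lambda mpp: (hops[mpp], mpp))


def case2_action(me: str, dest_ip: str, via_tunnel: bool) -> str:
    """Case #2 (LVS tunnel style): relay inbound packets to the closest MPP.

    A packet that already arrived through a tunnel is never relayed again,
    which is what keeps it from bouncing between MPP boxes.
    Assumption: an unknown destination is delivered locally.
    """
    best = closest_mpp(dest_ip)
    if via_tunnel or best is None or best == me:
        return "deliver"
    return "tunnel:" + best


def case3_accept(me: str, dest_ip: str) -> bool:
    """Case #3 (CLUSTERIP style): every MPP sees the packet, and only the
    one that believes it is closest accepts it. The others drop it.
    Assumption: an unknown destination is dropped by everyone.
    """
    return closest_mpp(dest_ip) == me
```

With the sample table, mpp-a would tunnel a packet for 10.0.0.7 to
mpp-b in case #2, and only mpp-b would accept it in case #3. The races
described above correspond to the MPPs briefly holding different
copies of the table.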

In some ways #2 is the nicest, as there is only one copy of the packet
around, but the need to set up the tunnels and the more extensive
configuration needed are drawbacks. #3 is simple, but is far more
likely to end up sending extra copies of packets over the air, and
definitely will impact switch performance (as it effectively turns
your switch into a hub).

Both of them involve (relatively) simple changes to existing kernel
code: taking a chunk of code that's making a decision and replacing it
with code that looks at the mesh info instead.

I have successfully avoided using IPv6 to this point ;-) but I don't
know of any reason why these strategies shouldn't work with it just as
well as with IPv4.

Unfortunately I don't know anything about the mesh code, so I can't
begin trying to code this myself. Since this only needs to take the
destination IP address and look it up in the mesh table, I would take
a stab at it if I understood where to find the mesh data. My guess is
that it would be less work for someone who understands the mesh data
to try this than it would be to educate me on the mesh data, as you
are running against a deadline (and the fact that I have to fly from
LA to Atlanta this weekend won't help matters any).

David Lang
_______________________________________________
Devel mailing list
[email protected]
http://lists.laptop.org/listinfo/devel
