Hi Javier,

I did some more debugging on this issue and found the following. I turned on
the HWMP debug output and observed that when node 1 comes back to life
(re-authenticates) with the secured mesh, all the PREQs go through it. In
other words, PREQs received on nodes 2 and 3 from node 4 (the portal) go
through node 1. This is why all the paths now point to node 1: it is the
only node processing PREQs (broadcast traffic).

Now if I turn off node 1 and set static routes on nodes 2, 3 and 4, it
works. So the problem seems to affect encrypted broadcast traffic only; I'm
not sure why. Also, if I restart the meshd daemon on nodes 2 and 3, OR on
node 4, the secured mesh recovers.

Is there something specific about the keys used for multicast/broadcast
traffic?
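For reference, the HWMP forwarding-information update rule may be relevant here: per 802.11s, a node replaces its current path only when the originator's sequence number is fresher, or equally fresh with a better airtime metric. This is a simplified sketch of that rule (my own simplification, not the actual mac80211 code; it ignores sequence-number wraparound, which the real implementation handles with modular comparison):

```python
def should_update_path(cur_sn, cur_metric, new_sn, new_metric):
    """Decide whether path info carried in a received PREQ/PREP should
    replace the current forwarding entry.

    Simplified HWMP rule: a fresher originator sequence number always
    wins; an equally fresh one wins only with a strictly better (lower)
    airtime metric.  Wraparound of the sequence number is ignored here.
    """
    if new_sn > cur_sn:
        return True               # fresher information always wins
    if new_sn == cur_sn and new_metric < cur_metric:
        return True               # same freshness, better metric
    return False

# A fresher SN wins even with a worse metric:
#   should_update_path(5, 100, 6, 300)  -> True
# which is why, if the surrounding nodes treat everything forwarded by
# a re-authenticated node 1 as fresh, all paths could snap to node 1.
```

Under this rule, the symptom you describe would be consistent with a sequence-number or freshness mix-up after re-authentication, though that is only a guess without captures.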
--Fabrice

On 12/6/2011 2:39 PM, Fabrice Deyber wrote:
> Hi Javier,
> see my responses inline.
>
> On 12/6/2011 1:10 PM, Javier Cardona wrote:
>> Hi Fabrice,
>>
>> Thank you for reporting this problem. Comments inline.
>>
>> On Mon, Dec 5, 2011 at 11:27 AM, Fabrice Deyber
>> <[email protected]> wrote:
>>> Hi Javier, Thomas,
>>> I'm running some tests on a secured mesh setup and found an issue.
>>> My setup has four nodes. One of the nodes, node 4, is used as a mesh
>>> portal and is connected to a PC. All the nodes (including the portal)
>>> constantly send traffic to the PC (FTP, iperf...).
>>> When I bring the secured mesh up, everything works fine. All the
>>> nodes are within reach of each other, so all the paths (routes) point
>>> to the mesh portal (one hop).
>>>
>>> node 1<----->
>>> node 2<-----> node 4<====> PC
>>> node 3<----->
>>>
>>> <==>: Ethernet link
>>> <-->: mesh link
>>>
>>> Now if I bring down node 1, I can see that I lose traffic for that
>>> node (which is expected).
>>>
>>> node 1<--X-->
>>> node 2<-----> node 4<====> PC
>>> node 3<----->
>>>
>>> If I bring node 1 back into the mesh, something strange happens. All
>>> the paths (routes) on all the nodes will point to node 1. All the
>>> traffic is restored (even though we now have multi-hop routes). The
>>> paths should not be updated this way.
>>>
>>> node 2<-----> node 1<-----> node 4<====> PC
>>> node 3<----->
>>>
>>> Now if I bring node 1 back down, I lose all the traffic from all the
>>> nodes but the portal (node 4 is Ethernet-connected to the PC). Node 2
>>> and node 3 paths still point to node 1.
>>>
>>> node 2<--X--> node 1<--X--> node 4<====> PC
>>> node 3<--X-->
>>
>> Your explanation is very good, but I'm afraid I would need wireshark
>> captures to get to the bottom of this.
>> Unfortunately, at this point all I have is questions:
>>
>> 1. Path selection is identical in secure and open mesh networks.
>> The only difference between the two is how peer links are created. But
>> once a peer link is created, the paths are discovered in exactly the
>> same way. Of course, in the secure case, if there are mismatched keys,
>> path selection will fail. But this is not what you are seeing. How do
>> you bring node 1 up/down? Are you temporarily moving it out of range?
>> I'm interested in knowing whether node 1 needs to re-authenticate when
>> it re-joins the mesh or whether it will continue using the keys from
>> the previous authentication.
>
> In my case I can bring down node 1 either by shutting it off (power
> down) or with ifconfig down. In either case node 1 will re-authenticate
> with the other nodes.
> (The meshd daemon is killed and restarted when I use ifconfig.)
> I also see it if I first bring up nodes 2, 3 and 4, start transferring
> data, and then turn on node 1. As soon as node 1 is up (authenticated),
> paths will point to node 1.
>
>>> If I wait long enough, several minutes, the nodes seem to recover.
>>> This is a problem; the mesh should self-heal when nodes come and go
>>> from the mesh.
>>>
>>> This behavior does not happen in an open mesh setup. I'm using
>>> open80211s-0.5.
>>
>> 2. Just to make sure I understand the problem... In the open mesh
>> setup, I understand your network just transitions between the two
>> states below, is that right?
>>
>> # node 1 down
>> node 1<--X-->
>> node 2<-----> node 4<====> PC
>> node 3<----->
>>
>> # node 1 up
>> node 1<----->
>> node 2<-----> node 4<====> PC
>> node 3<----->
>
> Correct.
>
>> 3. When node 1 is down, what is the state of the peer link between
>> nodes 1 and 4 @node 4? Is it different in secure vs. open?
>
> The plink stays established with increasing inactive time (it will
> eventually time out).
> It's the case for both secured and open mesh.
> When the node 1 plink times out on the secured mesh, I see this on
> nodes 2 and 3:
>
> iw mesh0 mpath dump
>
> DEST ADDR          NEXT HOP           IFACE  SN  METRIC  QLEN  EXPTIME     DTIM  DRET  FLAGS
> 00:1b:b1:88:22:1f  00:00:00:00:00:00  mesh0  0   0       0     3221479132  1600  4     0x0
>
> 00:1b:b1:88:22:1f is the portal (node 4).
> Node 4 does not have any mpath entry.
>
>> 4. Is this behavior specific to proxying? In other words, if you
>> replace the PC with a mesh node (e.g. node 5) and you force the same
>> topology (by blacklisting nodes 1-3 on node 5 and vice versa), do you
>> see the same problem?
>
> I don't think so. I ran a quick test without any PC attached to the
> mesh. I issued a broadcast ping to the mesh from node 4, then brought
> down node 1 and brought it back up. Same thing: all the paths pointed
> to node 1.
>
>> Cheers,
>>
>> Javier
>
> Hope this helps.
> I will let you know if I find additional info.
>
> --Fabrice

_______________________________________________
Devel mailing list
[email protected]
http://open80211s.com/mailman/listinfo/devel
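P.S. When diffing path tables across several nodes while debugging, it can help to parse the `iw ... mpath dump` output into structured records. A small sketch using the dump quoted above (the helper name and field handling are my own, not part of iw; the all-zero next hop with SN 0 on nodes 2 and 3 is an unresolved entry, i.e. the destination is known but no route exists):

```python
# Sample output as quoted in the thread above.
SAMPLE = """\
DEST ADDR          NEXT HOP           IFACE  SN  METRIC  QLEN  EXPTIME     DTIM  DRET  FLAGS
00:1b:b1:88:22:1f  00:00:00:00:00:00  mesh0  0   0       0     3221479132  1600  4     0x0
"""

def parse_mpath_dump(text):
    """Parse 'iw <dev> mpath dump' tabular output into a list of dicts.

    Hypothetical debugging helper: assumes the 10-column layout shown in
    SAMPLE, with FLAGS printed in hex.
    """
    entries = []
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for line in lines[1:]:                       # skip the header row
        f = line.split()
        entries.append({
            "dest": f[0], "next_hop": f[1], "iface": f[2],
            "sn": int(f[3]), "metric": int(f[4]), "qlen": int(f[5]),
            "exptime": int(f[6]), "dtim": int(f[7]),
            "dret": int(f[8]), "flags": int(f[9], 16),
        })
    return entries

path = parse_mpath_dump(SAMPLE)[0]
unresolved = path["next_hop"] == "00:00:00:00:00:00" and path["sn"] == 0
# 'unresolved' is True for the entry nodes 2 and 3 show after the timeout.
```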
