Hi Javier,
I did some more debugging on this issue and found out the following.
I turned on the HWMP debug output and observed that when node 1 comes
back to life (re-authenticates) with the secured mesh,
all the PREQs go through it. In other words, the PREQs that nodes 2 and
3 receive from node 4 (the portal) come through node 1.
This is why all the paths now point to node 1; it is the
only node processing PREQs (broadcast traffic).
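For anyone trying to reproduce this: on my build the HWMP debug
messages land in the kernel log, and the path flip is easy to see by
polling the path table (the interface name and grep pattern are just
what match my setup):

    dmesg | grep -i hwmp              # HWMP debug output, build-dependent
    watch -n1 'iw mesh0 mpath dump'   # on nodes 2/3, next hop flips to node 1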
Now if I turn node 1 off and set static routes on nodes 2, 3 and 4, it
works.
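For the record, the static routes were pinned with iw along these lines
(the portal's address is the one from the mpath dump quoted below;
since the portal is one hop away, the next hop is the destination
itself):

    # on nodes 2 and 3: flush the stale entry, then pin a one-hop path
    iw mesh0 mpath del 00:1b:b1:88:22:1f
    iw mesh0 mpath new 00:1b:b1:88:22:1f next_hop 00:1b:b1:88:22:1f
    # node 4 gets the mirror-image entries for nodes 2 and 3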
So the problem seems to affect encrypted broadcast traffic only; I am
not sure why.
Also, if I restart the meshd daemon on nodes 2 and 3, OR on node 4, the
secured mesh recovers.
Is there something specific about the keys used for multicast/broadcast 
traffic?
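In case it is useful: on kernels with mac80211 debugfs support, the
installed key state can be inspected with something like the following
(phy0 and the mount point are assumptions from my setup, and the exact
layout varies by kernel version). Comparing it on nodes 2 and 3 before
and after node 1 rejoins might show whether the group keys change:

    mount -t debugfs none /sys/kernel/debug    # if not already mounted
    ls /sys/kernel/debug/ieee80211/phy0/keys/  # per-key state lives here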

--Fabrice

On 12/6/2011 2:39 PM, Fabrice Deyber wrote:
> Hi Javier,
> see my responses inline.
>
> On 12/6/2011 1:10 PM, Javier Cardona wrote:
>> Hi Fabrice,
>>
>> Thank you for reporting this problem.  Comments inline.
>>
>> On Mon, Dec 5, 2011 at 11:27 AM, Fabrice Deyber
>> <[email protected]>  wrote:
>>> Hi Javier, Thomas
>>> I'm running some tests on a secured mesh setup and found an issue.
>>> My setup has four nodes. One of the nodes, node 4, is used as a mesh
>>> portal and is connected to a PC. All the nodes (including the portal)
>>> send traffic constantly to the PC (FTP, iperf...).
>>> When I bring the secured mesh up, everything works fine. All the
>>> nodes are within reach of each other, so all the paths (routes) point
>>> to the mesh portal (one hop).
>>>
>>> node 1<----->
>>> node 2<----->  node 4<====>  PC
>>> node 3<----->
>>>
>>> <=>: Ethernet link
>>> <-->: mesh link
>>>
>>> Now if I bring down node 1, I can see that I lose traffic for that
>>> node (which is expected).
>>>
>>> node 1<--X-->
>>> node 2<----->  node 4<====>  PC
>>> node 3<----->
>>>
>>> If I bring node 1 back into the mesh, something strange happens. All
>>> the paths (routes) for all the nodes will point to node 1. All the
>>> traffic is restored (even though we now have multi-hop routes). The
>>> paths should not be updated this way.
>>>
>>> node 2<----->  node 1<----->  node 4<====>  PC
>>> node 3<----->
>>>
>>> Now if I bring node 1 back down, I lose all the traffic from all the
>>> nodes except the portal (node 4 is Ethernet-connected to the PC).
>>> Node 2 and node 3 paths still point to node 1.
>>>
>>> node 2<--X-->  node 1<-X-->  node 4<====>  PC
>>> node 3<--X-->
>> Your explanation is very good, but I'm afraid I would need Wireshark
>> captures to get to the bottom of this.
>> Unfortunately at this point all I have is questions:
>>
>> 1. Path selection is identical in secure and open mesh networks.  The
>> only difference between the two is how peer links are created.  But
>> once a peer link is created, the paths are discovered in exactly the
>> same way.  Of course in the secure case, if there are mismatched keys,
>> path selection will fail.  But this is not what you are seeing.  How
>> do you bring up/down node 1?  Are you temporarily moving it out of
>> range?  I'm interested in knowing whether node 1 needs to
>> re-authenticate when it re-joins the mesh or if it will continue using
>> the keys from the previous authentication.
> In my case I can bring down node 1 either by shutting it off (power
> down) or by bringing the interface down with ifconfig. In either case
> node 1 will re-authenticate with the other nodes.
> (The meshd daemon is killed and restarted when I use ifconfig.)
> I also see it if I first bring up nodes 2, 3 and 4, start transferring
> data, and then turn on node 1. As soon as node 1 is up (authenticated),
> the paths will point to node 1.
>>
>>> If I wait long enough, several minutes, the nodes seem to recover.
>>> This is a problem; the mesh should self-heal as nodes come and go
>>> from the mesh.
>>>
>>> This behavior does not happen in an open mesh setup. I'm using
>>> open80211s-0.5.
>> 2. Just to make sure I understand the problem... In the open mesh
>> setup, I understand your network just transitions between the two
>> states below, is that right?
>>
>> # node 1 down
>> node 1<--X-->
>> node 2<----->  node 4<====>  PC
>> node 3<----->
>>
>> # node 1 up
>> node 1<----->
>> node 2<----->  node 4<====>  PC
>> node 3<----->
> Correct.
>> 3. When node 1 is down, what is the state of the peer link between
>> nodes 1 and 4 @node 4?  Is it different in secure vs. open?
> The plink stays established, with an increasing inactive time (it will
> eventually time out).
> This is the case for both secured and open mesh.
> When the node 1 plink times out on the secured mesh, I see this on
> nodes 2 and 3:
>
> iw mesh0 mpath dump
>
> DEST ADDR          NEXT HOP           IFACE  SN  METRIC  QLEN  EXPTIME     DTIM  DRET  FLAGS
> 00:1b:b1:88:22:1f  00:00:00:00:00:00  mesh0  0   0       0     3221479132  1600  4     0x0
>
> 00:1b:b1:88:22:1f is the portal (node 4).
> Node 4 does not have any mpath entry.
>>
>> 4. Is this behavior specific to proxying?  In other words, if you
>> replace the PC with a mesh node (e.g. node 5) and force the same
>> topology (by blacklisting nodes 1-3 on node 5 and vice versa), do you
>> see the same problem?
> I don't think so. I ran a quick test without any PC attached to the
> mesh. I issued a broadcast ping to the mesh from node 4, then brought
> node 1 down and brought it back up. Same thing: all the paths pointed
> to node 1.
>>
>> Cheers,
>>
>> Javier
>>
> Hope this helps.
> I will let you know if I find additional info.
>
> --Fabrice
>
