On 08/10/17 08:23, Yotam Gigi wrote:
> On 10/05/2017 03:09 PM, Nikolay Aleksandrov wrote:
>> On 05/10/17 13:36, Jiri Pirko wrote:
>>> From: Yotam Gigi <yot...@mellanox.com>
>>>
>>> Every bridge port is in one of four mcast router port states:
>>>  - MDB_RTR_TYPE_PERM - the port is set by the user to be an mrouter port
>>>    regardless of IGMP queries.
>>>  - MDB_RTR_TYPE_DISABLED - the port is set by the user to not be an mrouter
>>>    port regardless of IGMP queries.
>>>  - MDB_RTR_TYPE_TEMP - the port is set by the user to be in mcast router
>>>    learning state, but currently it is not an mrouter port as no IGMP query
>>>    has been received by it for the last multicast_querier_interval.
>>>  - MDB_RTR_TYPE_TEMP_QUERY - the port is set by the user to be in mcast
>>>    router learning state, and currently it is an mrouter port due to an
>>>    IGMP query that has been received by it during the passed
>>>    multicast_querier_interval.
>> I think you got the last two partially mixed up, MDB_RTR_TYPE_TEMP marks the 
>> port as a router
>> regardless if there were any igmp queries, while TYPE_TEMP_QUERY means it's 
>> in learning
>> state. It is the timer (armed vs not) that defines if currently the port is 
>> a router
>> when one of the TEMP/TEMP_QUERY are set. In the _TEMP case it is always 
>> armed as it
>> is refreshed by user or igmp queries which was the point of that mode.
>> So this means in br_multicast_router() just check for the timer_pending or 
>> perm mode.
> 
> 
> As much as I tried to make this clear, it seems like I failed :)
> 
> The 4 states I described are currently the "bridged port" states, not the
> "bridge device" state. A bridged port has these 4 states, all can be set by 
> the
> user, while the bridge device only uses 3 of these states. This patch makes 
> the
> bridge device use the 4 states too. I thought it makes sense.

(disclaimer: this is all about bridge ports, not bridge device)
Right, I'll try to explain again: _TEMP always marks the port as a mcast router,
it does not put it into just learning state waiting for an igmp query and it can
be refreshed by either a query or the user again setting the port in _TEMP.
While _TEMP_QUERY puts the port in learning state waiting for a query to become
a router, and _TEMP downgrades to _TEMP_QUERY if it expires.

Does that make it clearer ?

so for _TEMP you say:
>>>  - MDB_RTR_TYPE_TEMP - the port is set by the user to be in mcast router
>>>    learning state, but currently it is not an mrouter port as no IGMP query
>>>    has been received by it for the last multicast_querier_interval.

which is not the case, it is always a router when that mode is set on a port.
Same for _TEMP_QUERY.

> 
> The first paragraph describes the current states of a bridged port, and the
> second one explains the difference between bridged port and bridge device. I
> will (try to) make it clearer if we agree on resending this patch.
> 
> Is it clearer now?
> 
> 
>>
>> In the port code you have the following transitions:
>>  _TEMP -> TEMP_QUERY (on timer fire or user-set val, port becomes learning 
>> only)
>>  _TEMP -> _TEMP (noop on user refresh or igmp query, timer refreshes)
>>  _TEMP_QUERY -> _TEMP_QUERY (on igmp query the timer is armed, port becomes 
>> router)
>>
>> you never have _TEMP_QUERY -> _TEMP, which you're using here to denote the 
>> timer
>> getting armed and the bridge becoming a router.
> 
> 
> I am not sure I got this one. I do address that: when an IGMP query is 
> recieved
> and the current state is _TEMP_QUERY, I arm the timer and set the state to
> _TEMP. I marked that place on the patch, so you can see below.
> 

Exactly, there is no such transition for the ports. I tried to say that you're 
using
the router type to distinguish between when a query is received and it is just 
learning.
I get that you need to do so, but that deviates from how ports are handled, 
thus I
suggested to use the timer state instead and drop the _TEMP for bridge device 
altogether.
If it's possible then the patch will be much simpler and you will not need the 
hacks
to hide the state from user-space which is the part I really don't like.

> 
>>
>>> The bridge device (brX) itself can also be configured by the user to be
>>> either fixed, disabled or learning mrouter port states, but currently there
>>> is no distinction between the MDB_RTR_TYPE_TEMP_QUERY and MDB_RTR_TYPE_TEMP
>>> in the bridge internal state. Due to that, when an IGMP query is received,
>>> it is not straightforward to tell whether it changes the bridge device
>>> mrouter port status or not.
>> But before this patch the bridge device could not get that set.
>>
>>> Further patches in this patch-set will introduce notifications upon the
>>> bridge device mrouter port state. In order to prevent resending bridge
>>> mrouter notification when it is not needed, such distinction is necessary.
>>>
>> Granted the bridge device hasn't got a way to clearly distinguish the 
>> transitions
>> without the chance for a race and if using the timer one could get an 
>> unnecessary
> 
> Is there a race condition?
> 

With this state - no, if you try to use the timer state though there might be
an extra notification as I've explained below, that's why I asked if it would be
a problem. I think it is worth looking into using the timer state because it 
will
remove a lot of the hacks for hiding the state needed in this patch.

>> notification but that seems like a corner case when the timer fires exactly 
>> at the
>> same time as the igmp query is received. Can't it be handled by just 
>> checking if
>> the new state is different in the notification receiver ?
>> If it can't and is a problem then I'd prefer to add a new boolean to denote 
>> that
>> router on/off transition rather than doing this.
>>
>>> Hence, add the distinction between MDB_RTR_TYPE_TEMP and
>>> MDB_RTR_TYPE_TEMP_QUERY states for the bridge device, similarly to any
>>> other bridge port.
>>>
>> This does not add proper MDB_RTR_TYPE_TEMP support for the bridge device
>> but seems to abuse it to distinguish the timer state, and changes
>> the meaning of MDB_RTR_TYPE_TEMP. Can't you just use the timer instead ?
>> I think it will simplify the set and avoid all of this.
>>
>>> In order to not break the current kernel-user API, don't propagate the new
>>> state to the user and use it only in the bridge internal state. Thus, if
>>> the user reads (either via sysfs or netlink) the bridge device mrouter
>>> state, he will get the MDB_RTR_TYPE_TEMP_QUERY state even if the current
>>> bridge state is MDB_RTR_TYPE_TEMP.
>>>
>>> Signed-off-by: Yotam Gigi <yot...@mellanox.com>
>>> Reviewed-by: Nogah Frankel <nog...@mellanox.com>
>>> Signed-off-by: Jiri Pirko <j...@mellanox.com>
>>> ---
>>>  net/bridge/br_multicast.c | 25 +++++++++++++++++++++----
>>>  net/bridge/br_netlink.c   |  3 ++-
>>>  net/bridge/br_private.h   | 13 ++++++++++---
>>>  net/bridge/br_sysfs_br.c  |  3 ++-
>>>  4 files changed, 35 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
>>> index 8dc5c8d..b86307b 100644
>>> --- a/net/bridge/br_multicast.c
>>> +++ b/net/bridge/br_multicast.c
>>> @@ -861,6 +861,17 @@ static void br_multicast_router_expired(unsigned long 
>>> data)
>>>  
>>>  static void br_multicast_local_router_expired(unsigned long data)
>>>  {
>>> +   struct net_bridge *br = (struct net_bridge *) data;
>>> +
>>> +   spin_lock(&br->multicast_lock);
>>> +   if (br->multicast_router == MDB_RTR_TYPE_DISABLED ||
>>> +       br->multicast_router == MDB_RTR_TYPE_PERM ||
>>> +       timer_pending(&br->multicast_router_timer))
>>> +           goto out;
>>> +
>>> +   br->multicast_router = MDB_RTR_TYPE_TEMP_QUERY;
>>> +out:
>>> +   spin_unlock(&br->multicast_lock);
>>>  }
>>>  
>>>  static void br_multicast_querier_expired(struct net_bridge *br,
>>> @@ -1364,9 +1375,12 @@ static void br_multicast_mark_router(struct 
>>> net_bridge *br,
>>>     unsigned long now = jiffies;
>>>  
>>>     if (!port) {
>>> -           if (br->multicast_router == MDB_RTR_TYPE_TEMP_QUERY)
>>> +           if (br->multicast_router == MDB_RTR_TYPE_TEMP_QUERY ||
>>> +               br->multicast_router == MDB_RTR_TYPE_TEMP) {
>>>                     mod_timer(&br->multicast_router_timer,
>>>                               now + br->multicast_querier_interval);
>>> +                   br->multicast_router = MDB_RTR_TYPE_TEMP;
>>> +           }
> 
> 
> These are the transitions:
> _TEMP -> _TEMP_QUERY
> _TEMP_QUERY -> _TEMP_QUERY

You clearly always set multicast_router to _TEMP, where did you see a transition
_TEMP_QUERY -> _TEMP_QUERY or _TEMP -> _TEMP_QUERY ?
All transitions are -> _TEMP, because you use it to differentiate between 
learning
state and when a query was received. Maybe we have different definition for 
a transition :-)

> 
> In both transitions the timer is armed.
> 
> 
>>>             return;
>>>     }
>>>  
>>> @@ -1952,7 +1966,7 @@ void br_multicast_init(struct net_bridge *br)
>>>  
>>>     spin_lock_init(&br->multicast_lock);
>>>     setup_timer(&br->multicast_router_timer,
>>> -               br_multicast_local_router_expired, 0);
>>> +               br_multicast_local_router_expired, (unsigned long)br);
>>>     setup_timer(&br->ip4_other_query.timer,
>>>                 br_ip4_multicast_querier_expired, (unsigned long)br);
>>>     setup_timer(&br->ip4_own_query.timer, br_ip4_multicast_query_expired,
>>> @@ -2043,11 +2057,14 @@ int br_multicast_set_router(struct net_bridge *br, 
>>> unsigned long val)
>>>     case MDB_RTR_TYPE_DISABLED:
>>>     case MDB_RTR_TYPE_PERM:
>>>             del_timer(&br->multicast_router_timer);
>>> -           /* fall through */
>>> -   case MDB_RTR_TYPE_TEMP_QUERY:
>>>             br->multicast_router = val;
>>>             err = 0;
>>>             break;
>>> +   case MDB_RTR_TYPE_TEMP_QUERY:
>>> +           if (br->multicast_router != MDB_RTR_TYPE_TEMP)
>>> +                   br->multicast_router = val;
>>> +           err = 0;
>>> +           break;
>>>     }
>>>  
>>>     spin_unlock_bh(&br->multicast_lock);
>>> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
>>> index dea88a2..cee5016 100644
>>> --- a/net/bridge/br_netlink.c
>>> +++ b/net/bridge/br_netlink.c
>>> @@ -1357,7 +1357,8 @@ static int br_fill_info(struct sk_buff *skb, const 
>>> struct net_device *brdev)
>>>             return -EMSGSIZE;
>>>  #endif
>>>  #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
>>> -   if (nla_put_u8(skb, IFLA_BR_MCAST_ROUTER, br->multicast_router) ||
>>> +   if (nla_put_u8(skb, IFLA_BR_MCAST_ROUTER,
>>> +                  br_multicast_router_translate(br->multicast_router)) ||
>>>         nla_put_u8(skb, IFLA_BR_MCAST_SNOOPING, !br->multicast_disabled) ||
>>>         nla_put_u8(skb, IFLA_BR_MCAST_QUERY_USE_IFADDR,
>>>                    br->multicast_query_use_ifaddr) ||
>>> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
>>> index ab4df24..e6e3fec 100644
>>> --- a/net/bridge/br_private.h
>>> +++ b/net/bridge/br_private.h
>>> @@ -649,9 +649,8 @@ void br_multicast_get_stats(const struct net_bridge *br,
>>>  
>>>  static inline bool br_multicast_is_router(struct net_bridge *br)
>>>  {
>>> -   return br->multicast_router == 2 ||
>>> -          (br->multicast_router == 1 &&
>>> -           timer_pending(&br->multicast_router_timer));
>>> +   return br->multicast_router == MDB_RTR_TYPE_PERM ||
>>> +          br->multicast_router == MDB_RTR_TYPE_TEMP;
>>>  }
>>>  
>>>  static inline bool
>>> @@ -790,6 +789,14 @@ static inline int br_multicast_igmp_type(const struct 
>>> sk_buff *skb)
>>>  }
>>>  #endif
>>>  
>>> +static inline unsigned char
>> u8
>>
>>> +br_multicast_router_translate(unsigned char multicast_router)
>> u8, if need be change the type of the struct member
> 
> 
> Sorry, didn't quite get that. Currently, both the bridge_port and bridge have
> the multicast_router field, and in both cases it is of type unsigned char. Do
> you suggest to change both to be "u8"?
> 

Right, and this is unnecessarily long and requires to be broken ugly like that 
with
the type above and name below. So I suggested to use u8 to avoid that.

> 
>>
>>> +{
>>> +   if (multicast_router == MDB_RTR_TYPE_TEMP)
>>> +           return MDB_RTR_TYPE_TEMP_QUERY;
>>> +   return multicast_router;
>>> +}
>>> +
>>>  /* br_vlan.c */
>>>  #ifdef CONFIG_BRIDGE_VLAN_FILTERING
>>>  bool br_allowed_ingress(const struct net_bridge *br,
>>> diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
>>> index 723f25e..9b9c597 100644
>>> --- a/net/bridge/br_sysfs_br.c
>>> +++ b/net/bridge/br_sysfs_br.c
>>> @@ -340,7 +340,8 @@ static ssize_t multicast_router_show(struct device *d,
>>>                                  struct device_attribute *attr, char *buf)
>>>  {
>>>     struct net_bridge *br = to_bridge(d);
>>> -   return sprintf(buf, "%d\n", br->multicast_router);
>>> +   return sprintf(buf, "%d\n",
>>> +                  br_multicast_router_translate(br->multicast_router));
>>>  }
>>>  
>>>  static ssize_t multicast_router_store(struct device *d,
>>>
> 

Reply via email to