On Wed, 2015-06-10 at 21:57 -0600, Jason Gunthorpe wrote:
> On Wed, Jun 10, 2015 at 09:06:28PM -0400, Doug Ledford wrote:
>
> > People tend to push the "patches should be small, self contained,
> > incremental" ideal.  In some cases, that gets carried to an extreme.
> > In this case, patch 1 introduces one side of the locking and patches
> > 3 and 5 introduce the other halves.
>
> I already did spot check patches 3 and 5 for exactly that.  They add
> other uses of RCU, but they appear to be totally different -
> objectionable for style reasons, but probably not incorrect.
>
> For instance, the rcu lock grabs in patches 3 and 5 are protecting the
> call to netdev_master_upper_dev_get_rcu in patch 10.  'get_netdev' is
> more correctly called 'get_netdev_rcu' in this design.  (As I said,
> this placement of rcu_read_lock is ugly.)
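To make the point about the lock placement concrete, the pattern being described is roughly the following kernel-style sketch (the surrounding names are illustrative, not lifted from the patches; only netdev_master_upper_dev_get_rcu and the rcu_read_lock/unlock pair are the real kernel APIs in question):

```c
/* Sketch only -- not code from the series.  The caller takes the RCU
 * read lock, and that lock is what makes the later _rcu call legal,
 * which is why 'get_netdev' is effectively a 'get_netdev_rcu'. */
rcu_read_lock();
ndev = device->get_netdev(device, port);          /* hypothetical hook */
if (ndev)
	master = netdev_master_upper_dev_get_rcu(ndev);  /* needs RCU held */
/* ... use ndev/master only inside the read-side section ... */
rcu_read_unlock();
```

The ugliness being complained about is exactly that the rcu_read_lock sits in the caller while the call that actually requires it is buried a layer (or several patches) away.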
Without going back to review it, I was thinking that patches 3 and 5
were related, but that the rcu locking in the remaining patches was
tied to core netdev rcu locking, independent of what we do in the roce
gid table.  But that just underscores my other point: we need the
patches that implement all of the relevant rcu locking for the gid
table ndev in the same patch.

> ... and just searching through the patches for 'rcu' to write this, I
> noticed this:
>
> +void ib_enum_roce_ports_of_netdev(roce_netdev_filter filter,
> [..]
> +	down_read(&lists_rwsem);
> +	list_for_each_entry_rcu(dev, &device_list, core_list)
> +		ib_dev_roce_ports_of_netdev(dev, filter, filter_cookie, cb,
> +					    cookie);
> +	up_read(&lists_rwsem);
>
> Shouldn't call list_for_each_entry_rcu under a rwsem; this is just
> left over from the old locking regime...
>
> > > I think you've got the right basic idea for a cleanup series here.
> > > It is time to buckle down and execute it well.
> >
> > Except that this isn't really a cleanup, and calling it that clouds
> > the issue.
>
> Well, I've been asking for a cleanup...  The entire goal is to make
> things more reviewable, and a no-functional-change cleanup would sure
> help that.

In this case I'm not sure that's entirely realistic, though.  Because
the mlx4 driver and the ocrdma driver each had their own gid management
code, there were some distinct differences between the two.  For
example, the gid at index 0 never matched up in my testing.  One
supported bonding, the other didn't.  Even if you tried to limit things
to a cleanup, you would still end up altering the behavior of both
drivers, because the cleanup would have to merge the two
implementations and the result would be different from either one of
them.  So I don't think any such thing as a no-functional-change
cleanup is possible here.
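On the iterator/lock mismatch above: the iterator and the lock need to agree, and there are two self-consistent ways to write that walk.  A sketch of both (kernel-style, callback body elided; down_read/up_read, rcu_read_lock/unlock, and the two list iterators are the real kernel primitives):

```c
/* Option 1: keep the rwsem and use the plain iterator, since the
 * semaphore alone already excludes concurrent list mutation: */
down_read(&lists_rwsem);
list_for_each_entry(dev, &device_list, core_list)
	/* ... per-device work ... */;
up_read(&lists_rwsem);

/* Option 2: drop the rwsem on the read side and rely on RCU,
 * pairing the _rcu iterator with an RCU read-side section: */
rcu_read_lock();
list_for_each_entry_rcu(dev, &device_list, core_list)
	/* ... per-device work, no sleeping here ... */;
rcu_read_unlock();
```

Mixing the two as the quoted hunk does isn't necessarily broken (the rwsem is sufficient protection on its own), but it signals that the conversion from the old locking regime was left half done.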
Now, they could have gone minimal, but instead they went "create a new
implementation that is supposedly done right and standardized, then
switch the existing drivers over to it" with some dubious rcu locking.

> > it.  But that's not the case.  The new core code implements
> > everything that the two drivers do, and then some more.
>
> I'd be interested to see a list of the 'some more' included in the
> patch comments.  I didn't look with a fine-toothed comb, but not much
> functional stood out to me...

I get the impression that a lot of the extra is scalability changes for
perceived need (probably due to upcoming namespace/container
additions).  But, just to be fair, the entire mlx4 gid code was -500
lines and included bonding support.  The base gid code addition was
+500 for the gid table, +600 for the table population functions, +170
for default gids, +290 for bonding support, and +140 lines to hook it
into the existing cache routines.  While part of this would seem to be
the over-designed locking, part of it is probably exactly as Matan
said: code growth due to generalization and standardization.

-- 
Doug Ledford <[email protected]>
              GPG KeyID: 0E572FDD
