> On 2021/4/17 20:33, zhudi (J) wrote: > >> On 2021/4/16 11:27, zhudi (J) wrote: > >>>> dependencyOn 2021/4/15 11:35, zhudi wrote: > >>>>> From: Di Zhu <zhud...@huawei.com> > >>>>> > >>>>> We encountered a crash: in the packet receiving process, we got an > >>>>> illegal VLAN device address, but the VLAN device address saved in > >>>>> vmcore is correct. After checking the code, we found a possible > >>>>> data > >>>>> competition: > >>>>> CPU 0: CPU 1: > >>>>> (RCU read lock) (RTNL lock) > >>>>> vlan_do_receive() register_vlan_dev() > >>>>> vlan_find_dev() > >>>>> > >>>>> ->__vlan_group_get_device() ->vlan_group_prealloc_vid() > >>>>> > >>>>> In vlan_group_prealloc_vid(), We need to make sure that kzalloc is > >>>>> executed before assigning a value to vlan devices array, otherwise > >>>>> we > >>>> > >>>> As my understanding, there is a dependency between calling > >>>> kzalloc() and assigning the address(returned from kzalloc()) to > >>>> vg->vlan_devices_arrays, CPU and compiler can see the dependency, > >>>> why can't it handling the dependency before adding the smp_wmb()? > >>>> > >>>> See CONTROL DEPENDENCIES section in Documentation/memory- > >>>> barriers.txt: > >>>> > >>>> However, stores are not speculated. This means that ordering -is- > >> provided > >>>> for load-store control dependencies, as in the following example: > >>>> > >>>> q = READ_ONCE(a); > >>>> if (q) { > >>>> WRITE_ONCE(b, 1); > >>>> } > >>>> > >>> > >>> Maybe I didn't make it clear. This memory isolation is to ensure > >>> the order > >> of > >>> memset(object, 0, size) in kzalloc() operations and the subsequent > >>> array > >> assignment statements. > >>> > >>> kzalloc() > >>> ->memset(object, 0, size) > >>> > >>> smp_wmb() > >>> > >>> vg->vlan_devices_arrays[pidx][vidx] = array; > >>> > >>> Because __vlan_group_get_device() function depends on this order > >> > > > >> Thanks for clarify, it would be good to mention this in the commit > >> log too. > > > > OK, I'll change it. Thank you for your advice. > > > >> > >> Also, __vlan_group_get_device() is used in the data path, it would be > >> to avoid the barrier op too. Maybe using rcu to avoid the barrier if > >> the __vlan_group_get_device() is already protected by rcu_lock. > > > > Using the netperf command for testing on x86, there is no difference in > performance: > > This may make sense for x86 because x86 has a strong order memory model, > which has smp_rmb() as compiler barrier, as my understanding. > > How about the weak order memory model CPU? such as arm64, which has > smp_rmb() as 'dmb' instruction.
The test result on Arm is the same as that on X86, no matter whether it is patched or not, there is basically no difference in performance. The smp_rmb() semantically does not ensure the completion of any of the memory accesses for which it ensures relative order, So the performance impact should be minimal > Also the cpu usage may need to be looked at if data rate is at line speed. > > > > > # netperf -H 112.113.0.12 -l 20 -t TCP_STREAM MIGRATED TCP STREAM > TEST > > from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 112.113.0.12 () port 0 AF_INET > > Recv Send Send > > Socket Socket Message Elapsed > > Size Size Size Time Throughput > > bytes bytes bytes secs. 10^6bits/sec > > > > 131072 16384 16384 20.00 9386.03 > > > > After patch: > > > > # netperf -H 112.113.0.12 -l 20 -t TCP_STREAM MIGRATED TCP STREAM > > TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 112.113.0.12 () port 0 AF_INET > > Recv Send Send > > Socket Socket Message Elapsed > > Size Size Size Time Throughput > > bytes bytes bytes secs. 10^6bits/sec > > > > 131072 16384 16384 20.00 9386.41 > > > > The same is true for UDP stream test > > > >> > >>> > >>>> > >>>> > >>>>> may get a wrong address from the hardware cache on another cpu. > >>>>> > >>>>> So fix it by adding memory barrier instruction to ensure the order > >>>>> of memory operations. > >>>>> > >>>>> Signed-off-by: Di Zhu <zhud...@huawei.com> > >>>>> --- > >>>>> net/8021q/vlan.c | 2 ++ > >>>>> net/8021q/vlan.h | 3 +++ > >>>>> 2 files changed, 5 insertions(+) > >>>>> > >>>>> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index > >>>>> 8b644113715e..4f541e05cd3f 100644 > >>>>> --- a/net/8021q/vlan.c > >>>>> +++ b/net/8021q/vlan.c > >>>>> @@ -71,6 +71,8 @@ static int vlan_group_prealloc_vid(struct > >> vlan_group > >>>> *vg, > >>>>> if (array == NULL) > >>>>> return -ENOBUFS; > >>>>> > >>>>> + smp_wmb(); > >>>>> + > >>>>> vg->vlan_devices_arrays[pidx][vidx] = array; > >>>>> return 0; > >>>>> } > >>>>> diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h index > >>>>> 953405362795..7408fda084d3 100644 > >>>>> --- a/net/8021q/vlan.h > >>>>> +++ b/net/8021q/vlan.h > >>>>> @@ -57,6 +57,9 @@ static inline struct net_device > >>>>> *__vlan_group_get_device(struct vlan_group *vg, > >>>>> > >>>>> array = vg->vlan_devices_arrays[pidx] > >>>>> [vlan_id / > >>>> VLAN_GROUP_ARRAY_PART_LEN]; > >>>>> + > >>>>> + smp_rmb(); > >>>>> + > >>>>> return array ? array[vlan_id % VLAN_GROUP_ARRAY_PART_LEN] : > >>>> NULL; } > >>>>> > >>>>> > >>> > >