On Mon, 2020-11-02 at 10:40 +0000, Ali Alnubani wrote:
> Hi Bruce,
>
> I was able to pin this down to drivers/net/mlx5/mlx5_rxtx.c. Removing -fPIC
> from its ninja recipe in build.ninja resolves the issue (I had to prevent
> creating shared libs in this case).
> What do you suggest I do? Can we have per-PMD customized compilation flags?
>
> Regards,
> Ali
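
One mechanism often cited for -fPIC hurting GCC builds but not Clang builds is ELF semantic interposition. With -fPIC, GCC by default assumes that any function with default visibility might be replaced by another definition at dynamic-link time, so it will not inline that function, even into callers in the same source file. Clang has traditionally inlined such calls anyway. Nothing in this thread confirms that this is what happens in mlx5_rxtx.c; the sketch below only shows the effect in isolation, and all function names in it are hypothetical. It is meant to be compiled to assembly, for example with `gcc -O2 -fPIC -S`, and compared with the same build with `-fno-semantic-interposition` added (a GCC option available since GCC 5), or with the helper marked static or hidden.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-packet helper with default (external) visibility.
 * Built with gcc -O2 -fPIC, GCC assumes another shared object could
 * interpose this symbol, so calls to it from burst_sum_ext() are
 * typically emitted as real calls (through the PLT) instead of being
 * inlined.
 */
uint32_t fold_ext(uint32_t x)
{
	return (x & 0xffffu) + (x >> 16);
}

/* Same helper with internal linkage: it cannot be interposed, so GCC
 * is free to inline it even under -fPIC. */
static inline uint32_t fold_static(uint32_t x)
{
	return (x & 0xffffu) + (x >> 16);
}

/* Hidden visibility keeps the symbol callable from other files in the
 * same library while telling GCC it will not be interposed. */
__attribute__((visibility("hidden"))) uint32_t fold_hidden(uint32_t x)
{
	return (x & 0xffffu) + (x >> 16);
}

/* Three hot loops with identical source. With gcc -O2 -fPIC, only the
 * first one is expected to keep an out-of-line call per element. */
uint32_t burst_sum_ext(const uint32_t *v, size_t n)
{
	uint32_t s = 0;

	for (size_t i = 0; i < n; i++)
		s += fold_ext(v[i]);
	return s;
}

uint32_t burst_sum_static(const uint32_t *v, size_t n)
{
	uint32_t s = 0;

	for (size_t i = 0; i < n; i++)
		s += fold_static(v[i]);
	return s;
}

uint32_t burst_sum_hidden(const uint32_t *v, size_t n)
{
	uint32_t s = 0;

	for (size_t i = 0; i < n; i++)
		s += fold_hidden(v[i]);
	return s;
}
```

Building the real file with and without `-fno-semantic-interposition`, or comparing the disassembly of its hot functions between the PIC and non-PIC builds, would be one way to check whether this explains the gcc-only drop. Note that GCC 4.8.5, one of the compilers tested, predates that option.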

It's great that you were able to pin it down to that level, but would it be
possible now to find out _why_ that one source file is affected by -fPIC in
this way with GCC, and not with Clang? I think it would be much better to fix
the issue itself, rather than applying the "sledgehammer" approach. Not using
-fPIC has severe consequences for distributability, as mentioned earlier.

> > -----Original Message-----
> > From: Ali Alnubani
> > Sent: Thursday, October 22, 2020 5:17 PM
> > To: Bruce Richardson <bruce.richard...@intel.com>
> > Cc: dev@dpdk.org; NBU-Contact-Thomas Monjalon <tho...@monjalon.net>;
> > Asaf Penso <as...@nvidia.com>
> > Subject: RE: [dpdk-dev] performance degradation with fpic
> >
> > > -----Original Message-----
> > > From: Bruce Richardson <bruce.richard...@intel.com>
> > > Sent: Thursday, October 22, 2020 4:58 PM
> > > To: Ali Alnubani <alia...@nvidia.com>
> > > Cc: dev@dpdk.org; NBU-Contact-Thomas Monjalon <tho...@monjalon.net>;
> > > Asaf Penso <as...@nvidia.com>
> > > Subject: Re: [dpdk-dev] performance degradation with fpic
> > >
> > > On Thu, Oct 22, 2020 at 01:17:16PM +0000, Ali Alnubani wrote:
> > > > Hi Bruce,
> > > > Sorry for the delayed response.
> > > >
> > > > > -----Original Message-----
> > > > > From: Bruce Richardson <bruce.richard...@intel.com>
> > > > > Sent: Monday, October 19, 2020 4:02 PM
> > > > > To: Ali Alnubani <alia...@nvidia.com>
> > > > > Cc: dev@dpdk.org; NBU-Contact-Thomas Monjalon <tho...@monjalon.net>;
> > > > > Asaf Penso <as...@nvidia.com>
> > > > > Subject: Re: [dpdk-dev] performance degradation with fpic
> > > > >
> > > > > On Mon, Oct 19, 2020 at 11:47:48AM +0000, Ali Alnubani wrote:
> > > > > > Hi Bruce,
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Bruce Richardson <bruce.richard...@intel.com>
> > > > > > > Sent: Friday, October 16, 2020 12:59 PM
> > > > > > > To: Ali Alnubani <alia...@nvidia.com>
> > > > > > > Cc: dev@dpdk.org; NBU-Contact-Thomas Monjalon <tho...@monjalon.net>;
> > > > > > > Asaf Penso <as...@nvidia.com>
> > > > > > > Subject: Re: [dpdk-dev] performance degradation with fpic
> > > > > > >
> > > > > > > On Thu, Oct 15, 2020 at 06:08:04PM +0100, Bruce Richardson wrote:
> > > > > > > > On Thu, Oct 15, 2020 at 04:00:44PM +0000, Ali Alnubani wrote:
> > > > > > > > > Hi Bruce,
> > > > > > > > >
> > > > > > > > > We have been seeing in some cases that the DPDK forwarding
> > > > > > > > > performance is up to 9% lower when DPDK is built as static with
> > > > > > > > > meson compared to a build with makefiles.
> > > > > > > > >
> > > > > > > > > The same degradation can be reproduced with makefiles on older
> > > > > > > > > DPDK releases when building with EXTRA_CFLAGS set to “-fPIC”. It
> > > > > > > > > can also be resolved in meson by passing “pic: false” to meson’s
> > > > > > > > > static_library call (more tweaking is needed to prevent building
> > > > > > > > > shared libraries, because this change breaks them).
> > > > > > > > >
> > > > > > > > > I can reproduce this drop in the following cases:
> > > > > > > > > * Baremetal / NIC: ConnectX-4 Lx / OS: RHEL7.4 / CPU: Intel(R)
> > > > > > > > >   Xeon(R) Gold 6154. Testpmd command:
> > > > > > > > >
> > > > > > > > >   testpmd -c 0x7ffc0000 -n 4 -w d8:00.1 -w d8:00.0 --socket-mem=2048,2048 -- --port-numa-config=0,1,1,1 --socket-num=1 --burst=64 --txd=512 --rxd=512 --mbcache=512 --rxq=2 --txq=2 --nb-cores=1 --no-lsc-interrupt -i -a --rss-udp
> > > > > > > > > * KVM guest with SR-IOV passthrough / OS: RHEL7.4 / NIC: ConnectX-5 /
> > > > > > > > >   Host’s CPU: Intel(R) Xeon(R) Gold 6154. Testpmd command:
> > > > > > > > >   testpmd --master-lcore=0 -c 0x1ffff -n 4 -w 00:05.0,mprq_en=1,mprq_log_stride_num=6 --socket-mem=2048,0 -- --port-numa-config=0,0 --socket-num=0 --burst=64 --txd=1024 --rxd=1024 --mbcache=512 --rxq=16 --txq=16 --nb-cores=8 --port-topology=chained --forward-mode=macswap --no-lsc-interrupt -i -a --rss-udp
> > > > > > > > > * Baremetal / OS: Ubuntu 18.04 / NIC: ConnectX-5 / CPU: Intel(R)
> > > > > > > > >   Xeon(R) CPU E5-2697A v4.
> > > > > > > > >   Testpmd command:
> > > > > > > > >   testpmd -n 4 -w 0000:82:00.0,rxqs_min_mprq=8,mprq_en=1 -w 0000:82:00.1,rxqs_min_mprq=8,mprq_en=1 -c 0xff80 -- --burst=64 --mbcache=512 -i --nb-cores=8 --rxq=8 --txq=8 --txd=1024 --rxd=1024 --rss-udp --auto-start
> > > > > > > > >
> > > > > > > > > The packets being received and forwarded by testpmd are 64B
> > > > > > > > > IPv4/UDP packets.
> > > > > > > > >
> > > > > > > > > Should we disable PIC in static builds?
> > > > > > > > >
> > > > > > > >
> > > > > > > > Hi Ali,
> > > > > > > >
> > > > > > > > thanks for reporting, though it's strange that you see such a big
> > > > > > > > impact. In my previous tests with the i40e driver I never noticed
> > > > > > > > a difference between make and meson builds, and I and some others
> > > > > > > > here have been using meson builds for any performance work for
> > > > > > > > over a year now. That being said, let me reverify what I see on
> > > > > > > > my end.
> > > > > > > >
> > > > > > > > In terms of solutions, disabling the -fPIC flag globally implies
> > > > > > > > that we can no longer build static and shared libs from the same
> > > > > > > > sources, so we would need to revert to doing either a static or a
> > > > > > > > shared library build, but not both. If the issue is limited to
> > > > > > > > only some drivers or some cases, we can perhaps add a build
> > > > > > > > option for non-PIC static builds, to be used in cases where it is
> > > > > > > > problematic.
> > > > > > > >
> > > > > > > > However, at this point, I think we need a little more investigation.
> > > > > > > > Is there any testing you can do to see whether the issue appears
> > > > > > > > only in your driver, or perhaps in a mempool driver/lib, or
> > > > > > > > whether it's just a global slowdown? Do you see the impact with
> > > > > > > > both clang and gcc?
> > > > > > > >
> > > > > > > > I'll retest things a bit tomorrow on my end to see what I see.
> > > > > > > >
> > > > > > > Hi again,
> > > > > > >
> > > > > > > I've done a quick retest with the i40e driver on my system, using
> > > > > > > the 20.08 version so as to have a direct make vs meson comparison.
> > > > > > > [For reference, the command used was:
> > > > > > > "sudo </path/to/testpmd> -c F00000 -w af:00.0 -w b1:00.0 -w da:00.0 -- --rxq=2 --txq=2 --rxd=2048 --txd=512"
> > > > > > > using 3x40G ports to a single core running @3GHz.] No major
> > > > > > > performance differences were seen, but if anything the meson build
> > > > > > > was very slightly faster, as reported to Jerin, maybe 2%, though
> > > > > > > it's within the margin of error.
> > > > > >
> > > > > > Thanks for taking the time to investigate this.
> > > > > >
> > > > > > Disabling PIC for the net/mlx5 driver alone in drivers/meson.build
> > > > > > resolves the issue for me.
> > > > > > I saw this issue with gcc (tested with 4.8.5, 9.3.0, and 7.5.0). But I
> > > > > > see now that disabling PIC with an old clang version (clang 3.4.2,
> > > > > > RHEL7.4) causes a drop in performance, not an improvement like with
> > > > > > gcc.
> > > > >
> > > > > That's interesting.
> > > > >
> > > > > When you just build with and without -fpic with newer clang, do you
> > > > > see the same perf drop as with gcc? With the older clang, is the
> > > > > shared lib build faster than the static one?
> > > >
> > > > With the older clang on RHEL7.4, the shared lib is about 2% slower
> > > > than the static build.
> > > > With clang 11 compiled from source on Ubuntu 18.04, I'm getting good
> > > > performance with the static meson build: the same performance as the
> > > > makefile build with gcc, and ~6% better than the static meson gcc build.
> > > > Disabling PIC on clang 11 degrades performance by ~4%.
> > > > With clang 6.0.0, however, disabling PIC causes a very small drop
> > > > (~0.1%).
> > > >
> > > > This is on v20.08 with KVM ConnectX-5 SR-IOV passthrough. Command:
> > > > "dpdk-testpmd --master-lcore=0 -c 0x1ffff -n 4 -w 00:05.0 --socket-mem=2048,0 -- --port-numa-config=0,0 --socket-num=0 --burst=64 --txd=1024 --rxd=1024 --mbcache=512 --rxq=8 --txq=8 --nb-cores=4 --port-topology=chained --forward-mode=macswap --no-lsc-interrupt -i -a --rss-udp".
> > >
> > > So, am I right in saying that the clang builds all appear to be fine
> > > here, and that performance is pretty much as expected in all cases with
> > > the default setting of PIC enabled? Therefore the issue appears to be
> > > limited to gcc builds at this point?
> > >
> > Yes, it appears that way.
> >
> > Regards,
> > Ali