Hello all,

I'm working on a project that involves selecting and filtering 10-15 narrow channels (10 kHz bandwidth each) from a relatively broadband input (1 MHz). I've been trying to implement this as efficiently as possible using GNU Radio Companion (see this email thread: https://lists.gnu.org/archive/html/discuss-gnuradio/2019-10/msg00192.html). I tried a couple of things (FIR bandpass filters, and mixing each channel down to 0 Hz then low-pass filtering, both in one step and in stages) and found that simply using an FIR bandpass filter for each channel seemed to provide the best performance CPU-wise (20% CPU usage on my i7-920 desktop PC). However, the aim is to run this system on a Raspberry Pi 4 and, unfortunately, the same flow runs at approximately 90% CPU there and seems to cause lags when sending the data to the SDR (LimeSDR-USB).
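For a rough sense of scale, the arithmetic behind a per-channel FIR approach can be sketched as below. This is only a back-of-the-envelope estimate, not a measurement of the attached flow: the 2 kHz transition width, the 60 dB stopband attenuation and the 20 kHz output rate are all assumed values, the tap count uses the common rule of thumb N ≈ (fs / Δf) × (A_dB / 22), and the function names are made up for illustration.

```python
import math


def fir_taps_estimate(sample_rate_hz, transition_hz, atten_db):
    """Rule-of-thumb FIR length: N ~= (fs / transition width) * (attenuation in dB / 22)."""
    return int(math.ceil(sample_rate_hz / transition_hz * atten_db / 22.0))


def macs_per_second(n_channels, n_taps, output_rate_hz):
    """Multiply-accumulates per second: each output sample costs n_taps MACs per channel."""
    return n_channels * n_taps * output_rate_hz


if __name__ == "__main__":
    fs = 1_000_000          # 1 MHz input, from the description above
    channels = 15           # upper end of the 10-15 channels mentioned above
    transition = 2_000      # ASSUMED transition width in Hz
    attenuation = 60        # ASSUMED stopband attenuation in dB

    taps = fir_taps_estimate(fs, transition, attenuation)
    print("estimated taps per filter:", taps)

    # Filtering without decimation: every filter produces output at the full input rate.
    print("MAC/s, no decimation:", macs_per_second(channels, taps, fs))

    # Decimating inside each filter to an ASSUMED 20 kHz output rate.
    print("MAC/s, 20 kHz output:", macs_per_second(channels, taps, 20_000))
```

The point of the comparison is only that the cost scales with the output rate of each filter, so whether the per-channel filters decimate matters far more than the exact tap count.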
I see the problem as potentially one of the following:

- The flow is *still* not as efficient as it could be.
- The RPi4 is just not powerful enough to run something like this and I need to use something more powerful (perhaps something like the x86 LattePanda boards?).
- GNURadio is not compiled to use NEON optimisations.

I've been exploring the last point recently and wanted to check whether NEON optimisations are indeed being utilised. So here's what I did:

- I set up a Raspberry Pi 4 (4GB) using Raspbian Buster.
- I installed GNURadio from the standard apt repository. This installs GNU Radio v3.7.13.4 and Volk 1.4.
- I ran volk_profile to tune the library.
- I then ran the bpf-test flow (attached to this email). The CPU usage is 70%.

Some info about the gnuradio and volk versions:

gnuradio-config-info --cflags:
/usr/bin/cc::: -g -O2 -fdebug-prefix-map=/build/gnuradio-FK7QfY/gnuradio-3.7.13.4=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -std=gnu99 -fvisibility=hidden -Wsign-compare -Wall -Wno-uninitialized
/usr/bin/c++::: -g -O2 -fdebug-prefix-map=/build/gnuradio-FK7QfY/gnuradio-3.7.13.4=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fvisibility=hidden -Wsign-compare -Wall -Wno-uninitialized

volk-config-info --cflags:
/usr/bin/cc::: -g -O2 -fdebug-prefix-map=/build/volk-zBrTqH/volk-1.4=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wall
/usr/bin/c++::: -g -O2 -fdebug-prefix-map=/build/volk-zBrTqH/volk-1.4=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wall
generic_orc:::GNU::: -g -O2 -fdebug-prefix-map=/build/volk-zBrTqH/volk-1.4=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wall

volk-config-info --avail-machines
generic_orc;

So based on that, it appears that the gnuradio and volk packages on Raspbian are not built with NEON support.
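The check on the --avail-machines output can also be scripted. The following is a minimal sketch, assuming that a Volk machine name containing "neon" is a reasonable indicator of a NEON-enabled build; the helper names neon_machines and query_avail_machines are made up for illustration and are not part of Volk.

```python
import shutil
import subprocess
from typing import List, Optional


def neon_machines(avail_machines: str) -> List[str]:
    """Return the machine names from a semicolon-separated
    `volk-config-info --avail-machines` string that mention NEON."""
    names = [m.strip() for m in avail_machines.split(";") if m.strip()]
    return [m for m in names if "neon" in m.lower()]


def query_avail_machines() -> Optional[str]:
    """Run `volk-config-info --avail-machines` if the tool is on PATH,
    otherwise return None so the script still works on other machines."""
    exe = shutil.which("volk-config-info")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--avail-machines"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # String copied from the packaged Raspbian build output quoted in this email.
    print(neon_machines("generic_orc;"))

    live = query_avail_machines()
    if live is not None:
        print("this system:", neon_machines(live))
```

An empty list only says that no machine name mentions NEON; it does not by itself show which kernels a given flow actually calls at run time.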
I then set about compiling gnuradio and volk from source to ensure that NEON support is included. I compiled *both* volk and gnuradio using the arm_cortex_a72_hardfp_native.cmake toolchain file that is included in the cmake/Toolchains folder in the volk source. I compiled volk separately and then, when compiling gnuradio, set ENABLE_INTERNAL_VOLK=OFF. In this case, I ended up compiling Volk v2.0 and gnuradio v3.9.0.0 (master from git). Here are the compiler flags:

gnuradio-config-info --cflags
/usr/bin/gcc:::-O3 -DNDEBUG -march=armv8-a -mtune=cortex-a72 -mfpu=neon-fp-armv8 -mfloat-abi=hard -fvisibility=hidden -Wsign-compare -Wall -Wno-uninitialized
/usr/bin/g++:::-O3 -DNDEBUG -march=armv8-a -mtune=cortex-a72 -mfpu=neon-fp-armv8 -mfloat-abi=hard -fvisibility=hidden -Wsign-compare -Wall -Wno-uninitialized

volk-config-info --cflags
/usr/bin/gcc::: -march=armv8-a -mtune=cortex-a72 -mfpu=neon-fp-armv8 -mfloat-abi=hard -Wall
/usr/bin/g++::: -march=armv8-a -mtune=cortex-a72 -mfpu=neon-fp-armv8 -mfloat-abi=hard -Wall
generic_orc:::GNU:::-O3 -DNDEBUG -march=armv8-a -mtune=cortex-a72 -mfpu=neon-fp-armv8 -mfloat-abi=hard -Wall
neon_orc:::GNU:::-O3 -DNDEBUG -march=armv8-a -mtune=cortex-a72 -mfpu=neon-fp-armv8 -mfloat-abi=hard -Wall -funsafe-math-optimizations
neonv7_hardfp_orc:::GNU:::-O3 -DNDEBUG -march=armv8-a -mtune=cortex-a72 -mfpu=neon-fp-armv8 -mfloat-abi=hard -Wall -funsafe-math-optimizations -mfpu=neon -funsafe-math-optimizations -mfloat-abi=hard

volk-config-info --avail-machines
generic_orc;neon_orc;neonv7_hardfp_orc;

volk-config-info --machine
neonv7_hardfp_orc

After running volk_profile to tune the library, I ran the same flow, hoping that I'd get improved performance. Unfortunately, the performance was *exactly* the same, with CPU usage also at around 70%. I suspect one of the following:

- The flow that I created is not using blocks written using Volk/optimised for NEON, and as such enabling NEON support would make no difference (I doubt it).
- The gnuradio present in the Raspbian repositories *is actually* compiled with NEON support (despite the cflags showing otherwise) and I'm simply running into the limitations of the CPU.
- The gnuradio I compiled myself is actually *not using* NEON support (despite the cflags showing otherwise) and I need to figure out how to enable it.

Any thoughts?

Thanks,
Amr
bpf-test.grc
Description: application/gnuradio-grc
