I would use fftw source from their site not rhels source rpm, unless you
need to deploy it on a large number of machines. (Even then I would pull
latest source and update the srpm)

You can just build fftw from source. Add almost all the configure options.
We do this on rhel 6 because their version doesn't support wisdom or take
advantage of any recent CPU releases. It should install in /usr/local by
default. Just add the library path to /etc/ld.so.conf and run ldconfig and
you'll be set.
On Mar 9, 2016 8:47 AM, "devin kelly" <dwwke...@gmail.com> wrote:

> Thanks for the help, I don't think I could have figured this out on my own.
>
> This is because I'm on RHEL7 (argh!).  My libfftw.so doesn't contain any
> references to AVX. For me there are a couple of options for fixing this:
>
> 1) Use Nathan's branch.
> 2) Rebuild fftw with AVX support
> 3) Rebuild GR and Volk without AVX.
>
> I tried 2) first and noticed this in the spec file that was in the source
> RPM I was trying to rebuild:
>
> %ifarch %{ix86} x86_64
> # Enable SSE2 support for x86 and x86_64
> # (no avx as it is claimed to drastically slower)
> for((i=0;i<2;i++)); do
>  prec_flags[i]+=" --enable-sse2"
> done
> %endif
>
> Is the spec file author right?  Now I'm a little confused about the
> approach I should take.  I'll probably just go with 1) in the mean time.
>
> Thanks again Nathan,
> Devin
>
> On Wed, Mar 9, 2016 at 1:06 AM, West, Nathan <n...@ostatemail.okstate.edu>
> wrote:
>
>> The a and c vectors come from gr:fft objects' internal buffers. These are
>> internally created with fftwf_malloc (lines 152/156 of gr-fft/lib/fft.cc).
>> fftwf_malloc is obviously not generating buffers with proper alignment so
>> you're seeing a 50% (per buffer) that this segfaults. I'll note that this
>> is also only an issue with fftwf buffers when fftwf isn't built with AVX
>> support (and therefore nothing in fftwf requires  a 32-byte aligned buffer).
>>
>> Andy Walls (thanks!) pointed out on IRC that we had a similar issue years
>> ago with a QT sink.
>>
>> I have a branch that should fix this (
>> https://github.com/n-west/gnuradio/tree/fft-avx-alignment). I also
>> suggest you look in to getting a version of fftwf built with AVX. I don't
>> know if there's a good way to tell, but if I run readelf -a on my
>> libfftw3.so I see some functions with avx in the name.
>>
>> Cheers,
>> nw
>>
>>
>> On Tue, Mar 8, 2016 at 1:31 PM, devin kelly <dwwke...@gmail.com> wrote:
>>
>>> OK, here's my C program:
>>>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <volk/volk.h>
>>> #include <stdint.h>
>>>
>>> int main() {
>>>
>>>     size_t alignment = volk_get_alignment();
>>>
>>>     uint8_t* ptr;
>>>
>>>     ptr = (uint8_t*)volk_malloc(1000 * sizeof(uint8_t), alignment);
>>>     printf("alignment = %lu, ptr = %x, *ptr = %u\n", alignment, ptr,
>>> *ptr);
>>>     volk_free((void*)ptr);
>>>     ptr = NULL;
>>>
>>>
>>>     return 0;
>>> }
>>>
>>>
>>> Compile:
>>>
>>> $ gcc volk_test.c -o volk_test -lvolk -L/local_disk/gr_3.7.9_debug/lib
>>>
>>> It's output:
>>>
>>> $ ./volk_test
>>> Using Volk machine: avx2_64_mmx_orc
>>> alignment = 32, ptr = 151b040, *ptr = 00
>>>
>>> Also, I've attached the output from the preprocessor, this command:
>>>
>>> $ /usr/bin/cc  -DHAVE_AVX_CVTPI32_PS -DHAVE_CPUID_H -DHAVE_DLFCN_H
>>> -DHAVE_FENV_H -DHAVE_POSIX_MEMALIGN -DHAVE_XGETBV -Wall -fvisibility=hidden
>>> -g -I/local_disk/gr_3.7.9_src/volk/build_debug/include
>>> -I/local_disk/gr_3.7.9_src/volk/include
>>> -I/local_disk/gr_3.7.9_src/volk/kernels
>>> -I/local_disk/gr_3.7.9_src/volk/build_debug/lib
>>> -I/local_disk/gr_3.7.9_src/volk/lib -I/usr/include/orc-0.4  -E  -fPIC -o
>>> volk_malloc_preprocessed   -c
>>> /local_disk/gr_3.7.9_src/volk/lib/volk_malloc.c
>>>
>>> I just found the compiler step from from doing 'VERBOSE=1 make' then
>>> changed the output and added -E.  I attached volk_malloc_preprocessed as
>>> well.
>>>
>>> It looks like this is my volk_malloc():
>>>
>>>
>>> void *volk_malloc(size_t size, size_t alignment)
>>> {
>>>   void *ptr;
>>>
>>>
>>>
>>>
>>>   if (alignment == 1)
>>>     return malloc(size);
>>>
>>>   int err = posix_memalign(&ptr, alignment, size);
>>>   if(err == 0) {
>>>     return ptr;
>>>   }
>>>   else {
>>>     fprintf(stderr,
>>>             "VOLK: Error allocating memory "
>>>             "(posix_memalign: error %d: %s)\n", err, strerror(err));
>>>     return ((void *)0);
>>>   }
>>> }
>>>
>>>
>>>
>>> Devin
>>>
>>>
>>>
>>> On Tue, Mar 8, 2016 at 11:37 AM, West, Nathan <
>>> n...@ostatemail.okstate.edu> wrote:
>>>
>>>>
>>>> On Tue, Mar 8, 2016 at 10:58 AM, devin kelly <dwwke...@gmail.com>
>>>> wrote:
>>>>
>>>>> Calling 'info variables' (or args or locals) the last few frames
>>>>> didn't give me any real info so I built a copy of GR/Volk with debug
>>>>> symbols.  I ran the FG again, this time from GDB, here's my back trace.  
>>>>> In
>>>>> this backtrace you can see the arguments passed in each call.  I have an
>>>>> i7-5600U CPU @ 2.60GHz, the volk_profile is appended at the bottom.
>>>>>
>>>>
>>>> Excellent. Thanks for going through that extra step. It really helps.
>>>>
>>>>
>>>>>
>>>>> Here's are the links for the relevant code:
>>>>>
>>>>>
>>>>> https://github.com/gnuradio/volk/blob/f0b722392950bf7ede7b32f5ff60019bce7a8592/kernels/volk/volk_32fc_x2_multiply_32fc.h#L232
>>>>>
>>>>> https://github.com/gnuradio/gnuradio/blob/master/gr-filter/lib/fft_filter.cc#L323
>>>>>
>>>>> https://github.com/gnuradio/gnuradio/blob/222e0003f9797a1b92d64855bd2b93f0d9099f93/gr-digital/lib/corr_est_cc_impl.cc#L214
>>>>>
>>>>> Could the problem be that nitems is 257 and num_points is 512?  Or
>>>>> should nitems really be 256 and not 257?
>>>>>
>>>>
>>>> I don't think so. I'm not familiar with the details of the fft_filter
>>>> implementations, but usually these things will take in some history if they
>>>> don't have enough points to operate on (in this case 512).
>>>>
>>>> The much more worrying thing is your vector addresses.
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> Devin
>>>>>
>>>>> (gdb) bt
>>>>> #0  0x00007fffdcaccb57 in volk_32fc_x2_multiply_32fc_a_avx2_fma
>>>>> (__P=0x3b051b0)
>>>>>     at /usr/lib/gcc/x86_64-redhat-linux/4.8.5/include/avxintrin.h:835
>>>>> #1  0x00007fffdcaccb57 in volk_32fc_x2_multiply_32fc_a_avx2_fma
>>>>> (cVector=0x3b1f770, aVector=0x3b051b0, bVector=0x3b240e0, num_points=512)
>>>>>
>>>>
>>>> 0x3b1f770 % 32 = 16 (bad)
>>>> 0x3b051b0 % 32 = 16 (bad)
>>>> 0x3b240e0 % 32 = 0 (good)
>>>>
>>>> Unfortunately it looks like volk_get_alignment is returning the wrong
>>>> thing or there's a bug in volk_malloc. Can you tell us what
>>>> volk_get_alignment returns? The easiest thing is probably to write a simple
>>>> C program that prints out the result (hmm, I should add that to
>>>> volk-config-info). I'd also like to know which volk_malloc implementation
>>>> you're using. Unfortunately I don't think we have an easy way to discover
>>>> that (hmm, something else that should be added to volk-config-info). I
>>>> think the best way might be to look at volk_malloc.c intermediate files
>>>> after the preprocessor has done its work.
>>>>
>>>> If you want to move on while we figure this out then you can edit
>>>> ~/.volk/volk_config and replace the avx2_fma with sse3 on the line that has
>>>> this kernel name on it.
>>>>
>>>>
>>>>>     at
>>>>> /local_disk/gr_3.7.9_src/volk/kernels/volk/volk_32fc_x2_multiply_32fc.h:242
>>>>> #2  0x00007fffdc945a75 in __volk_32fc_x2_multiply_32fc_a
>>>>> (cVector=0x3b1f770, aVector=0x3b051b0, bVector=0x3b240e0, num_points=512)
>>>>>     at /local_disk/gr_3.7.9_src/volk/build_debug/lib/volk.c:7010
>>>>> #3  0x00007fffd3f8e360 in
>>>>> gr::filter::kernel::fft_filter_ccc::filter(int, std::complex<float> 
>>>>> const*,
>>>>> std::complex<float>*) (this=0x3b02f40, nitems=nitems@entry=257,
>>>>> input=input@entry=0x7fffc9cc7000, output=output@entry=0x3b36460)
>>>>>     at
>>>>> /local_disk/gr_3.7.9_src/gnuradio/gr-filter/lib/fft_filter.cc:323
>>>>> #4  0x00007fffd42910df in gr::digital::corr_est_cc_impl::work(int,
>>>>> std::vector<void const*, std::allocator<void const*> >&, 
>>>>> std::vector<void*,
>>>>> std::allocator<void*> >&) (this=0x3b01560, noutput_items=257,
>>>>> input_items=..., output_items=std::vector of length 1, capacity 1 = {...})
>>>>>     at
>>>>> /local_disk/gr_3.7.9_src/gnuradio/gr-digital/lib/corr_est_cc_impl.cc:237
>>>>> #5  0x00007fffdd064907 in gr::sync_block::general_work(int,
>>>>> std::vector<int, std::allocator<int> >&, std::vector<void const*,
>>>>> std::allocator<void const*> >&, std::vector<void*, std::allocator<void*>
>>>>> >&) (this=0x3b015b8, noutput_items=<optimized out>, ninput_items=...,
>>>>> input_items=..., output_items=...) at
>>>>> /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/sync_block.cc:66
>>>>> #6  0x00007fffdd02f70f in gr::block_executor::run_one_iteration()
>>>>> (this=this@entry=0x7fff83ffedb0)
>>>>>     at
>>>>> /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/block_executor.cc:438
>>>>> #7  0x00007fffdd06da8a in
>>>>> gr::tpb_thread_body::tpb_thread_body(boost::shared_ptr<gr::block>, int)
>>>>> (this=0x7fff83ffedb0, block=..., max_noutput_items=<optimized out>) at
>>>>> /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/tpb_thread_body.cc:122
>>>>> #8  0x00007fffdd062761 in
>>>>> boost::detail::function::void_function_obj_invoker0<gr::thread::thread_body_wrapper<gr::tpb_container>,
>>>>> void>::invoke(boost::detail::function::function_buffer&) (this=0x3bc3ec0)
>>>>>     at
>>>>> /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/scheduler_tpb.cc:44
>>>>> #9  0x00007fffdd062761 in
>>>>> boost::detail::function::void_function_obj_invoker0<gr::thread::thread_body_wrapper<gr::tpb_container>,
>>>>> void>::invoke(boost::detail::function::function_buffer&) (this=0x3bc3ec0)
>>>>>     at
>>>>> /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/include/gnuradio/thread/thread_body_wrapper.h:51
>>>>> #10 0x00007fffdd062761 in
>>>>> boost::detail::function::void_function_obj_invoker0<gr::thread::thread_body_wrapper<gr::tpb_container>,
>>>>> void>::invoke(boost::detail::function::function_buffer&)
>>>>> (function_obj_ptr=...) at
>>>>> /usr/include/boost/function/function_template.hpp:153
>>>>> #11 0x00007fffdd016cd0 in
>>>>> boost::detail::thread_data<boost::function0<void> >::run() 
>>>>> (this=<optimized
>>>>> out>)
>>>>>     at /usr/include/boost/function/function_template.hpp:767
>>>>> #12 0x00007fffdd016cd0 in
>>>>> boost::detail::thread_data<boost::function0<void> >::run() 
>>>>> (this=<optimized
>>>>> out>)
>>>>>     at /usr/include/boost/thread/detail/thread.hpp:117
>>>>> #13 0x00007fffdbe4f24a in thread_proxy () at
>>>>> /lib64/libboost_thread-mt.so.1.53.0
>>>>> #14 0x00007ffff7800dc5 in start_thread () at /lib64/libpthread.so.0
>>>>> #15 0x00007ffff6e2528d in clone () at /lib64/libc.so.6
>>>>>
>>>>> Here are the locals on the last few frames:
>>>>>
>>>>> (gdb) f 0
>>>>> #0  0x00007fffdcaccb57 in _mm256_load_ps (__P=0x3b051b0) at
>>>>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/include/avxintrin.h:835
>>>>> 835       return *(__m256 *)__P;
>>>>> (gdb) info locals
>>>>> No locals.
>>>>> (gdb) f 1
>>>>> #1  volk_32fc_x2_multiply_32fc_a_avx2_fma (cVector=0x3b1f770,
>>>>> aVector=0x3b051b0, bVector=0x3b240e0, num_points=512)
>>>>>     at
>>>>> /local_disk/gr_3.7.9_src/volk/kernels/volk/volk_32fc_x2_multiply_32fc.h:242
>>>>> 242         const __m256 x = _mm256_load_ps((float*)a); // Load the ar
>>>>> + ai, br + bi as ar,ai,br,bi
>>>>> (gdb) info locals
>>>>> y = {-4.87433296e+17, 4.59163468e-41, -3.92813517e+17, 4.59163468e-41,
>>>>> 5.15677835e-43, 0, 5.26888223e-43, 0}
>>>>> tmp2x = {6.389921e-43, 0, -512.314453, 4.59163468e-41, 1.26116862e-44,
>>>>> 0, -4.87433296e+17, 4.59163468e-41}
>>>>> x = {-512.314453, 4.59163468e-41, 0, 0, 2.76102662, -3.64918089,
>>>>> -4.92134571, -1.06491208}
>>>>> yl = {4.14784345e-43, 0, 1.26116862e-44, 0, -4.87442367e+17,
>>>>> 4.59163468e-41, -4.87439343e+17, 4.59163468e-41}
>>>>> yh = {-1674752, 4.59163468e-41, 0, 0, -1.50397414e-36, 4.59163468e-41,
>>>>> -3.31452625e+17, 4.59163468e-41}
>>>>> tmp2 = {6.72623263e-44, 1.2751816e-43, 2.24207754e-44, 0,
>>>>> 7.17464814e-43, 0, -3.31440427e+17, 4.59163468e-41}
>>>>> z = {0.794147611, 0, 0.263988227, 0, -0.380019426, 0, -0.953325868, 0}
>>>>> number = 0
>>>>> quarterPoints = 128
>>>>> c = 0x3b1f770
>>>>> a = 0x3b051b0
>>>>> b = 0x3b240e0
>>>>> (gdb) f 2
>>>>> #2  0x00007fffdc945a75 in __volk_32fc_x2_multiply_32fc_a
>>>>> (cVector=0x3b1f770, aVector=0x3b051b0, bVector=0x3b240e0, num_points=512)
>>>>>     at /local_disk/gr_3.7.9_src/volk/build_debug/lib/volk.c:7010
>>>>> 7010        volk_32fc_x2_multiply_32fc_a(cVector, aVector, bVector,
>>>>> num_points);
>>>>> (gdb) info locals
>>>>> No locals.
>>>>> (gdb) f 3
>>>>> #3  0x00007fffd3f8e360 in gr::filter::kernel::fft_filter_ccc::filter
>>>>> (this=0x3b02f40, nitems=nitems@entry=257,
>>>>>     input=input@entry=0x7fffc9cc7000, output=output@entry=0x3b36460)
>>>>> at /local_disk/gr_3.7.9_src/gnuradio/gr-filter/lib/fft_filter.cc:323
>>>>> 323               volk_32fc_x2_multiply_32fc_a(c, a, b, d_fftsize);
>>>>> (gdb) info locals
>>>>> a = <optimized out>
>>>>> b = <optimized out>
>>>>> c = <optimized out>
>>>>> i = 0
>>>>> dec_ctr = 0
>>>>> j = <optimized out>
>>>>> ninput_items = 257
>>>>>
>>>>> My volk profile results:
>>>>>
>>>>> $  volk_profile -R 32fc_x2_multiply
>>>>> Using Volk machine: avx2_64_mmx_orc
>>>>> RUN_VOLK_TESTS: volk_32fc_x2_multiply_32fc(131071,1987)
>>>>> u_avx2_fma completed in 220ms
>>>>> u_avx completed in 220ms
>>>>> u_sse3 completed in 240ms
>>>>> generic completed in 2810ms
>>>>> a_avx2_fma completed in 200ms
>>>>> a_avx completed in 220ms
>>>>> a_sse3 completed in 230ms
>>>>> a_generic completed in 2810ms
>>>>> u_orc completed in 280ms
>>>>> Best aligned arch: a_avx2_fma
>>>>> Best unaligned arch: u_avx2_fma
>>>>> RUN_VOLK_TESTS: volk_32fc_x2_multiply_conjugate_32fc(131071,1987)
>>>>> u_avx completed in 230ms
>>>>> u_sse3 completed in 230ms
>>>>> generic completed in 2790ms
>>>>> a_avx completed in 220ms
>>>>> a_sse3 completed in 230ms
>>>>> a_generic completed in 2800ms
>>>>> Best aligned arch: a_avx
>>>>> Best unaligned arch: u_avx
>>>>> Writing "/home/devin/.volk/volk_config"...
>>>>>
>>>>>
>>>> Well I'm both jealous and happy that AVX2 is actually an improvement on
>>>> newer processors. Also matches the folklore that these new technologies are
>>>> usually not faster in the first silicon products that they come out in.
>>>>
>>>
>>>
>>> _______________________________________________
>>> Discuss-gnuradio mailing list
>>> Discuss-gnuradio@gnu.org
>>> https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
>>>
>>>
>>
>
> _______________________________________________
> Discuss-gnuradio mailing list
> Discuss-gnuradio@gnu.org
> https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
>
>
_______________________________________________
Discuss-gnuradio mailing list
Discuss-gnuradio@gnu.org
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio

Reply via email to