Tom,
See the attached file. I am running volk_profile now. If this is what
you need, great; otherwise I will keep working on this with
whatever suggestions you have.
Cheers,
Fred
On 03/19/2012 08:10 AM, Tom Rondeau wrote:
On Sun, Mar 18, 2012 at 8:00 PM, Frederick Stevens
<[email protected]> wrote:
Volk_profile ran to completion. I am using the git source tree
updated just before I did the run. I commented out line 38 of
volk_profile.cc as you suggested and ran volk_profile under gdb.
The output is in the attached text file. I have also attached the
generated volk_config from ~/.volk/volk_config.
Thanks. Strange that it's just that kernel, then. Can you put in some
debug lines that print the size of the buffers being used and
the 'number' variable in volk_32fc_x2_multiply_32fc_a when the crash
occurs? I just want to see if the loop is trying to go beyond the
bounds of the arrays.
I noticed when running gnuradio-companion version 3.5.1 (which
works) that when I use a multiply block, Python prints this
message:
./top_block.py
>>> gr_fir_fff: using 3DNow!
but volk_profile does not seem to recognize the 3DNow! processor
extensions (it produces sse2 and sse3 messages on the Intel Atom
32-bit machine).
Yeah, that's fine. Without a 3DNow! kernel, Volk will just fall back
on the generic implementation. The idea is that the generic
version will work for everyone. So we need to figure out why that's
not true for your...
Hope this helps! Let me know if you want me to try anything
else. I'll let you know how things turn out on the other machine
as well.
Cheers,
Fred
Thanks.
Tom
On 03/18/2012 04:31 PM, Tom Rondeau wrote:
On Fri, Mar 16, 2012 at 6:11 PM, Frederick Stevens
<[email protected]> wrote:
Well, after a few restarts, here is my output. I did a fresh
pull from git because I was getting some errors with missing
*.h files in gruel/src/swig or something like that. Hope
this helps!
RUN_VOLK_TESTS: volk_32fc_32f_multiply_32fc_a
Program received signal SIGSEGV, Segmentation fault.
0xb7edbb74 in volk_32fc_32f_multiply_32fc_a_generic
(cVector=0xb7448008,
aVector=0xb7768008, bVector=0xb78f8008, num_points=204600)
at
/home/fred/extras/gnuradio/gnuradio/volk/include/volk/volk_32fc_32f_multiply_32fc_a.h:74
74 *cPtr++ = (*aPtr++) * (*bPtr++);
(gdb) bt
#0 0xb7edbb74 in volk_32fc_32f_multiply_32fc_a_generic
(cVector=0xb7448008,
aVector=0xb7768008, bVector=0xb78f8008, num_points=204600)
at
/home/fred/extras/gnuradio/gnuradio/volk/include/volk/volk_32fc_32f_multiply_32fc_a.h:74
Alright, Fred, there is definitely something strange going on here. My
only guess is that, for some reason, on your
architecture/OS/whatever something is being handled incorrectly
and the buffers a, b, and c are not getting generated correctly;
for example, the number of items may not be doubled for
the complex data type (the tests before this one use 16ic,
or complex shorts, but this is the first complex
float test).
It's hard to tell whether it's the AMD chip, the 32-bit build, the
Slackware version, the gcc version, etc. I don't have an
AMD chip to test on, but I could at least load up a 32-bit
Slackware VM.
How much work are you willing to put into this to help us nail
this down?
If you can follow the volk_profile test code, we can
start printing more debug info. To start with, I'd suggest
going into volk/apps/volk_profile.cc, commenting out line 38,
rebuilding the application, and running the new volk_profile to see
if it fails on any other kernels.
Thanks,
Tom
_______________________________________________
Discuss-gnuradio mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
#ifndef INCLUDED_volk_32fc_x2_multiply_32fc_a_H
#define INCLUDED_volk_32fc_x2_multiply_32fc_a_H
#include <inttypes.h>
#include <stdio.h>
#include <volk/volk_complex.h>
#include <float.h>
#ifdef LV_HAVE_SSE3
#include <pmmintrin.h>
/*!
\brief Multiplies the two input complex vectors and stores their results in
the third vector
\param cVector The vector where the results will be stored
\param aVector One of the vectors to be multiplied
\param bVector One of the vectors to be multiplied
\param num_points The number of complex values in aVector and bVector to be
multiplied together and stored into cVector
*/
static inline void volk_32fc_x2_multiply_32fc_a_sse3(lv_32fc_t* cVector, const
lv_32fc_t* aVector, const lv_32fc_t* bVector, unsigned int num_points){
unsigned int number = 0;
const unsigned int halfPoints = num_points / 2;
__m128 x, y, yl, yh, z, tmp1, tmp2;
lv_32fc_t* c = cVector;
const lv_32fc_t* a = aVector;
const lv_32fc_t* b = bVector;
for(;number < halfPoints; number++){
x = _mm_load_ps((float*)a); // Load the ar + ai, br + bi as ar,ai,br,bi
y = _mm_load_ps((float*)b); // Load the cr + ci, dr + di as cr,ci,dr,di
yl = _mm_moveldup_ps(y); // Load yl with cr,cr,dr,dr
yh = _mm_movehdup_ps(y); // Load yh with ci,ci,di,di
tmp1 = _mm_mul_ps(x,yl); // tmp1 = ar*cr,ai*cr,br*dr,bi*dr
x = _mm_shuffle_ps(x,x,0xB1); // Re-arrange x to be ai,ar,bi,br
tmp2 = _mm_mul_ps(x,yh); // tmp2 = ai*ci,ar*ci,bi*di,br*di
z = _mm_addsub_ps(tmp1,tmp2); // ar*cr-ai*ci, ai*cr+ar*ci, br*dr-bi*di, bi*dr+br*di
_mm_store_ps((float*)c,z); // Store the results back into the C container
a += 2;
b += 2;
c += 2;
}
if((num_points % 2) != 0) {
*c = (*a) * (*b);
}
}
#endif /* LV_HAVE_SSE3 */
#ifdef LV_HAVE_GENERIC
/*!
\brief Multiplies the two input complex vectors and stores their results in
the third vector
\param cVector The vector where the results will be stored
\param aVector One of the vectors to be multiplied
\param bVector One of the vectors to be multiplied
\param num_points The number of complex values in aVector and bVector to be
multiplied together and stored into cVector
*/
static inline void volk_32fc_x2_multiply_32fc_a_generic(lv_32fc_t* cVector,
const lv_32fc_t* aVector, const lv_32fc_t* bVector, unsigned int num_points){
lv_32fc_t* cPtr = cVector;
const lv_32fc_t* aPtr = aVector;
const lv_32fc_t* bPtr= bVector;
unsigned int number = 0;
for(number = 0; number < num_points; number++){
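/* Debug output, printed before the multiply so the last line shows the index that faulted.
   sizeof(aPtr) is only the size of a pointer (4 on a 32-bit build), not of the buffer,
   so print the addresses and counts instead. stderr is unbuffered, so nothing is lost
   when the process crashes. */
fprintf(stderr, "a=%p b=%p c=%p num_points=%u number=%u\n", (const void*)aPtr, (const void*)bPtr, (void*)cPtr, num_points, number);
*cPtr++ = (*aPtr++) * (*bPtr++);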
}
}
#endif /* LV_HAVE_GENERIC */
#ifdef LV_HAVE_ORC
/*!
\brief Multiplies the two input complex vectors and stores their results in
the third vector
\param cVector The vector where the results will be stored
\param aVector One of the vectors to be multiplied
\param bVector One of the vectors to be multiplied
\param num_points The number of complex values in aVector and bVector to be
multiplied together and stored into cVector
*/
extern void volk_32fc_x2_multiply_32fc_a_orc_impl(lv_32fc_t* cVector, const
lv_32fc_t* aVector, const lv_32fc_t* bVector, unsigned int num_points);
static inline void volk_32fc_x2_multiply_32fc_a_orc(lv_32fc_t* cVector, const
lv_32fc_t* aVector, const lv_32fc_t* bVector, unsigned int num_points){
volk_32fc_x2_multiply_32fc_a_orc_impl(cVector, aVector, bVector,
num_points);
}
#endif /* LV_HAVE_ORC */
#endif /* INCLUDED_volk_32fc_x2_multiply_32fc_a_H */