Tom,

See the attached file. I am running volk_profile now. If this is what you need, great; otherwise I will keep working on this with whatever suggestions you have.

Cheers,

Fred

On 03/19/2012 08:10 AM, Tom Rondeau wrote:
On Sun, Mar 18, 2012 at 8:00 PM, Frederick Stevens <[email protected] <mailto:[email protected]>> wrote:

    Volk_profile ran to completion.  I am using the git source tree
    updated just before I did the run.  I commented out line 38 of
    volk_profile.cc as you suggested and ran volk_profile under gdb.
    The output is in the attached text file. I have also attached the
    generated volk_config from ~/.volk/volk_config.


Thanks. Strange that it's just that kernel, then. Can you put in some debug lines that will print out the size of the buffers being used and the 'number' variable in volk_32fc_x2_multiply_32fc_a when the crash occurs? I just want to see if the loop is trying to go beyond the bounds of the arrays.
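
A minimal sketch of the kind of debug line meant here. Inside the kernel only pointers are visible, so this assumes the caller (for example the test harness that allocates the buffers) passes the allocated byte count along. `debug_check_index` and its parameters are hypothetical names for illustration, not part of Volk.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of Volk: reports whether item `number`
 * lies inside a buffer of `buf_bytes` bytes holding `elem_size`-byte items.
 * It prints to stderr, which is typically unbuffered, so the last line
 * printed should still appear if the next access crashes.
 * Returns 1 if the index is in bounds, 0 otherwise. */
static int debug_check_index(const char *name, size_t buf_bytes,
                             size_t elem_size, unsigned int number)
{
    size_t capacity = buf_bytes / elem_size;  /* whole items that fit */
    int ok = (size_t)number < capacity;

    fprintf(stderr, "%s: bytes=%zu items=%zu number=%u %s\n",
            name, buf_bytes, capacity, number, ok ? "ok" : "OUT OF BOUNDS");
    return ok;
}
```

Calling it before the multiply in each loop iteration, once per buffer, would show the last index reached and whether it was still within the allocated size.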

    I noted from running gnuradio-companion version 3.5.1, (which
    works) that when I use a multiply block, this message from python
    is generated:

     ./top_block.py
    >>> gr_fir_fff: using 3DNow!

    but volk_profile does not seem to recognize the 3DNow! processor
    extensions (produces sse2 and sse3 messages on the Intel Atom 32
    bit machine).


Yeah, that's fine. Without a 3DNow! kernel, Volk will just fall back on the generic implementation. The thought being that the generic version will work for everyone. So we need to figure out why that's not true for your...
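
The fallback idea can be sketched roughly as below. This is not Volk's actual dispatch code; `struct impl`, `pick_impl`, and the feature strings are hypothetical and only illustrate "use a matching architecture-specific kernel if there is one, otherwise the generic one".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: the CPU feature one implementation of a kernel
 * requires. An entry named "generic" requires nothing. */
struct impl {
    const char *arch;
};

/* Returns the index of the first non-generic implementation whose required
 * feature appears in cpu_features. If none match, returns the index of the
 * "generic" entry, or -1 if the table has no generic entry. */
static int pick_impl(const struct impl *impls, size_t n_impls,
                     const char *const *cpu_features, size_t n_features)
{
    int generic = -1;

    for (size_t i = 0; i < n_impls; i++) {
        if (strcmp(impls[i].arch, "generic") == 0) {
            generic = (int)i;   /* remember the fallback, keep looking */
            continue;
        }
        for (size_t j = 0; j < n_features; j++) {
            if (strcmp(impls[i].arch, cpu_features[j]) == 0)
                return (int)i;  /* CPU supports this implementation */
        }
    }
    return generic;
}
```

With a table holding only "sse3" and "generic" entries, a CPU reporting 3DNow! but not SSE3 ends up on the generic entry.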

    Hope this helps!  Let me know if you want me to try anything
    else.  I'll let you know how things turn out on the other machine
    as well.


    Cheers,

    Fred


Thanks.

Tom


    On 03/18/2012 04:31 PM, Tom Rondeau wrote:
    On Fri, Mar 16, 2012 at 6:11 PM, Frederick Stevens
    <[email protected] <mailto:[email protected]>> wrote:

        Well, after a few restarts, here is my output.  I did a fresh
        pull from git because I was getting some errors with missing
        *.h files in gruel/src/swig or something like that.  Hope
        this helps!


        RUN_VOLK_TESTS: volk_32fc_32f_multiply_32fc_a

        Program received signal SIGSEGV, Segmentation fault.
        0xb7edbb74 in volk_32fc_32f_multiply_32fc_a_generic
        (cVector=0xb7448008,
            aVector=0xb7768008, bVector=0xb78f8008, num_points=204600)
            at /home/fred/extras/gnuradio/gnuradio/volk/include/volk/volk_32fc_32f_multiply_32fc_a.h:74
        74          *cPtr++ = (*aPtr++) * (*bPtr++);
        (gdb) bt
        #0  0xb7edbb74 in volk_32fc_32f_multiply_32fc_a_generic
        (cVector=0xb7448008,
            aVector=0xb7768008, bVector=0xb78f8008, num_points=204600)
            at /home/fred/extras/gnuradio/gnuradio/volk/include/volk/volk_32fc_32f_multiply_32fc_a.h:74


    Alright, Fred, definitely something strange going on here. My
    only guess is that for some reason on your
    architecture/OS/whatever, something is being handled incorrectly
    and the buffers a, b, and c are not getting generated correctly,
    maybe something like it's not doubling the number of items for
    the complex data type (before this function test, there are 16ic,
    or complex shorts, being tested, but this is the first complex
    float test).
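
For comparison with the backtrace, the arithmetic behind that guess can be sketched as follows. It assumes a complex float occupies 8 bytes (two 4-byte floats); `bytes_needed` and `gap` are hypothetical helper names. The spacing between the pointer values only shows how far apart the buffers sit in memory. It does not prove how large each allocation is, since the allocator is free to place them anywhere.

```c
#include <complex.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to hold num_points items of elem_size bytes each. */
static size_t bytes_needed(size_t num_points, size_t elem_size)
{
    return num_points * elem_size;
}

/* Distance in bytes between two 32-bit addresses, with hi above lo. */
static uint32_t gap(uint32_t lo, uint32_t hi)
{
    return hi - lo;
}

/* With num_points = 204600 from the backtrace:
 *   bytes_needed(204600, sizeof(float complex)) is 1636800  (complex float)
 *   bytes_needed(204600, sizeof(float))         is  818400  (plain float)
 * Pointer spacing from the backtrace:
 *   gap(0xb7768008, 0xb78f8008) is 1638400  (aVector to bVector)
 *   gap(0xb7448008, 0xb7768008) is 3276800  (cVector to aVector)
 */
```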

    It's hard to tell if it's something about it being an AMD chip,
    32-bit, Slackware version, gcc version, etc. And I don't have an
    AMD chip to test on, but I could load up a 32-bit Slackware VM at
    least.

    How much work are you willing to put into this to help us nail
    this down?

    If you can follow through the volk_profile test code, we can
    start outputting more debug info. To start with, I'd suggest
    going into volk/apps/volk_profile.cc and commenting out line 38,
    rebuild the application, and run this new volk_profile to see if
    it fails on any other kernels.

    Thanks,
    Tom



    _______________________________________________
    Discuss-gnuradio mailing list
    [email protected] <mailto:[email protected]>
    https://lists.gnu.org/mailman/listinfo/discuss-gnuradio



#ifndef INCLUDED_volk_32fc_x2_multiply_32fc_a_H
#define INCLUDED_volk_32fc_x2_multiply_32fc_a_H

#include <inttypes.h>
#include <stdio.h>
#include <volk/volk_complex.h>
#include <float.h>

#ifdef LV_HAVE_SSE3
#include <pmmintrin.h>
  /*!
    \brief Multiplies the two input complex vectors and stores their results in the third vector
    \param cVector The vector where the results will be stored
    \param aVector One of the vectors to be multiplied
    \param bVector One of the vectors to be multiplied
    \param num_points The number of complex values in aVector and bVector to be multiplied together and stored into cVector
  */
static inline void volk_32fc_x2_multiply_32fc_a_sse3(lv_32fc_t* cVector, const lv_32fc_t* aVector, const lv_32fc_t* bVector, unsigned int num_points){
    unsigned int number = 0;
    const unsigned int halfPoints = num_points / 2;

    __m128 x, y, yl, yh, z, tmp1, tmp2;
    lv_32fc_t* c = cVector;
    const lv_32fc_t* a = aVector;
    const lv_32fc_t* b = bVector;
    for(;number < halfPoints; number++){
      
      x = _mm_load_ps((float*)a); // Load the ar + ai, br + bi as ar,ai,br,bi
      y = _mm_load_ps((float*)b); // Load the cr + ci, dr + di as cr,ci,dr,di
      
      yl = _mm_moveldup_ps(y); // Load yl with cr,cr,dr,dr
      yh = _mm_movehdup_ps(y); // Load yh with ci,ci,di,di
      
      tmp1 = _mm_mul_ps(x,yl); // tmp1 = ar*cr,ai*cr,br*dr,bi*dr
      
      x = _mm_shuffle_ps(x,x,0xB1); // Re-arrange x to be ai,ar,bi,br
      
      tmp2 = _mm_mul_ps(x,yh); // tmp2 = ai*ci,ar*ci,bi*di,br*di
      
      z = _mm_addsub_ps(tmp1,tmp2); // ar*cr-ai*ci, ai*cr+ar*ci, br*dr-bi*di, bi*dr+br*di
    
      _mm_store_ps((float*)c,z); // Store the results back into the C container

      a += 2;
      b += 2;
      c += 2;
    }

    if((num_points % 2) != 0) {
      *c = (*a) * (*b);
    }
}
#endif /* LV_HAVE_SSE3 */

#ifdef LV_HAVE_GENERIC
  /*!
    \brief Multiplies the two input complex vectors and stores their results in the third vector
    \param cVector The vector where the results will be stored
    \param aVector One of the vectors to be multiplied
    \param bVector One of the vectors to be multiplied
    \param num_points The number of complex values in aVector and bVector to be multiplied together and stored into cVector
  */
static inline void volk_32fc_x2_multiply_32fc_a_generic(lv_32fc_t* cVector, const lv_32fc_t* aVector, const lv_32fc_t* bVector, unsigned int num_points){
    lv_32fc_t* cPtr = cVector;
    const lv_32fc_t* aPtr = aVector;
    const lv_32fc_t* bPtr = bVector;
    unsigned int number = 0;

    for(number = 0; number < num_points; number++){
      *cPtr++ = (*aPtr++) * (*bPtr++);
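      /* Debug output. Note: sizeof on a pointer gives the pointer width, not the buffer length. */
      printf("%zu %zu %zu %u\n", sizeof(aPtr), sizeof(bPtr), sizeof(cPtr), number);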
    }
}
#endif /* LV_HAVE_GENERIC */

#ifdef LV_HAVE_ORC
  /*!
    \brief Multiplies the two input complex vectors and stores their results in the third vector
    \param cVector The vector where the results will be stored
    \param aVector One of the vectors to be multiplied
    \param bVector One of the vectors to be multiplied
    \param num_points The number of complex values in aVector and bVector to be multiplied together and stored into cVector
  */
extern void volk_32fc_x2_multiply_32fc_a_orc_impl(lv_32fc_t* cVector, const lv_32fc_t* aVector, const lv_32fc_t* bVector, unsigned int num_points);
static inline void volk_32fc_x2_multiply_32fc_a_orc(lv_32fc_t* cVector, const lv_32fc_t* aVector, const lv_32fc_t* bVector, unsigned int num_points){
    volk_32fc_x2_multiply_32fc_a_orc_impl(cVector, aVector, bVector, num_points);
}
#endif /* LV_HAVE_ORC */





#endif /* INCLUDED_volk_32fc_x2_multiply_32fc_a_H */