On Fri, Jul 24, 2015 at 11:44 AM, Johannes Demel <[email protected]>
wrote:

> Hey community,
>
> after last week's success with channel construction, this week is
> calmer. It involves a steep learning curve for SIMD.
> So I was able to create my first VOLK kernels [3]. There are two new
> kernels for 8bit packing and unpacking. In case someone wants to pack
> 8 bytes with the LSB active into one byte, there's a new VOLK kernel
> to do this for you. At first, I thought, this is as simple as doing a
> load+movemask operation. Unfortunately, endianness stopped me from
> doing so. Thus it involves shuffling and AND, COMPARE operations too.
> Without shuffling it should have worked with SSE2, but since a shuffle
> is involved, SSSE3 is required.
> I'm reading through all the docs and websites which target SIMD and
> find new ways to do things all the time. So, I guess it is a long way
> to go until I have some decent knowledge about SIMD instructions.
> Though, I could achieve a 7x speedup for packing bits compared to the
> generic implementation.
> Also, I created a kernel for unpacking. I wasn't very successful here.
> SSSE3 implementation is slower than the generic one for now. Maybe
> someone can give me a hint on what is going wrong here.
> I named those two new kernels 'volk_8u_pack8_8u' and
> 'volk_8u_unpack8_8u'. I hope this explains their operation.
> Suggestions on alternative names are welcome here.
> I tried to integrate my VOLK kernels into VOLK's test framework, but
> that is quite tough. It seems like it doesn't expect any rate changing
> kernels.
>
> My aim for next week is to come up with a kernel for polar code
> encoding. This will include interleaving a lot of bits which is the
> actual issue to overcome.
>
> More info and current project progress can be found in [1], [2] and [3].
>
> Cheers
> Johannes
>
> [1] https://github.com/jdemel/gnuradio
> [2] https://github.com/jdemel/socis-proposal
> [3] https://github.com/jdemel/volk
>
>
>
Hi Johannes,

This is pretty neat-- nice work!

You'll probably need to use a puppet. The VOLK QA creates input and output
buffers that are itemsize * num_points for every input and output. I think
this is fine for the packer, but as you've discovered will not work for the
unpacker. A puppet lets you wrap your actual kernel in a way that works
nicely with the VOLK QA. In this case I suspect you want something like the
following:

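/* Puppet: the QA passes num_points items per buffer; unpack only num_points/8 input bytes. */
void volk_8u_unpack8puppet_8u_generic(unsigned char* out, unsigned char* in, unsigned int num_points)
{
    volk_8u_unpack8_8u_generic(out, in, num_points / 8);
}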

You'll get 8x as much buffer space and a bunch of inputs that you'll never
operate on, which is OK. This obviously isn't critical for your GSoC
project, but we'll want to do this at some point since this looks really
useful.
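
To make the buffer arithmetic concrete, here is a minimal, self-contained sketch of the idea in plain C. Everything in it is an assumption for illustration: the sketch_* names are hypothetical stand-ins rather than the actual kernels in Johannes' repository, and the bit order (first input byte ends up in the MSB) is a guess on my part. The point is only that the puppet consumes num_points/8 input bytes and fills exactly num_points output bytes, so both QA buffers of itemsize * num_points are large enough.

```c
/* Hypothetical packer: each output byte collects the LSBs of 8 input bytes.
 * Assumption: the first input byte of each group lands in the MSB. */
void sketch_pack8_generic(unsigned char* out, const unsigned char* in,
                          unsigned int num_points)
{
    for (unsigned int i = 0; i < num_points; ++i) {
        unsigned char byte = 0;
        for (unsigned int bit = 0; bit < 8; ++bit) {
            byte = (unsigned char)((byte << 1) | (in[8 * i + bit] & 1u));
        }
        out[i] = byte;
    }
}

/* Hypothetical unpacker: each input byte expands to 8 output bytes (0 or 1).
 * Here num_points counts INPUT bytes, so it writes 8 * num_points outputs. */
void sketch_unpack8_generic(unsigned char* out, const unsigned char* in,
                            unsigned int num_points)
{
    for (unsigned int i = 0; i < num_points; ++i) {
        for (unsigned int bit = 0; bit < 8; ++bit) {
            out[8 * i + bit] = (unsigned char)((in[i] >> (7 - bit)) & 1u);
        }
    }
}

/* Puppet for a QA that hands over num_points items in every buffer:
 * only num_points / 8 input bytes are read, num_points outputs are written. */
void sketch_unpack8puppet_generic(unsigned char* out, const unsigned char* in,
                                  unsigned int num_points)
{
    sketch_unpack8_generic(out, in, num_points / 8);
}
```

With num_points = 16, for example, the puppet reads 2 input bytes and writes 16 output bytes; the remaining 14 input bytes are simply ignored.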

Nathan
_______________________________________________
Discuss-gnuradio mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
