On Fri, Jul 24, 2015 at 11:44 AM, Johannes Demel <[email protected]> wrote:
> Hey community,
>
> after last week's success with channel construction, this week is
> calmer. It involves a steep learning curve for SIMD.
> So I was able to create my first VOLK kernels [3]. There are two new
> kernels for 8bit packing and unpacking. In case someone wants to pack
> 8 bytes with the LSB active into one byte, there's a new VOLK kernel
> to do this for you. At first, I thought, this is as simple as doing a
> load+movemask operation. Unfortunately, endianness stopped me from
> doing so. Thus it involves shuffling and AND, COMPARE operations too.
> Without shuffling it should have worked with SSE2, but since shuffle is
> involved SSSE3 is required.
> I'm reading through all the docs and websites which target SIMD and
> find new ways to do things all the time. So, I guess it is a long way
> to go until I have some decent knowledge about SIMD instructions.
> Though, I could achieve a 7x speedup for packing bits compared to the
> generic implementation.
> Also, I created a kernel for unpacking. I wasn't very successful here.
> SSSE3 implementation is slower than the generic one for now. Maybe
> someone can give me a hint on what is going wrong here.
> I named those two new kernels 'volk_8u_pack8_8u' and
> 'volk_8u_unpack8_8u'. I hope this explains their operation.
> Suggestions on alternative names are welcome here.
> I tried to integrate my VOLK kernels into VOLK's test framework, but
> that is quite tough. It seems like it doesn't expect any rate changing
> kernels.
>
> My aim for next week is to come up with a kernel for polar code
> encoding. This will include interleaving a lot of bits which is the
> actual issue to overcome.
>
> More info and current project progress can be found in [1], [2] and [3].
>
> Cheers
> Johannes
>
> [1] https://github.com/jdemel/gnuradio
> [2] https://github.com/jdemel/socis-proposal
> [3] https://github.com/jdemel/volk

Hi Johannes,

This is pretty neat. Nice work! You'll probably need to use a puppet.
The VOLK QA creates input and output buffers of size itemsize * num_points
for every input and output. I think this is fine for the packer, but as
you've discovered, it will not work for the unpacker. A puppet lets you wrap
your actual kernel in a way that works nicely with the VOLK QA. In this case
I suspect you want something like the following:

    void volk_8u_unpack8puppet_8u_generic(unsigned char* out, unsigned char* in,
                                          unsigned int num_points)
    {
        volk_8u_unpack8_8u_generic(out, in, num_points / 8);
    }

You'll get 8x as much buffer space and a bunch of inputs that you'll never
operate on, which is OK.

This obviously isn't critical for your GSoC project, but we'll want to do
this at some point since this looks really useful.

Nathan
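
To make the buffer arithmetic concrete, here is a minimal plain-C sketch of a rate-changing pack/unpack pair and a puppet around the unpacker. This is not the code from the repository in [3]: the function names are hypothetical stand-ins, the MSB-first bit order is an assumption (the thread only says that endianness got in the way), and the convention that num_points counts packed bytes in both kernels is assumed as well.

```c
/* Hypothetical packer: every 8 input bytes (only the LSB of each is used)
 * become one output byte. num_points counts OUTPUT (packed) bytes.
 * Assumption: the first input byte of each group lands in the MSB. */
void pack8_sketch(unsigned char* out, const unsigned char* in, unsigned int num_points)
{
    for (unsigned int i = 0; i < num_points; ++i) {
        unsigned char byte = 0;
        for (unsigned int b = 0; b < 8; ++b) {
            byte = (unsigned char)((byte << 1) | (in[8 * i + b] & 1u));
        }
        out[i] = byte;
    }
}

/* Hypothetical unpacker: every input byte becomes 8 output bytes holding
 * 0 or 1. num_points counts INPUT (packed) bytes, so 8 * num_points bytes
 * are written to out. */
void unpack8_sketch(unsigned char* out, const unsigned char* in, unsigned int num_points)
{
    for (unsigned int i = 0; i < num_points; ++i) {
        for (unsigned int b = 0; b < 8; ++b) {
            out[8 * i + b] = (unsigned char)((in[i] >> (7 - b)) & 1u);
        }
    }
}

/* Puppet: a test harness that allocates num_points items for every buffer
 * can call this safely, because only num_points / 8 packed bytes are
 * expanded and the output never exceeds num_points bytes. The remaining
 * input bytes are simply ignored. */
void unpack8puppet_sketch(unsigned char* out, const unsigned char* in, unsigned int num_points)
{
    unpack8_sketch(out, in, num_points / 8);
}
```

Calling unpack8_sketch directly with a harness-sized num_points would write 8 * num_points bytes into a buffer of num_points bytes, which is the overrun the puppet avoids by dividing the count.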
_______________________________________________
Discuss-gnuradio mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
