2016-04-07 0:49 GMT+03:00 David Guillen Fandos <da...@davidgf.net>:
>
> Thanks a lot Ilya!
>
> I managed to get it working. There were some bugs regarding register
> allocation that ended up promoting the class to be BLKmode instead of
> V4SFmode. I had to debug it a bit, which is tricky, but in the end I
> found my way through it.
>
> Just to finish this. Do you think from your experience that is difficult
> to implement vector instructions that have variable sizes?

Once you have implemented an instruction in one mode, you shouldn't have
much trouble extending it to other modes using mode iterators. There are
a lot of examples in GCC.

> This
> particular VFU has 4, 3, 2 and 1 element operations with arbitrary
> swizzling. This is, we can load a V3SF and perform a dot product
> operation with another V3SF to get a V1SF for instance. Of course the
> elements might overlap, so if a vreg is A B C D we can have a 4 element
> vector ABCD or a pair of 3 element vregs ABC and BCD, the same logic
> applies to have 3 registers of V2SF type and so forth. It is very
> flexible. It also allows column and row arranging, so we can load 4
> vectors in a 4x4 matrix and multiply them with another matrix
> transposing them on the fly.

Unfortunately, GCC doesn't expect a vector to have a non-power-of-two
number of elements. Thus you can't write

  float var __attribute__ ((vector_size (12)));

and expect it to get V3SF mode.

The target instruction set doesn't affect the way vector code is
represented in GIMPLE. This means that complex instructions like matrix
multiplication have no expression with corresponding semantics and can't
simply be generated from a single GIMPLE statement.

You can still take advantage of your ISA when expanding vector code.
E.g. vec_extract_[lo|hi] may be expanded into a simple SUBREG in your
case. Advanced vector instructions may also be generated by RTL
optimizers. E.g. combine may merge a few vector instructions into a
single one.

>
> I guess this is too difficult to expose to gcc, which is more used to
> intel SIMD stuff. In the past I wrote most of the kernels in assembly
> and wrap them around C functions, but if you use classes and inline
> functions having gcc on your side helps a lot (register allocation and
> therefore less load/stores to memory).

There are instructions which are never generated by the compiler and
exist mostly to be used manually. The AES instruction set is a good
example of such instructions.
Intrinsics (builtin functions) are a better alternative to assembler code
for writing vector code manually with such instructions. With intrinsics,
register allocation and RTL optimizations still work.

Ilya

>
> Thanks a lot for your help!
>
> David
>
>