>
> 1. Extend the register save area to put upper 128bit at the end.
>   Pros:
>     Aligned access.
>     Save stack space if 256bit registers are used.
>   Cons:
>     Split access. Require more split access beyond 256bit.
>
> 2. Extend the register save area to put full 256bit YMMs at the end.
>    The first DWORD after the register save area has the offset of
>    the extended array for YMM registers. The next DWORD has the
>    element size of the extended array. Unaligned access will be used.
>   Pros:
>     No split access.
>     Easily extendable beyond 256bit.
>     Limited unaligned access penalty if stack is aligned at 32byte.
>   Cons:
>     May require storing both the lower 128bit and full 256bit register
>     content. We may avoid saving the lower 128bit if correct type
>     is required when accessing variable argument list, similar to int
>     vs. double.
>     Wastes 272 bytes on stack when 256bit registers are used.
>     Unaligned load and store.
>
> We should agree on one approach to ensure compatibility between
> different compilers.
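For reference, here is how I read the layout in option 2, as a rough C sketch. The struct, field and helper names are made up for illustration. I am also assuming a 176-byte base area (6 GPRs plus xmm0-xmm7) and that the offset is counted from the start of the register save area, neither of which the proposal spells out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for option 2; every name here is illustrative only. */
struct reg_save_area_ymm {
    uint64_t gpr[6];         /* rdi, rsi, rdx, rcx, r8, r9 */
    uint8_t  xmm[8][16];     /* xmm0-xmm7, lower 128 bits only */
    uint32_t ext_offset;     /* first DWORD: offset of the extended array */
    uint32_t ext_elem_size;  /* next DWORD: element size, 32 for YMM */
    uint8_t  ext[8][32];     /* full 256-bit ymm0-ymm7 */
};

/*
 * Locate the save slot of vector register idx in the extended array.
 * Assumption: ext_offset is relative to the start of the save area.
 * Reading the element size from memory is what lets the same code
 * work unchanged if registers grow beyond 256 bits.
 */
static inline void *ext_slot(void *area, unsigned idx)
{
    struct reg_save_area_ymm *a = area;
    return (uint8_t *)area + a->ext_offset + (size_t)idx * a->ext_elem_size;
}
```

With this shape va_arg never needs to know the register width at compile time, which I take to be the "easily extendable" point.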

This is something that definitely should be handled by an ABI update. We
probably also need to update the way the caller tells the varargs
prologue what to save. Otherwise, if a YMM-aware printf ran on non-AVX
hardware, the prologue would hit invalid instructions.

At the moment, eax is required to specify the number of XMM registers. We
can probably extend it to carry the number of XMM registers in AL and
the number of YMM registers in AH.

I personally don't have much of a preference between 1. and 2. Option 1.
seems relatively easy to implement too, or is packing two 128bit values
into a single 256bit value difficult in va_arg expansion?

Honza
>
> Personally, I prefer #2 for its simplicity. Does anyone else have a
> preference?
>
> Thanks.
>
> --
> H.J.
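
To make the AL/AH idea concrete, here is a rough sketch in C of how the two counts could share the low 16 bits of eax. The helper names are hypothetical and nothing like this is in the ABI today. I am also assuming the two counts are disjoint, so together they never exceed the 8 vector registers used for argument passing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: AL = number of XMM args, AH = number of YMM args. */
static inline uint16_t pack_vec_counts(unsigned xmm, unsigned ymm)
{
    /* Assumption: the counts are disjoint and share the 8 argument registers. */
    assert(xmm + ymm <= 8);
    return (uint16_t)(xmm | (ymm << 8));
}

/* Low byte (AL): how many XMM registers the caller used. */
static inline unsigned xmm_count(uint16_t ax) { return ax & 0xffu; }

/* High byte (AH): how many YMM registers the caller used. */
static inline unsigned ymm_count(uint16_t ax) { return ax >> 8; }

/* The prologue would only need its AVX save path when AH is nonzero. */
static inline int needs_avx_save(uint16_t ax) { return ymm_count(ax) != 0; }
```

A caller that passes no 256bit arguments leaves AH at zero, so the prologue can skip the AVX stores entirely and still run on non-AVX hardware.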