On Thu, Jun 5, 2008 at 4:31 PM, H.J. Lu <[EMAIL PROTECTED]> wrote:
> Hi,
>
> x86-64 psABI defines
>
> typedef struct
> {
>   unsigned int gp_offset;
>   unsigned int fp_offset;
>   void *overflow_arg_area;
>   void *reg_save_area;
> } va_list[1];
>
> for variable argument list. "va_list" is used to access variable argument
> list:
>
> void
> bar (const char *format, va_list ap)
> {
>   if (va_arg (ap, int) != 0)
>     abort ();
> }
>
> void
> foo (char *fmt, ...)
> {
>   va_list ap;
>   va_start (ap, fmt);
>   bar (fmt, ap);
>   va_end (ap);
> }
>
> foo and bar may be compiled with different compilers. We have to keep
> the current layout for va_list so that we can mix va_list codes compiled
> with AVX and non-AVX compilers. We need to extend the variable argument
> handling in the x86-64 psABI to support passing __m256/__m256d/__m256i
> on the variable argument list. We propose 2 ways to extend the register
> save area to add 256bit AVX registers support:
>
> 1. Extend the register save area to put upper 128bit at the end.
>    Pros:
>      Aligned access.
>      Save stack space if 256bit registers are used.
>    Cons:
>      Split access. Require more split access beyond 256bit.
>
> 2. Extend the register save area to put full 256bit YMMs at the end.
>    The first DWORD after the register save area has the offset of
>    the extended array for YMM registers. The next DWORD has the
>    element size of the extended array. Unaligned access will be used.
>    Pros:
>      No split access.
>      Easily extendable beyond 256bit.
>      Limited unaligned access penalty if stack is aligned at 32byte.
>    Cons:
>      May require storing both the lower 128bit and full 256bit register
>      content. We may avoid saving the lower 128bit if correct type
>      is required when accessing variable argument list, similar to int
>      vs. double.
>      Waste 272 byte on stack when 256bit registers are used.
>      Unaligned load and store.
>
> We should agree on one approach to ensure compatibility between
> different compilers.
>
> Personally, I prefer #2 for its simplicity. Does anyone else have a
> preference?
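
For reference, the addressing that proposal #2 implies could look roughly like the sketch below. It is only an illustration under assumptions that the proposal does not spell out: the existing save area is taken to be the 176 bytes GCC uses (6 GP registers * 8 bytes + 8 XMM registers * 16 bytes), and the offset DWORD is taken to be relative to reg_save_area. The names ymm_ext_header, ymm_slot and ymm_demo are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed size of the existing save area: 6 GP regs * 8 + 8 XMM regs * 16. */
enum { GP_BYTES = 6 * 8, XMM_BYTES = 8 * 16,
       LEGACY_BYTES = GP_BYTES + XMM_BYTES };   /* 176 */

/* Hypothetical header for proposal #2: two DWORDs right after the
   existing save area.  */
typedef struct {
  uint32_t ymm_offset;     /* assumed: byte offset from reg_save_area */
  uint32_t ymm_elem_size;  /* 32 for YMM; could grow for wider registers */
} ymm_ext_header;

/* Return the address of the idx-th saved vector register.  */
static const unsigned char *
ymm_slot (const unsigned char *reg_save_area, unsigned idx)
{
  ymm_ext_header h;
  /* memcpy avoids any alignment assumption about the header.  */
  memcpy (&h, reg_save_area + LEGACY_BYTES, sizeof h);
  return reg_save_area + h.ymm_offset + (size_t) idx * h.ymm_elem_size;
}

/* Build a fake save area and check that slot 3 is found where expected.  */
static int
ymm_demo (void)
{
  unsigned char area[LEGACY_BYTES + 8 + 8 * 32] = { 0 };
  ymm_ext_header h = { LEGACY_BYTES + 8, 32 };
  memcpy (area + LEGACY_BYTES, &h, sizeof h);
  area[h.ymm_offset + 3 * 32] = 0xAB;   /* first byte of the "ymm3" slot */
  return ymm_slot (area, 3)[0] == 0xAB;
}
```

Because the element size is read from the header rather than hard-coded, the same lookup would keep working if the element size grew beyond 32 bytes, which is the "easily extendable" point in the list of pros.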
If you want to mix AVX and non-AVX code, then you need a way to detect at
runtime whether AVX information was saved. What is that mechanism in each
of the two cases? If you don't want to mix AVX and non-AVX code, then
can't you basically declare the ABIs incompatible anyway?

There is also a third option: passing AVX values by reference.

For simplicity I would also prefer 2) - after all, we don't need to fill
in the XMM area / the AVX area if the value is unused.

Richard.
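
To make the by-reference option concrete, here is a source-level analogy, not an ABI definition: the caller passes pointers to 256-bit values, so only pointers travel through the existing va_list machinery and the register save area layout stays as it is. The type vec256 is a plain 32-byte stand-in for __m256, and sum_first_bytes and by_ref_demo are names invented for this example.

```c
#include <assert.h>
#include <stdarg.h>

/* Plain 32-byte stand-in for __m256, so no AVX support is needed.  */
typedef struct { unsigned char bytes[32]; } vec256;

/* Read COUNT pointer arguments and add up the first byte of each value.  */
static unsigned
sum_first_bytes (int count, ...)
{
  va_list ap;
  unsigned total = 0;

  va_start (ap, count);
  for (int i = 0; i < count; i++)
    {
      /* Only a pointer is fetched from the variable argument list.  */
      const vec256 *v = va_arg (ap, const vec256 *);
      total += v->bytes[0];
    }
  va_end (ap);
  return total;
}

/* Pass two 256-bit values by reference.  */
static unsigned
by_ref_demo (void)
{
  const vec256 a = { { 1 } };
  const vec256 b = { { 2 } };
  return sum_first_bytes (2, &a, &b);
}
```

The cost is an extra memory indirection per argument, and the caller has to keep the values alive in memory for the duration of the call.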