On Fri, Jun 6, 2008 at 4:40 PM, H.J. Lu <[EMAIL PROTECTED]> wrote:
> On Fri, Jun 6, 2008 at 7:31 AM, Richard Guenther
> <[EMAIL PROTECTED]> wrote:
>> On Fri, Jun 6, 2008 at 4:28 PM, H.J. Lu <[EMAIL PROTECTED]> wrote:
>>> On Fri, Jun 06, 2008 at 06:50:26AM -0700, H.J. Lu wrote:
>>>> On Fri, Jun 06, 2008 at 10:28:34AM +0200, Jan Hubicka wrote:
>>>> > >
>>>> > > ymm0 and xmm0 are the same register. xmm0 is the lower 128bit
>>>> > > of ymm0. I am not sure if we need separate XMM registers from
>>>> > > YMM registers.
>>>> >
>>>> >
>>>> > Yes, I know that xmm0 is lower part of ymm0. I still think we ought to
>>>> > be able to support varargs that do save ymm0 registers only when ymm
>>>> > values are passed same way as we touch SSE only when SSE values are
>>>> > passed via EAX hint.
>>>>
>>>> Which register do you propose for hint? The current psABI uses RAX
>>>> for XMM registers. We can't change it to AL and AH for YMM without
>>>> breaking backward compatibility.
>>>>
>>>> > This way we will be able to support e.g. printf that has YMM printing %
>>>> > construct but don't need YMM enabled hardware when those are not used.
>>>> >
>>>> > This is why I think extending EAX to contain information about amount of
>>>> > XMM values to save and in addition YMM values to save is sane. Then old
>>>> > non-YMM aware varargs prologues will crash when YMM values are passed,
>>>> > but all other combinations will work.
>>>>
>>>> I don't think it is necessary since -mavx will enable AVX code
>>>> generation for all SSE codes. Unless the function only uses integer,
>>>> it will crash on non-YMM aware hardware. That is, if even one
>>>> SSE register is used, which is hinted in RAX, varargs prologue will
>>>> use AVX instructions to save it. We don't need another hint for AVX
>>>> instructions.
>>>>
>>>> > >
>>>> > > >
>>>> > > > I personally don't have much preferences over 1. or 2. 1. seems
>>>> > > > relatively easy to implement too, or is packaging two 128bit values
>>>> > > > to single 256bit difficult in va_arg expansion?
>>>> > > >
>>>> > >
>>>> > > Access to 256bit register as lower and upper 128bits needs 2
>>>> > > instructions. For store
>>>> > >
>>>> > > vmovaps %xmm7, -143(%rax)
>>>> > > vextractf128 $1, %ymm7, -15(%rax)
>>>> > >
>>>> > > For load
>>>> > >
>>>> > > vmovaps -143(%rax),%xmm7
>>>> > > vinsertf128 $1, -15(%rax),%ymm7,%ymm7
>>>> > >
>>>> > > If we go beyond 256bit, we need more instructions to access
>>>> > > the full register. For 512bit, it will be split into lower 128bit,
>>>> > > middle 128bit and upper 256bit. 1024bit will have 4 parts.
>>>> > >
>>>> > > For #2, only one instruction will be needed for 256bit and
>>>> > > beyond.
>>>> >
>>>> > Yes, but we will still save half of stack space. Well, I don't have
>>>> > much preferences here. If it seems saner to simply save whole thing
>>>> > saving lower part twice, I am fine with that.
>>>>
>>>> I was told that it wasn't very easy to get decent performance with
>>>> split access. I extended my proposal to include a 16bit bitmask to
>>>> indicate which YMM registers should be saved. If the bit is 0,
>>>> we should only save the lower 128bit in the original register
>>>> save area. Otherwise, we should only save the same whole YMM register.
>>>>
>>>
>>> My second thought. How useful is such a bitmask? Do we really
>>> need it? Is it acceptable to save the lower 128bit twice?
>>
>> Why do we need to save the lower 128bit at all if a ymm reg is passed?
>> Can't we assume "type-correctness"?
>
> Say a double is passed in YMM0/XMM0, we should save it in XMM0 area.
> Do we also need to save the whole 256bit YMM0? If we save both XMM0 and
> YMM0, we are free to use any location to load the saved register content.
> Either one will be correct.

What is the benefit here? (What would the contents of the upper 128bit
be - apart from "undefined"?) I suppose you can load into xmm0 and then
"extend" to ymm0?

Richard.
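
For readers following the thread, here is a minimal sketch in C of the
bitmask variant discussed above, combined with the "load into xmm0 and
then extend" idea. It is only an illustration, not code from GCC or the
psABI: the types ymm_t and save_area_t, the functions save_regs,
load_reg and self_check, the count of 8 registers, and the separate
32-byte YMM slots are all assumptions made for this example. Registers
are modelled as plain byte arrays so that no AVX hardware is needed.
Zero-filling the upper 128 bits on a 128-bit reload mirrors what a
VEX-encoded vmovaps load into an xmm register does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { NUM_VEC_REGS = 8, XMM_BYTES = 16, YMM_BYTES = 32 };

/* Hypothetical model of one 256-bit register as raw bytes. */
typedef struct { uint8_t b[YMM_BYTES]; } ymm_t;

/* Hypothetical save area: 16-byte XMM slots (as in the existing
   register save area) plus an assumed extra region of 32-byte slots. */
typedef struct {
    uint8_t xmm_slot[NUM_VEC_REGS][XMM_BYTES];
    uint8_t ymm_slot[NUM_VEC_REGS][YMM_BYTES];
} save_area_t;

/* Bitmask scheme: bit i set   -> save the whole register in its YMM slot;
                   bit i clear -> save only the lower 128 bits in its XMM slot. */
void save_regs(save_area_t *sa, const ymm_t regs[NUM_VEC_REGS], uint16_t mask)
{
    for (int i = 0; i < NUM_VEC_REGS; i++) {
        if (mask & (1u << i))
            memcpy(sa->ymm_slot[i], regs[i].b, YMM_BYTES);
        else
            memcpy(sa->xmm_slot[i], regs[i].b, XMM_BYTES);
    }
}

/* Reload register i. A 128-bit reload is "extended" to 256 bits by
   leaving the upper half zero (assumption: zero-extension is wanted). */
ymm_t load_reg(const save_area_t *sa, int i, uint16_t mask)
{
    ymm_t r;
    assert(i >= 0 && i < NUM_VEC_REGS);
    memset(r.b, 0, YMM_BYTES);
    if (mask & (1u << i))
        memcpy(r.b, sa->ymm_slot[i], YMM_BYTES);
    else
        memcpy(r.b, sa->xmm_slot[i], XMM_BYTES);
    return r;
}

/* Returns 0 if both paths behave as described, nonzero otherwise. */
int self_check(void)
{
    ymm_t regs[NUM_VEC_REGS];
    save_area_t sa;
    uint16_t mask = 0x0002;  /* only register 1 carries a 256-bit value */

    memset(&sa, 0xAA, sizeof sa);
    for (int i = 0; i < NUM_VEC_REGS; i++)
        for (int j = 0; j < YMM_BYTES; j++)
            regs[i].b[j] = (uint8_t)(i * YMM_BYTES + j + 1);

    save_regs(&sa, regs, mask);
    ymm_t r0 = load_reg(&sa, 0, mask);
    ymm_t r1 = load_reg(&sa, 1, mask);

    if (memcmp(r0.b, regs[0].b, XMM_BYTES) != 0)
        return 1;                       /* lower half of reg 0 lost */
    for (int j = XMM_BYTES; j < YMM_BYTES; j++)
        if (r0.b[j] != 0)
            return 2;                   /* upper half not zeroed */
    if (memcmp(r1.b, regs[1].b, YMM_BYTES) != 0)
        return 3;                       /* full reg 1 not preserved */
    return 0;
}
```

Calling self_check() from a main function exercises both paths:
register 0 comes back with its lower 128 bits intact and the upper half
zeroed, while register 1 comes back whole. With such a mask each value
is stored exactly once, which is the alternative to saving the lower
128 bits twice.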