On Tue, 11 Jul 2023, Richard Biener wrote:

> > > If a function contains calls then GCC can't know which
> > > parts of the XMM regset is clobbered by that, it may be parts
> > > which don't even exist yet (say until avx2048 comes out), so we must
> > > restrict ourself to only save/restore the SSE2 parts and then of course
> > > can only claim to not clobber those parts.
> >
> > Hm, I guess this is kinda the reason a "weak" form is needed. But this
> > highlights the difference between the two: the "weak" form will actively
> > preserve some state (so it cannot preserve future extensions), while
> > the "strong" form may just passively not touch any state, preserving
> > any state it doesn't know about.
> >
> > > To that end I introduce actually two related attributes (for naming
> > > see below):
> > > * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered
> >
> > This is the weak/active form; I'd suggest "preserve_high_sse".
> 
> Isn't it the opposite?  "preserves_low_sse", unless you suggest
> the name applies to the caller which has to preserve high parts
> when calling nosseclobber.

This is the form where the function annotated with this attribute
consumes 128 bytes on the stack to "blindly" save/restore xmm8-15
if it calls anything with a vanilla ABI.
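
To make the 128-byte figure concrete, here is a rough sketch of the arithmetic
in C. The struct and helper names are hypothetical and only model the save
area; this is not what GCC would actually emit in the prologue/epilogue. It
also shows why this form cannot cover the wider registers: only the low
16 bytes of each register are saved.

```c
#include <assert.h>
#include <stddef.h>

enum { XMM_BYTES = 16, FIRST_SAVED = 8, LAST_SAVED = 15 };

/* Hypothetical model of the stack area used to save xmm8-15 around a
   call to a function with the vanilla ABI. */
struct xmm_save_area {
    unsigned char xmm[LAST_SAVED - FIRST_SAVED + 1][XMM_BYTES];
};

/* Eight registers of 16 bytes each: 128 bytes of stack. */
static_assert(sizeof(struct xmm_save_area) == 128,
              "saving xmm8-15 takes 128 bytes");

static size_t xmm_save_area_size(void)
{
    return sizeof(struct xmm_save_area);
}

/* Bytes of a wider overlapping register (ymm: 32, zmm: 64) that a
   16-byte xmm save/restore leaves unprotected. */
static size_t unsaved_bytes_per_reg(size_t full_width)
{
    return full_width > XMM_BYTES ? full_width - XMM_BYTES : 0;
}
```

So for ymm8-15 the upper 16 bytes of each register are not covered, and for
zmm8-15 the upper 48 bytes.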

(Actually, thinking about it more, I'd like to suggest shelving this part
and implementing only the zero-cost variant, noanysseclobber.)

> > > * noanysseclobber: claims (and ensures) that nothing of any of the
> > >   registers overlapping xmm8-15 is clobbered (not even future, as of
> > >   yet unknown, parts)
> >
> > This is the strong/passive form; I'd suggest "only_low_sse".
> 
> Likewise.

Sorry if I managed to sow confusion here. In my mind, this is the form where
only xmm0-xmm7 can be written in the function annotated with the attribute,
including its callees. I was thinking that writing to zmm16-31 would be
disallowed too. The initial example was memcpy, where eight vector registers
are sufficient for the job.
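
As an illustration of the memcpy case, here is a minimal sketch of a copy
loop that keeps at most eight 128-bit values live at once, using SSE2
intrinsics on x86-64. The function name is hypothetical. Note that the C
source does not dictate register allocation: the compiler decides which xmm
registers hold v0..v7, and restricting it to xmm0-7 is exactly what the
proposed attribute would have to ensure.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper, for illustration only: copies n bytes between
   non-overlapping buffers, 128 bytes per iteration. */
static void *copy_eight_xmm(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    /* Main loop: eight unaligned 16-byte loads, then eight stores. */
    while (n >= 128) {
        __m128i v0 = _mm_loadu_si128((const __m128i *)(s + 0));
        __m128i v1 = _mm_loadu_si128((const __m128i *)(s + 16));
        __m128i v2 = _mm_loadu_si128((const __m128i *)(s + 32));
        __m128i v3 = _mm_loadu_si128((const __m128i *)(s + 48));
        __m128i v4 = _mm_loadu_si128((const __m128i *)(s + 64));
        __m128i v5 = _mm_loadu_si128((const __m128i *)(s + 80));
        __m128i v6 = _mm_loadu_si128((const __m128i *)(s + 96));
        __m128i v7 = _mm_loadu_si128((const __m128i *)(s + 112));

        _mm_storeu_si128((__m128i *)(d + 0), v0);
        _mm_storeu_si128((__m128i *)(d + 16), v1);
        _mm_storeu_si128((__m128i *)(d + 32), v2);
        _mm_storeu_si128((__m128i *)(d + 48), v3);
        _mm_storeu_si128((__m128i *)(d + 64), v4);
        _mm_storeu_si128((__m128i *)(d + 80), v5);
        _mm_storeu_si128((__m128i *)(d + 96), v6);
        _mm_storeu_si128((__m128i *)(d + 112), v7);

        s += 128;
        d += 128;
        n -= 128;
    }

    /* Tail: remaining bytes, copied one at a time with no vector use. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```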

> As for mask registers I understand we'd have to split the 8 register
> set into two halves to make the same approach work, otherwise
> we'd have no registers left to allocate from.

I'd suggest looking at how many mask registers OpenMP SIMD AVX-512 clones
can receive as implicit arguments, as one data point.

Alexander
