On Mon, Jun 6, 2022 at 10:23 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> Hi Uros,
> > > The major theme of this patch is to generalize many of i386.md's
> > > *di3_doubleword patterns to become *<dwi>_doubleword patterns, i.e.
> > > whenever there exists a "double word" optimization for DImode with
> > > -m32, there should be an equivalent TImode optimization on TARGET_64BIT.
> >
> > No, please do not mix two different themes in one patch.
> >
> > OTOH, the only TImode optimization that can be used with SSE registers is
> > with logic instructions and some constant shifts, but there is no TImode
> > arithmetic. I assume your end goal is to introduce STV for TImode on 64-bit
> > targets, because DImode patterns for x86_32 were introduced to avoid early
> > decomposition by middle end and to split instructions that STV didn't
> > convert to vector instructions after STV pass. So, let's start with basic
> > V1TImode support before optimizations are introduced.
>
> I'm not sure I understand.  What basic V1TImode support do you/we want next?
>
> This testcase and worked example with this patch shows its benefits without
> STV nor using V1TI mode vectors.  As explained in the subject, and;cmp can be
> turned into the cheaper not;cmp $0, for TImode (and DImode with -m32) in the
> same way as we currently do for SImode everywhere.  Having double word modes
> visible to combine, allows it to work its magic.  A recent patch ensured that
> double word compares were visible to combine, this optimization just required
> that double word logic (AND, IOR and XOR) are visible after combine, and in
> fact for -m32 DImode they already are, it's just that TImode is inconsistent,
> leading to missed optimizations.
> Likewise, STV can't choose between implementations before there are
> alternative implementations to choose from.
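
To make the transformation under discussion concrete, here is a minimal C
sketch. It assumes the comparison has the shape (x & y) == y, which the
message above does not spell out. The function names are hypothetical, and
uint64_t stands in for DImode, which is the double-word mode with -m32. The
first function is the and;cmp form, the second is the not;test-against-zero
form, and the self-check confirms that they agree on a few sample values.

```c
#include <assert.h>
#include <stdint.h>

/* "and; cmp" form: AND the operands, then compare the result with y. */
static int subset_and_cmp(uint64_t x, uint64_t y)
{
    return (x & y) == y;
}

/* "not; test" form: complement x, AND with y, then test against zero. */
static int subset_not_test(uint64_t x, uint64_t y)
{
    return (~x & y) == 0;
}

/* Self-check: both forms agree on every pair of sample inputs. */
static void check_equivalence(void)
{
    static const uint64_t samples[] = {
        0, 1, 0xffffffffu, 0x100000000ull, 0xf0f0f0f0f0f0f0f0ull, UINT64_MAX
    };
    const int n = (int)(sizeof samples / sizeof samples[0]);

    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            assert(subset_and_cmp(samples[i], samples[j])
                   == subset_not_test(samples[i], samples[j]));
}
```

Both forms answer the same question, namely whether every bit set in y is
also set in x. Whether the second form is actually cheaper depends on the
target and on which instructions are available.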

Let me clarify my statement:

When double-mode patterns are NOT present, the middle-end splits
double-mode operations to word-mode at expansion time, taking into
account constant propagation on the split operations, etc. The reason
DImode patterns are present is STV on a 32-bit target, which wants
double-word operations to remain unsplit until the STV pass.
Unfortunately, this approach inhibits constant propagation, and the
missing functionality was implemented in a "manual way" at the point
where operations are split to word-mode. Without targeting STV, the
optimization opportunities are quite small (one of them is the
conversion you proposed above), so there is no pressing need to
introduce TImode operations.
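
As a rough illustration of the splitting described above (a sketch only,
not GCC's actual expander code): a double-word AND decomposes into two
independent word-mode ANDs, and when one operand is a constant each half
can be simplified on its own. The dword type and the function names below
are hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for a double-word (TImode-like) value. */
typedef struct { uint64_t lo, hi; } dword;

/* Double-word AND split into two independent word-mode ANDs. */
static dword dword_and(dword a, dword b)
{
    dword r = { a.lo & b.lo, a.hi & b.hi };
    return r;
}

/* AND with a constant whose high word is zero.  Once the operation is
   split, the high half folds to the constant 0, so only the low word
   still needs an AND. */
static dword dword_and_low_mask(dword a, uint64_t mask)
{
    dword r = { a.lo & mask, 0 };
    return r;
}
```

When the operation stays unsplit until a later pass, this per-half
simplification has to be redone by hand at the point of the split, which is
the "manual way" mentioned above.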

So, by extending all DImode logic patterns to also handle TImode on
x86_64, we can also use them to implement a TImode STV pass on x86_64.
This is something that would have a noticeable impact on the generated
code.

Uros.

> As always I'm happy to do things in the order you want (modulo my 36 hour spin
> cycle), in fact the reason this is being done now is that you recommended it
> best to fix pr65105-5.c after the "double word comparison", which I fully
> agree with, as it leads to a better solution that doesn’t require peephole2
> (in your own words, "why isn't this being done in combine?").
>
> I'm also certainly misunderstanding.  Which piece needs to be done next?
>
> Perhaps I should have used the term "the common theme" rather than
> "the major theme", which may have made it sound like there were unrelated
> or independent bits in this patch.  But there are no V1TI changes in it.
>
> Thanks in advance, for any clarification.
>
> Cheers,
> Roger
> --
>
>
