Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hi Kewen,
>
> Thanks for your review on this patch!
>
> "Kewen.Lin" <li...@linux.ibm.com> writes:
>
>> Hi Jeff,
>>
>> Sorry for the late review.
>>
>> on 2022/9/15 16:30, Jiufu Guo wrote:
>>> Hi,
>>>
>>> For a complicated 64-bit constant, below is one instruction sequence to
>>> build it:
>>>   lis 9,0x800a
>>>   ori 9,9,0xabcd
>>>   sldi 9,9,32
>>>   oris 9,9,0xc167
>>>   ori 9,9,0xfa16
>>>
>>> while we can also use the sequence below to build it:
>>>   lis 9,0xc167
>>>   lis 10,0x800a
>>>   ori 9,9,0xfa16
>>>   ori 10,10,0xabcd
>>>   rldimi 9,10,32,0
>>> This sequence uses 2 registers to build the high and low parts first,
>>> and then merges them.
>>> In terms of parallelism, this sequence would be faster (of course, it uses
>>> 1 more register, with potential register pressure).
>>>
>>> Bootstrap and regtest pass on ppc64le.
>>> Is this ok for trunk?
>>>
>>>
>>> BR,
>>> Jeff(Jiufu)
>>>
>>>
>>> gcc/ChangeLog:
>>>
>>> 	* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Update 64bit
>>> 	constant build.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 	* gcc.target/powerpc/parall_5insn_const.c: New test.
>>>
>>> ---

cut...

> @@ -0,1 +1,27 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8 -save-temps" } */

Maybe I could use power7. Any comments?

> +/* { dg-require-effective-target has_arch_ppc64 } */
> +
> +/* { dg-final { scan-assembler-times {\mlis\M} 4 } } */
> +/* { dg-final { scan-assembler-times {\mori\M} 4 } } */
> +/* { dg-final { scan-assembler-times {\mrldimi\M} 2 } } */
> +
> +void __attribute__ ((noinline)) foo (unsigned long long *a)
> +{
> +  /* 2 lis + 2 ori + 1 rldimi for each constant.  */
> +  *a++ = 0x800aabcdc167fa16ULL;
> +  *a++ = 0x7543a876867f616ULL;
> +}
> +
> +long long A[] = {0x800aabcdc167fa16ULL, 0x7543a876867f616ULL};
> +int
> +main ()
> +{
> +  long long res[2];
> +
> +  foo (res);
> +  if (__builtin_memcmp (res, A, sizeof (res)) != 0)
> +    __builtin_abort ();
> +
> +  return 0;
> +}
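
For illustration only (not part of the patch): the two sequences quoted above can be modelled in plain C to check that they build the same value. This is a minimal sketch; the helper names (lis, ori, oris, sldi, rldimi_32_0, build_serial, build_parallel, check_const) are hypothetical stand-ins that only emulate the instruction semantics, and none of them is code from rs6000_emit_set_long_const.

```c
#include <assert.h>
#include <stdint.h>

/* lis rD,imm: rD = imm << 16, sign-extended from 32 to 64 bits.  */
uint64_t lis (uint16_t imm)
{
  uint64_t v = (uint64_t) imm << 16;
  if (imm & 0x8000)
    v |= 0xffffffff00000000ULL;
  return v;
}

/* ori rA,rS,imm: OR the immediate into bits 0..15 (LSB numbering).  */
uint64_t ori (uint64_t rs, uint16_t imm)
{
  return rs | imm;
}

/* oris rA,rS,imm: OR the immediate into bits 16..31 (LSB numbering).  */
uint64_t oris (uint64_t rs, uint16_t imm)
{
  return rs | ((uint64_t) imm << 16);
}

/* sldi rA,rS,n: shift left by n bits (0 < n < 64 assumed here).  */
uint64_t sldi (uint64_t rs, unsigned n)
{
  return rs << n;
}

/* rldimi rA,rS,32,0: rotate rS left by 32 and insert the result into
   the high 32 bits of rA, keeping the low 32 bits of rA.  Only this
   one SH/MB combination is modelled.  */
uint64_t rldimi_32_0 (uint64_t ra, uint64_t rs)
{
  uint64_t rot = (rs << 32) | (rs >> 32);
  uint64_t mask = 0xffffffff00000000ULL;
  return (rot & mask) | (ra & ~mask);
}

/* First sequence: one register, five dependent instructions.  */
uint64_t build_serial (uint64_t c)
{
  uint64_t r9 = lis ((uint16_t) (c >> 48));
  r9 = ori (r9, (uint16_t) (c >> 32));
  r9 = sldi (r9, 32);
  r9 = oris (r9, (uint16_t) (c >> 16));
  r9 = ori (r9, (uint16_t) c);
  return r9;
}

/* Second sequence: the low and high halves are built independently
   in two registers, then merged with rldimi.  */
uint64_t build_parallel (uint64_t c)
{
  uint64_t r9 = lis ((uint16_t) (c >> 16));
  uint64_t r10 = lis ((uint16_t) (c >> 48));
  r9 = ori (r9, (uint16_t) c);
  r10 = ori (r10, (uint16_t) (c >> 32));
  return rldimi_32_0 (r9, r10);
}

/* Abort if either modelled sequence fails to reproduce C.  */
void check_const (uint64_t c)
{
  assert (build_serial (c) == c);
  assert (build_parallel (c) == c);
}
```

Calling check_const on the two constants from the test case (0x800aabcdc167fa16ULL and 0x7543a876867f616ULL) exercises both paths. In the second sequence the lis/ori pair for register 9 does not depend on the pair for register 10, which is where the parallelism comes from. The sign extension done by lis is harmless in both models: sldi shifts it out in the first sequence, and rldimi masks it off in the second.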