Hi Segher,
  Thanks for your comments.

On 13/7/2022 1:26 AM, Segher Boessenkool wrote:
>> --- a/gcc/config/rs6000/rs6000.md
>> +++ b/gcc/config/rs6000/rs6000.md
>> @@ -7078,27 +7078,38 @@ (define_expand "subti3"
>>  })
>>
>>  ;; 128-bit logical operations expanders
>> +;; Fail TImode in all 128-bit logical operations expanders and split it into
>> +;; two DI registers.
>>
>>  (define_expand "and<mode>3"
>>    [(set (match_operand:BOOL_128 0 "vlogical_operand")
>>         (and:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
>>                       (match_operand:BOOL_128 2 "vlogical_operand")))]
>>    ""
>> -  "")
>> +{
>> +  if (<MODE>mode == TImode)
>> +    FAIL;
>> +})
> It is better to not FAIL it, but simply not have a pattern for the
> TImode version at all.
>
> Does nothing depend on the :TI version to exist?
>
> What about the :PTI version?  Getting rid of that as well will allow
> some nice optimisations.
>
> Of course we *do* have instructions to do such TImode ops, on newer
> CPUs, but in vector registers only.  It isn't obvious what is faster.
>
During expand, a TImode operation is split into two registers when it
can't match any expander. So I FAIL TImode in each expander and expect
it to be split during expand. TImode is still present in some
insn_and_split patterns (e.g. "*and<mode>3_internal"), so if later RTL
passes generate TImode logical operations, they can still be matched.

Originally, TImode was split after the reload pass by
rs6000_split_logical. That is too late for some RTL optimizations to
catch.

PTImode can't be split into two registers during expand. PTI requires
an even/odd register pair, so I think splitting it after reload makes
sure it gets the correct registers.

From my understanding, it's sub-optimal to use vector logical
instructions for TImode if the destination is an integer operand. It
needs three instructions (move to vector register, vector logical
operation, and move from vector register). When TImode is split, it
only needs two logical instructions on two separate registers.

Thanks again
Gui Haochen
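
P.S. To illustrate the instruction-count argument above, here is a
minimal C sketch of what splitting a TImode AND amounts to. It is not
GCC code: the ti_pair type and the and_ti_split function are
hypothetical names used only for illustration, and the struct simply
stands in for a 128-bit value held in two 64-bit integer registers.

```c
#include <stdint.h>

/* Hypothetical stand-in for a TImode value kept in two 64-bit
   integer registers (two DImode halves).  */
typedef struct
{
  uint64_t hi;
  uint64_t lo;
} ti_pair;

/* A 128-bit AND done as two independent 64-bit ANDs, one per half.
   Neither half depends on the other, so this is two logical
   operations on two separate registers, with no moves to or from
   vector registers.  */
static ti_pair
and_ti_split (ti_pair a, ti_pair b)
{
  ti_pair r;
  r.hi = a.hi & b.hi;
  r.lo = a.lo & b.lo;
  return r;
}
```

The same shape applies to the other 128-bit logical operations (ior,
xor and so on), since each of them works bit by bit and the two halves
never interact.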