Pip Cet writes:
> On Sun, Aug 16, 2020 at 12:50 AM Segher Boessenkool > <seg...@kernel.crashing.org> wrote: >> On Sat, Aug 15, 2020 at 10:18:27AM +0000, Pip Cet wrote: >> > > > What I'm currently doing is this: >> > > > >> > > > (define_split >> > > > [(set (match_operand 0 "nonimmediate_operand") >> > > > (match_operand 1 "nox_general_operand"))] >> > > > "reload_completed && mov_clobbers_cc (insn, operands)" >> > > > [(parallel [(set (match_dup 0) (match_dup 1)) >> > > > (clobber (reg:CC REG_CC))])]) >> > > > >> > > > which, if I understand correctly, introduces the parallel-clobber >> > > > expression only when necessary, at the same time as post-reload >> > > > splitters introduce REG_CC references. Is that correct? >> > > >> > > Yes. And this will work correctly if somehow you ensure that REG_CC >> > > isn't live anywhere you introduce such a clobber. >> > >> > IIUC, the idea is that references to REG_CC, except for clobbers, are >> > only introduced in the post-reload split pass, so it cannot be live >> > before our define_split runs. Does that make sense? >> >> Yes, it does. It has some huge restrictions (using the reg in inline >> assembler can never work reliably, for example, so you'll have to >> disallow that). > > Is there any approach that doesn't suffer from that problem? My > understanding was that we need to allow reload to insert CC-clobbering > insns on this (and many other) architectures, and that there are so > many places where reload might choose to do so (including before and > after inline asm) that using the register prior to reload just isn't > possible. I'd be glad to be wrong, though :-) > > Is it true that reload chooses which constraint alternative is used > for each insn? Is that information available somehow to post-reload > splits? The problem is that the "X" constraint matches whatever's > there already, and doesn't replace it with the (scratch:CC) expression > that would work, so I can't rewrite those insns not to clobber the CC > reg. > > For example, here's what I currently have: > > (define_expand "mov<mode>" > [(parallel [(set (match_operand:MOVMODE 0 "nonimmediate_operand" "") > (match_operand:MOVMODE 1 "general_operand" "")) > (clobber (reg:CC REG_CC))])] > ...) > > (define_insn "mov<mode>_insn" > [(set (match_operand:ALL1 0 "nonimmediate_operand" "=r,r ,d ,Qm > ,r ,q,r,*r") > (match_operand:ALL1 1 "nox_general_operand" "r,Y00,n Ynn,r > Y00,Qm,r,q,i")) > (clobber (match_scratch:CC 2 "=X,c,X,c,c,X,X,c"))] > ...) > > That works, but it results in an incorrect CC clobber for, say, > register-to-register movqi. For that, I'd need something like > > (define_split > [(parallel [(set (match_operand:ALL1 0 "nonimmediate_operand") > (match_operand:ALL1 1 "nox_general_operand")) > (clobber (reg:CC REG_CC))])] > "reload_completed && REG_P (operands[0]) && REG_P (operands[1])" > [(parallel [(set (match_dup 0) > (match_dup 1)) > (clobber (scratch:CC))])]) > > and so on, for all four constraint alternatives that don't actually > clobber REG_CC (and also for a fifth which only rarely clobbers > REG_CC). That's duplicated code, of course. The (setattr "cc" ...) that is currently present for all patterns accounts for the constraint alternatives,so using get_attr_cc to split to a (clobber) of either cc_reg_rtx or a gen_rtx_SCRATCH (CCmode) appears to work. (define_insn_and_split "mov<mode>_insn" [(set (match_operand:ALL1 0 "nonimmediate_operand" "=r ,d ,Qm ,r ,q,r,*r") (match_operand:ALL1 1 "nox_general_operand" "r Y00,n Ynn,r Y00,Qm,r,q,i"))] "register_operand (operands[0], <MODE>mode) || reg_or_0_operand (operands[1], <MODE>mode)" "#" "&& reload_completed" [(parallel [(set (match_dup 0) (match_dup 1)) (clobber (match_dup 2))])] { operands[2] = get_cc_reg_clobber_rtx (curr_insn); } [(set_attr "cc" "ldi,none,clobber,clobber,none,none,clobber")]) (define_insn "*mov<mode>_insn" [(set (match_operand:ALL1 0 "nonimmediate_operand" "=r ,d ,Qm ,r ,q,r,*r") (match_operand:ALL1 1 "nox_general_operand" "r Y00,n Ynn,r Y00,Qm,r,q,i")) (clobber (match_scratch:CC 2 "=X ,X ,c ,c ,X,X,c"))] "(register_operand (operands[0], <MODE>mode) || reg_or_0_operand (operands[1], <MODE>mode)) && reload_completed" { return output_movqi (insn, operands, NULL); } [(set_attr "length" "1,1,5,5,1,1,4") (set_attr "adjust_len" "mov8")]) As you mentioned elsewhere, keeping the clobber with "X" helps keep a single pattern for both CC reg clobbering and non-clobbering insns. > > All this is at least somewhat academic: the code produced isn't > drastically better after my cc conversion, but it's not usually any > worse, and I'm still looking at assembler examples that are pessimized > a little to find out how to fix them... > >> And earlier passes (like combine) cannot optimise this, > > I'm not sure what "this" refers to: my understanding is that the idea > is to let combine see CC-free code, or code with CC clobbers only, > which it can then optimize, and only add "real" CC references after > reload, so many optimizations should just work. > >> But it is a pretty straightforward way to move from CC0 to the >> modern world! > > With the limitations of the reload pass being as they are, I don't > really see a dramatically better alternative. I suppose we could > explicitly save and restore CC flags around insns emitted when > reload_in_progress is true, to simulate non-CC-clobbering add/mov insn > patterns? That sounds like it would reduce code quality a lot, unless > great care were taken to make sure all of the save/restore CC flags > insns were optimized away in later passes. > > In any case, thanks for the answers so far! > > Pip