Re: [llvm-commits] Patch for X86 to use subregs

Evan Cheng Mon, 30 Jul 2007 11:06:44 -0700


On Jul 30, 2007, at 12:02 AM, Christopher Lamb wrote:

On Jul 29, 2007, at 10:20 PM, Evan Cheng wrote:
On Jul 29, 2007, at 9:37 PM, Christopher Lamb wrote:
On Jul 29, 2007, at 6:20 PM, Evan Cheng wrote:
Sent from my iPhone
On Jul 28, 2007, at 4:36 PM, Christopher Lamb<[EMAIL PROTECTED]> wrote:
On Jul 28, 2007, at 2:26 PM, Evan Cheng wrote:
On Jul 28, 2007, at 11:52 AM, Christopher Lamb<[EMAIL PROTECTED]> wrote:
On Jul 28, 2007, at 1:48 AM, Evan Cheng wrote:
Very cool! I need to read it more carefully.
But I see you are lowering zext to a single insert_subreg. Is that right? It won't zero out the top part, no?
It's only lowering (zext i32 to i64) to an insert_subreg on x86-64, where all writes to 32-bit registers implicitly zero-extend into the upper 32 bits.
I know. But they mismatch semantically. An insert_subreg to the lower part should not change the upper half. I think this is only legal for anyext.
On x86-64 the semantics of a 2 operand i32 insert_subreg is that the input super-value is implicitly zero. So in this sense the insert isn't changing the upper half, it's just that the upper half is being set to zero implicitly rather than explicitly. If you'll notice, the insert_subreg is a two operand (implicit super value) not a three operand version. If the insert were the three operand version, and the super value were coming from an implicit def, I'd agree with you, but it's not.
Ok, let's step back for a second. There are a couple of issues that should be addressed. Plz help me understand. :)
1: Semantics of insert_subreg should be the same across all targets, right?
I'm not certain that this should be so. x86-64 clearly has target-specific semantics for a 32-bit into 64-bit insert.
No, that won't do. insert_subreg and extract_subreg are by definition target independent. They must have the same semantics. You are forcing the x86-64 32-bit zero-extending move to fit insert_subreg when they are really not the same thing.
If target independence is a requirement, then I agree that using insert_subreg for x86-64 zero-ext isn't currently feasible.


Right.

2: The two operand variant of insert_subreg should mean the superreg is undef. If you insert a value into a low part, the rest of the superreg is still undef.
I think the meaning of the insert_subreg instruction (both 2 and 3 operand versions) must have semantics specific to the target. For example, on x86-64 there is no valid 3 operand insert_subreg for a 32-bit value into 64 bits, because the 32-bit result is always going to be zero extended and overwrite the upper 32 bits.
It just means there is no way to implement an insert_subreg with a single instruction under x86-64. But that is perfectly ok. Apart from anyext, x86-64 just isn't going to benefit from it. It's also impossible to read or modify the higher 32 bits.
Currently the move that's generated isn't handled by coalescing because the source and destination belong to different register classes. The insert_subreg is meant to be a means to move values implicitly between register classes that have a subreg relationship. So if insert_subreg semantics must be target independent, then I think you would isel the zero-extending move as:
(i64 (INSERT_SUBREG (i64 0), GR32:$src, 3))

But that's wrong. Remember the superreg argument is a read / modify / write operand. That is, the first operand is a use and the def is the LHS, but we are forcing the allocator to target the same physical register.


v1 = some existing value
v1 = insert_subreg v1, GR32:$src, 3

But zext is zeroing out the top part. i.e. zext is equal to

mov v1, 0
v1 = insert_subreg v1, GR32:$src, 3
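
The two sequences above can be modeled with a small Python sketch. This is not LLVM code: it treats a 64-bit register as an integer, and the names insert_subreg32 and zext32 are made up for illustration. It assumes the target-independent meaning argued for here, i.e. that insert_subreg replaces the low 32 bits and leaves the upper 32 bits of the superreg alone.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def insert_subreg32(superreg: int, sub: int) -> int:
    """Hypothetical model: write the low 32 bits, keep the upper 32 bits."""
    upper = superreg & MASK64 & ~MASK32   # read the existing upper half
    return upper | (sub & MASK32)         # modify only the low half

def zext32(sub: int) -> int:
    """zext modeled as 'mov v1, 0' followed by insert_subreg."""
    v1 = 0
    return insert_subreg32(v1, sub)

v1 = 0xDEADBEEF_00000000   # some existing value with a non-zero upper half
src = 0x12345678

inserted = insert_subreg32(v1, src)   # upper half of v1 survives
zexted = zext32(src)                  # upper half is zero
```

Under this model the two results differ whenever the existing value has a non-zero upper half, which is why the insert alone cannot stand in for zext.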

The thing is that the general coalescing will be able to determine that the copy from undef is unneeded for (INSERT_SUBREG (i64 undef), GR32:$src, 3), but it would take a target specific hook to know that the constant zero is unneeded on x86-64. A target specific hook for this might be useful, but I think that this is in the realm of future work now.

Sorry, I am not following. zext on x86-64, i.e. the 32-bit move, cannot be coalesced away. No need for a target specific hook.

3: Why is there a two operand variant in the first place? Why not use undef for the superreg operand?
To note, the two operand variant is of the MachineInstr. The DAG form would be to represent the superregister as coming from an undef node, but this gets isel'd to the two operand MachineInstr of insert_subreg.
The reason is that undef is typically selected to an implicit def of a register. This causes an unnecessary move to be generated later on. This move can be optimized away later with more difficulty during subreg lowering by checking whether the input register is defined by an implicit def pseudo instruction, but instead I decided to perform the optimization on the DAG form during instruction selection.
With what you're suggesting
reg1024 = ...
reg1026 = insert_subreg undef, reg1024, 1
reg1027 = insert_subreg reg1026, reg1025, 1
use reg1027

would be isel'd and then subreg lowered to:

R6 = ...
implicit def R01 <= this implicit def is unnecessary
That's a pseudo instruction, it doesn't cost anything.
R23 = R01 <= this copy is unnecessary
It can be coalesced to:
R23 = undef
R2 = R6
R45 = R23
R5 = R6
use R45
Using undef explicitly is the right way to go. There is a good reason it's there. Having the two operand version of insert_subreg that implicitly uses an undef value doesn't fit into the overall llvm philosophy.
Right now the coalescing that you are describing is happening during isel. Are you simply saying that you'd rather have the coalescing happen during subreg lowering? I can accept that, but would you share your reasons?

There really isn't a very good argument for having the 2 different versions of insert_subreg. undef use must be explicitly modeled. I really don't see what you mean by coalescing during isel. isel doesn't have the concept of coalescing. Also don't forget everything must remain ssa until register allocation.

4: What's the benefit of isel'ing a zext to insert_subreg and then xforming it to a 32-bit move?
The xform to a 32-bit move is only the conservative behavior. The zext can be implicit if regalloc can coalesce subreg_inserts.
Why not just isel the zext to the move? It's not legal to coalesce it away anyway.
Actually it is legal to coalesce it. On x86-64 any write to a 32-bit register zero extends the value to 64 bits. For the insert_subreg under discussion the inserted value is a 32-bit result that has in fact already been zero extended implicitly.
It's not legal to coalesce away the 32-bit zero extending move.

Suppose RAX contains some value with top 32-bits non-zero.
mov EAX, EAX (zero extend top bits)
use RAX (expecting top bits to be zero)

Coalescing away the move is a miscompilation.
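
The RAX example can be checked with a minimal Python sketch. This is an illustration only: mov32 is a made-up helper that models the x86-64 rule that a write to a 32-bit register zeroes the upper 32 bits of the containing 64-bit register, and the register value is chosen arbitrarily.

```python
LOW32 = 0xFFFF_FFFF

def mov32(src64: int) -> int:
    """Hypothetical model of 'mov EAX, EAX': copy the low 32 bits of the
    source and zero the upper 32 bits of the 64-bit destination."""
    return src64 & LOW32

rax = 0xFFFF_FFFF_0000_0001   # top 32 bits are non-zero

rax_with_move = mov32(rax)    # the zero-extending move is kept
rax_without_move = rax        # the move was coalesced away

# The later 'use RAX' expects the top bits to be zero.
top_bits_with_move = rax_with_move >> 32
top_bits_without_move = rax_without_move >> 32
```

With the move the user sees zero top bits; without it the stale top bits leak through, which is the miscompilation described above.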
Indeed, but what you have described is not a valid insert_subreg either. Insert_subreg would take EAX as its input operand and would only be coalesced into an instruction that defines EAX explicitly (i.e. an instruction that defines RAX defines EAX implicitly, not explicitly, so no coalescing). I think that this coalescing rule is generally required for correctness when coalescing insert_subreg under any architecture.

That's what I've been saying all along. zero_extend on x86-64 isn't the same as an insert_subreg, so don't try to model it that way.


Evan

Also the current behavior is to use a 32-bit mov instruction for both zeroext and for anyext, I don't see how this is any different.
On Jul 28, 2007, at 12:17 AM, Christopher Lamb<[EMAIL PROTECTED]> wrote:
This patch changes the X86 back end to use the new subreg operations for appropriate truncate and extend operations. This should allow regression testing of the subreg feature going forward, as it's now used in a public target.
The patch passed DejaGnu and all of SingleSource on my x86 machine, but there are changes for x86-64 as well which I haven't been able to test. Output assembly for x86-64 appears sane, but I'd appreciate someone giving the patch a try on their x86-64 system. Other 32-bit x86 testing is also appreciated.
Thanks
--
Christopher Lamb

<x86_subregs.patch>


_______________________________________________
llvm-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits