On Tue, 12 Apr 2016 13:22:12 -0700, Walter Bright <newshou...@digitalmars.com> wrote:

> On 4/12/2016 9:53 AM, Marco Leise wrote:
> > LDC implements InlineAsm_X86_Any (DMD style asm), so
> > core.cpuid works. GDC is the only compiler that does not
> > implement it. We agree that core.cpuid should provide this
> > information, but what we have now - core.cpuid in a mix with
> > GDC's lack of DMD style asm - does not work in practice for
> > the years to come.
>
> Years? Anyone who needs core.cpuid could translate it to GDC's inline
> asm style in an hour or so. It could even be simply written separately
> in GAS and linked in. Since this has not been done, I can only conclude
> that core.cpuid has not been an actual blocker.

You mean it is ok, if I duplicated most of the asm in there
and created a pull request?

> > Still, DMD does not inline asm and always adds a function
> > prolog and epilog around asm blocks in an otherwise
> > empty function (correct me if I'm wrong).
>
> Not if you use "naked".
>
> > "naked" means you
> > have to duplicate code for the different calling conventions,
> > in particular Win32.
>
> Why complain about it adding a prolog/epilog, and complain about it not
> adding it?

Yeah, I didn't make this clear. To reduce code repetition I'd
like to avoid "naked" and have the compiler handle the calling
conventions. Let's compare the earlier example in both GDC and
DMD in a coding style that is agnostic wrt. the calling
convention. First GDC:

    struct DblWord { ulong lo, hi; }

    DblWord bigMul(ulong x, ulong y)
    {
        DblWord tmp;
        asm {
            "mulq %[y]"
            : "=a" tmp.lo, "=d" tmp.hi
            : "a" x, [y] "rm" y;
        }
        return tmp;
    }

This is turned into the following instruction sequence (AT&T):

    mov    %rdi,%rax
    mul    %rsi
    retq

Note how elegantly GCC handles the calling convention for us.
The prolog reduces to moving 'x' from RDI to RAX where I asked
it to place it for the MUL to use as the implicit operand.
After multiplying it by the explicit operand in RSI, the
resulting two machine words would be in RAX:RDX as we know.
I created a data structure to return those two and told GCC to
tie tmp.lo to RAX and tmp.hi to RDX. Since the calling
convention happens to return structs of 2 machine words in
RAX:RDX, the whole assignment to 'tmp' and the return become
no-ops. With inlining enabled only the 'mul' would remain.
This is the ideal outcome.

Now let's look at the DMD implementation - again letting the
compiler figure out the calling convention:

    DblWord bigMul(ulong x, ulong y)
    {
        DblWord tmp;
        asm {
            mov RAX, x;
            mul y;
            mov tmp+DblWord.lo.offsetof, RAX;
            mov tmp+DblWord.hi.offsetof, RDX;
        }
        return tmp;
    }

This generates the following:

    push   %rbp
    mov    %rsp,%rbp
    sub    $0x20,%rsp
    mov    %rdi,-0x10(%rbp)
    mov    %rsi,-0x8(%rbp)
    lea    -0x20(%rbp),%rax
    xor    %ecx,%ecx
    mov    %rcx,(%rax)
    mov    %rcx,0x8(%rax)
    mov    -0x8(%rbp),%rax
    mulq   -0x10(%rbp)
    mov    %rax,-0x20(%rbp)
    mov    %rdx,-0x18(%rbp)
    mov    -0x18(%rbp),%rdx
    mov    -0x20(%rbp),%rax
    mov    %rbp,%rsp
    pop    %rbp
    retq

In practice GDC will just replace the invocation with a single
'mul' instruction while DMD will emit a call to this
18-instruction function. Now you keep telling me extended
assembly is a step backwards. :)

> It's a step backwards because I can't just say "MUL EAX".

You could write this, you'd only have to tell the assembler
that EAX and EDX will be overwritten, something that DMD
already knows.

> I have to tell GCC what register the result gets put in.

And by doing this you allow it to figure out the shortest way
to return the result in compliance with the calling convention.

> This is, to my mind, ridiculous.

I too find it annoying that I have to inform it about the
scratch registers used in the asm, but the rest seems legit to
me. At some point you will have to connect variables in the
host language with registers in assembly. Doing this in a
declarative manner instead of explicit assembly code allows
the backend to find the optimal code (literally) as
demonstrated above.

> GCC's inline assembler apparently has no knowledge of what
> the opcodes actually do.

Agreed. It seems to treat the assembly text merely as a text
template. It is the same with LLVM's extended assembler, which
borrows heavily from GCC's. This is probably due to the fact
that the assembler is historically a standalone executable and
as such the authority for interpreting the asm code is outside
of the scope of the host language compiler. Under these
circumstances we might have gone for the same implementation.

-- 
Marco
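
For anyone who wants to try the constraint-based version without a GDC
install: the GDC snippet above maps almost one-to-one onto GCC's extended
asm in C. What follows is only a sketch under a few assumptions: the names
big_mul and big_mul_demo are made up for illustration, the "cc" clobber is
an addition that the GDC example does not have, and the non-x86-64 branch
assumes the compiler offers unsigned __int128 (GCC and Clang do on 64-bit
targets).

```c
#include <assert.h>
#include <stdint.h>

/* Two machine words holding a 128-bit product, like DblWord in the D code. */
typedef struct { uint64_t lo, hi; } DblWord;

/* Hypothetical C counterpart of bigMul(): full 64x64 -> 128 bit multiply. */
static DblWord big_mul(uint64_t x, uint64_t y)
{
    DblWord tmp;
#if defined(__x86_64__) && defined(__GNUC__)
    /* Same constraints as the GDC version: x is placed in RAX, y may be a
       register or a memory operand, and the two result words are read
       back from RAX and RDX. MUL also modifies the flags, hence "cc". */
    __asm__("mulq %[y]"
            : "=a"(tmp.lo), "=d"(tmp.hi)
            : "a"(x), [y] "rm"(y)
            : "cc");
#else
    /* Fallback for other targets (assumes unsigned __int128 exists). */
    unsigned __int128 p = (unsigned __int128)x * y;
    tmp.lo = (uint64_t)p;
    tmp.hi = (uint64_t)(p >> 64);
#endif
    return tmp;
}

/* Self-check: (2^64 - 1)^2 = 2^128 - 2^65 + 1, so hi = 2^64 - 2, lo = 1. */
static void big_mul_demo(void)
{
    DblWord r = big_mul(UINT64_MAX, UINT64_MAX);
    assert(r.hi == UINT64_MAX - 1);
    assert(r.lo == 1);
}
```

Calling big_mul_demo() from main() exercises both halves of the result.
Whether the compiler reduces the call to a lone 'mul' depends on the
optimization level and on inlining, so it is worth checking the
disassembly rather than taking that for granted.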