On Wednesday, 18 May 2016 at 21:02:03 UTC, tsbockman wrote:
> On Wednesday, 18 May 2016 at 19:53:10 UTC, Era Scarecrow wrote:
>> On Wednesday, 18 May 2016 at 19:36:59 UTC, tsbockman wrote:
>>> I agree that intrinsics for this would be nice. I doubt that any current D platform is actually computing the full 128-bit result for every 64-bit multiply, though; that would waste both power and performance for most programs.
>>
>> Except the 128-bit result is _already_ there at no cost (at least for the x86 instructions I'm aware of).
>
> Can you give me a source for this, or at least the name of the relevant opcode? (I'm new to x86 assembly.)

http://www.mathemainzel.info/files/x86asmref.html#mul

http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html

DIV, IDIV, MUL and IMUL all follow this exact pattern. The mathemainzel.info description quoted below only covers operands of 32 bits or less, but the 64-bit forms use the same pattern.
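The division instructions use the same register pair in the opposite direction: DIV r/m64 takes a 128-bit dividend from RDX:RAX, leaves the quotient in RAX and the remainder in RDX, and faults if the divisor is zero or the quotient does not fit in 64 bits. Below is a rough Python model of that behaviour. Python is used only because its unbounded integers make the 128-bit arithmetic easy to check; `div64` and `DivideError` are hypothetical names for illustration, not a real API.

```python
from typing import Tuple

MASK64 = (1 << 64) - 1


class DivideError(Exception):
    """Stands in for the divide-error exception (#DE) the CPU raises."""


def div64(rdx: int, rax: int, src: int) -> Tuple[int, int]:
    """Model of unsigned DIV r/m64: divide RDX:RAX by `src`.

    Returns the new register values (rax, rdx) = (quotient, remainder).
    """
    for value in (rdx, rax, src):
        if not 0 <= value <= MASK64:
            raise ValueError("register values must be unsigned 64-bit")
    dividend = (rdx << 64) | rax       # 128-bit dividend held in the register pair
    if src == 0:
        raise DivideError("divide by zero")
    quotient, remainder = divmod(dividend, src)
    if quotient > MASK64:              # the quotient must fit back into RAX
        raise DivideError("quotient does not fit in 64 bits")
    return quotient, remainder
```

For example, `div64(1, 0, 2)` returns `(2**63, 0)`: the dividend 2**64 only exists because RDX supplies its upper half. `div64(2, 0, 2)` raises `DivideError`, because 2**65 / 2 = 2**64 does not fit in RAX.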

(mathemainzel.info)
Usage
    MUL     src
Modifies flags
    CF OF (AF,PF,SF,ZF undefined)

Unsigned multiply of the accumulator by the source. If "src" is a byte value, then AL is used as the other multiplicand and the result is placed in AX. If "src" is a word value, then AX is multiplied by "src" and DX:AX receives the result. If "src" is a double word value, then EAX is multiplied by "src" and EDX:EAX receives the result. The 386+ uses an early out algorithm which makes multiplying any size value in EAX as fast as in the 8 or 16 bit registers.
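The rule above can be modelled in a few lines. Below is a hypothetical Python helper (not part of any library) that returns the high and low halves of the double-width product for 8-, 16- and 32-bit operands, together with CF and OF. Per the Intel manual, MUL clears both flags when the upper half of the result is zero and sets them otherwise. For the byte form, the AX result is shown as its AH:AL halves. Note that nothing is ever lost: an n-bit by n-bit product always fits in 2n bits, since (2^n - 1)^2 < 2^(2n).

```python
# Register pair that receives the double-width result, keyed by operand width.
# For the byte form the result goes to AX, shown here as its AH:AL halves.
DEST = {8: ("AH", "AL"), 16: ("DX", "AX"), 32: ("EDX", "EAX")}


def mul(acc: int, src: int, bits: int) -> dict:
    """Model of unsigned one-operand MUL with `bits`-wide operands.

    `acc` is the accumulator (AL, AX or EAX) and `src` the explicit operand.
    Returns the high and low halves of the product plus the CF/OF flags.
    """
    mask = (1 << bits) - 1
    if not (0 <= acc <= mask and 0 <= src <= mask):
        raise ValueError("operands must be unsigned %d-bit values" % bits)
    product = acc * src                # (2**bits - 1)**2 < 2**(2*bits): always fits
    high, low = product >> bits, product & mask
    hi_reg, lo_reg = DEST[bits]
    flag = int(high != 0)              # CF = OF = 1 exactly when the high half is non-zero
    return {hi_reg: high, lo_reg: low, "CF": flag, "OF": flag}
```

For instance, `mul(200, 200, 8)` returns `{"AH": 156, "AL": 64, "CF": 1, "OF": 1}`, since 200 * 200 = 40000 = 0x9C40.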


(intel.com)
The Intel 64 manual's instruction reference (downloadable from the second link) says the same thing; with 64-bit operands the register pair simply becomes RDX:RAX.

Operand size   Source 1   Source 2   Destination
Quadword       RAX        r/m64      RDX:RAX
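That wide product is what the intrinsics question comes down to. Code that cannot reach the RDX half has to rebuild the upper 64 bits from 32-bit pieces, which costs four narrower multiplies plus carry handling instead of one instruction. The sketch below uses hypothetical names, with Python only modelling 64-bit register values. It computes the high half with that schoolbook split and compares it against the exact product that MUL r/m64 leaves in RDX:RAX.

```python
from typing import Tuple

MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def mul64_full(rax: int, src: int) -> Tuple[int, int]:
    """What MUL r/m64 produces: the 128-bit product split into (rdx, rax)."""
    product = rax * src
    return product >> 64, product & MASK64


def mulhi64_portable(a: int, b: int) -> int:
    """High 64 bits of a*b using only 64-bit-wide intermediate values.

    Splits each operand into 32-bit halves (schoolbook multiplication),
    as code without a widening-multiply instruction or intrinsic must.
    """
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32

    lo_lo = a_lo * b_lo                # each partial product fits in 64 bits
    hi_lo = a_hi * b_lo
    lo_hi = a_lo * b_hi
    hi_hi = a_hi * b_hi

    # Middle column plus the carry out of the low partial product;
    # the sum is at most 2**64 - 1, so it still fits in 64 bits.
    cross = (lo_lo >> 32) + (hi_lo & MASK32) + lo_hi
    return hi_hi + (hi_lo >> 32) + (cross >> 32)
```

As a quick check, `mul64_full(2**63, 4)` returns `(2, 0)`: the product 2**65 overflows RAX entirely and lands in RDX.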
