On Wednesday, 18 May 2016 at 21:02:03 UTC, tsbockman wrote:
> On Wednesday, 18 May 2016 at 19:53:10 UTC, Era Scarecrow wrote:
>> On Wednesday, 18 May 2016 at 19:36:59 UTC, tsbockman wrote:
>>> I agree that intrinsics for this would be nice. I doubt that
>>> any current D platform is actually computing the full 128-bit
>>> result for every 64-bit multiply, though; that would waste
>>> both power and performance for most programs.
>>
>> Except the 128-bit result is _already_ there at no extra cost
>> (at least for the x86 instructions I'm aware of).
>
> Can you give me a source for this, or at least the name of the
> relevant opcode? (I'm new to x86 assembly.)
http://www.mathemainzel.info/files/x86asmref.html#mul
http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
DIV, IDIV, MUL, and IMUL (its one-operand form) all follow this
exact pattern. Although the mathemainzel.info reference below only
covers operands of 32 bits or less, the pattern is no different
for 64-bit operands.
(mathemainzel.info)
Usage:          MUL src
Modifies flags: CF OF (AF, PF, SF, ZF undefined)
Unsigned multiply of the accumulator by the source. If "src" is a
byte value, then AL is used as the other multiplicand and the
result is placed in AX. If "src" is a word value, then AX is
multiplied by "src" and DX:AX receives the result. If "src" is a
double word value, then EAX is multiplied by "src" and EDX:EAX
receives the result. The 386+ uses an early out algorithm which
makes multiplying any size value in EAX as fast as in the 8 or 16
bit registers.
(intel.com)
The Intel 64 manual's instruction reference (downloadable from the
link above) says the same thing; only the registers become RDX:RAX
with 64-bit operands:

Operand Size  Source 1  Source 2  Destination
Quadword      RAX       r/m64     RDX:RAX