The branch around the unrolled main loop has a nop in its delay
slot, but the instruction after the nop is harmless when the branch
is taken: it decrements the count, which was just tested for equality
with 0 and is never used again on that path. So we can delete the nop
and let the decrement fill the delay slot.
This saves one instruction every time the branch is not taken.

(FWIW, this all came about because I was wondering whether there is
a better trick for multi-precision addition on MIPS than generating
two carries and merging them, and I figured that if anybody knew
of one, GMP would have it.)
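
For concreteness, the two-carry scheme mentioned above looks roughly
like this in C. This is only a sketch of the idea, not GMP's code:
the name add_n_sketch is made up, and 32-bit limbs stored least
significant first are an assumption. Each comparison corresponds to
an sltu, and the final OR is the merge.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not GMP's mpn_add_n): rp[] = ap[] + bp[] over
   n limbs, least significant limb first.  Returns the carry out. */
static uint32_t add_n_sketch(uint32_t *rp, const uint32_t *ap,
                             const uint32_t *bp, size_t n)
{
    uint32_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t t  = ap[i] + cy;   /* add the incoming carry        */
        uint32_t c1 = t < cy;       /* first carry: wrapped iff t<cy */
        uint32_t s  = t + bp[i];    /* add the second operand        */
        uint32_t c2 = s < bp[i];    /* second carry                  */
        rp[i] = s;
        cy = c1 | c2;               /* merge: at most one can be set */
    }
    return cy;
}
```

The two carries can never both be set: if the first addition wraps,
t is 0, so the second addition cannot wrap. That is why a plain OR
is enough to merge them.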

The actual code change (replacing the nop with a comment and
mechanically adjusting indentation) is too trivial to claim
copyright on, so consider it to be in the public domain.

I also indented the delay-slot instruction after the return jump,
just for consistency.

diff --git a/mpn/mips32/add_n.asm b/mpn/mips32/add_n.asm
index e7d4c48f4..8f51bb201 100644
--- a/mpn/mips32/add_n.asm
+++ b/mpn/mips32/add_n.asm
@@ -68,9 +68,9 @@ PROLOGUE(mpn_add_n)
addiu $4,$4,4
.L0: beq $7,$0,.Lend
- nop
+ C Next instruction acts as nop in delay slot
-.Loop: addiu $7,$7,-4
+.Loop: addiu $7,$7,-4
lw $12,4($5)
addu $11,$11,$2
@@ -120,5 +120,5 @@ PROLOGUE(mpn_add_n)
sltu $2,$11,$10
sw $11,0($4)
j $31
- or $2,$2,$8
+ or $2,$2,$8
EPILOGUE(mpn_add_n)
diff --git a/mpn/mips64/add_n.asm b/mpn/mips64/add_n.asm
index 6856407ef..36a69e3ca 100644
--- a/mpn/mips64/add_n.asm
+++ b/mpn/mips64/add_n.asm
@@ -78,9 +78,9 @@ PROLOGUE(mpn_add_n)
daddiu $4,$4,8
.L0: beq $7,$0,.Lend
- nop
+ C Next instruction acts as nop in delay slot
-.Loop: daddiu $7,$7,-4
+.Loop: daddiu $7,$7,-4
ld $12,8($5)
daddu $11,$11,$10
@@ -130,5 +130,5 @@ PROLOGUE(mpn_add_n)
sltu $2,$11,$10
sd $11,0($4)
j $31
- or $2,$2,$8
+ or $2,$2,$8
EPILOGUE()