Re: [PATCH 0/5] Add LoongArch v1.1 instructions
On 2023/10/31 19:06, gaosong wrote: On 2023/10/31 17:13, Jiajie Chen wrote: On 2023/10/31 17:11, gaosong wrote: On 2023/10/30 19:54, Jiajie Chen wrote: On 2023/10/30 16:23, gaosong wrote: On 2023/10/28 21:09, Jiajie Chen wrote: On 2023/10/26 14:54, gaosong wrote: On 2023/10/26 09:38, Jiajie Chen wrote: On 2023/10/26 03:04, Richard Henderson wrote: On 10/25/23 10:13, Jiajie Chen wrote: On 2023/10/24 07:26, Richard Henderson wrote:

See target/arm/tcg/translate-a64.c, gen_store_exclusive, TCGv_i128 block. See target/ppc/translate.c, gen_stqcx_.

The situation here is slightly different: aarch64 and ppc64 both have 128-bit ll and sc; however, LoongArch v1.1 only has 64-bit ll and 128-bit sc.

Ah, that does complicate things. Possibly use the combination of ll.d and ld.d:

    ll.d lo, base, 0
    ld.d hi, base, 4
    # do some computation
    sc.q lo, hi, base
    # try again if sc failed

Then a possible implementation of gen_ll() would be: align base to a 128-bit boundary, read 128 bits from memory, save the 64-bit part to rd and record the whole 128-bit data in llval. Then, in gen_sc_q(), it uses a 128-bit cmpxchg. But what about the reversed instruction pattern: ll.d hi, base, 4; ld.d lo, base, 0?

It would be worth asking your hardware engineers about the bounds of legal behaviour. Ideally there would be some very explicit language, similar to

I'm a community developer not affiliated with Loongson. Song Gao, could you provide some detail from Loongson Inc.?

    ll.d r1, base, 0
    dbar 0x700    ==> see 2.2.8.1
    ld.d r2, base, 8
    ...
    sc.q r1, r2, base

Thanks! I think we may need to detect the ll.d-dbar-ld.d sequence and translate the sequence into one tcg_gen_qemu_ld_i128, then split the result into two 64-bit parts. Can we do this in QEMU?

Oh, I'm not sure. I think we just need to implement sc.q. We don't need to care about the 'll.d-dbar-ld.d' sequence; it's just like 'll.q', and it needs the user to ensure that. 'll.q' is:

1) ll.d r1, base, 0  ==> set LLbit, load the low 64 bits into r1
2) dbar 0x700
3) ld.d r2, base, 8  ==> load the high 64 bits into r2

sc.q needs to:

1) Use 64-bit cmpxchg.
2) Write 128 bits to memory.

Consider the following code:

    ll.d r1, base, 0
    dbar 0x700
    ld.d r2, base, 8
    addi.d r2, r2, 1
    sc.q r1, r2, base

We translate them into native code:

    ld.d r1, base, 0
    mv LLbit, 1
    mv LLaddr, base
    mv LLval, r1
    dbar 0x700
    ld.d r2, base, 8
    addi.d r2, r2, 1
    if (LLbit == 1 && LLaddr == base) {
        cmpxchg addr=base compare=LLval new=r1
        128-bit write {r2, r1} to base if cmpxchg succeeded
    }
    set r1 if sc.q succeeded

If the memory content of base+8 has changed between ld.d r2 and addi.d r2, the atomicity is not guaranteed, i.e. only the high part has changed while the low part hasn't.

Sorry, my mistake. We need to use cmpxchg_i128. See target/arm/tcg/translate-a64.c, gen_store_exclusive().

    gen_scq(rd, rk, rj)
    {
        ...
        TCGv_i128 t16 = tcg_temp_new_i128();
        TCGv_i128 c16 = tcg_temp_new_i128();
        TCGv_i64 low = tcg_temp_new_i64();
        TCGv_i64 high = tcg_temp_new_i64();
        TCGv_i64 temp = tcg_temp_new_i64();

        tcg_gen_concat_i64_i128(t16, cpu_gpr[rd], cpu_gpr[rk]);

        tcg_gen_qemu_ld(low, cpu_lladdr, ctx->mem_idx, MO_TEUQ);
        tcg_gen_addi_tl(temp, cpu_lladdr, 8);
        tcg_gen_mb(TCG_BAR_SC | TCG_MO_LD_LD);
        tcg_gen_qemu_ld(high, temp, ctx->mem_idx, MO_TEUQ);

The problem is that the high value read here might not equal the one previously read by the ld.d r2, base, 8 instruction.

I think dbar 0x700 ensures that the 2 loads in 'll.q' are a 128-bit atomic operation.

The code does work on a real LoongArch machine. However, we are emulating LoongArch in QEMU, so we have to make it atomic, yet it isn't now.

Thanks.
Song Gao

        tcg_gen_concat_i64_i128(c16, low, high);
        tcg_gen_atomic_cmpxchg_i128(t16, cpu_lladdr, c16, t16, ctx->mem_idx, MO_128);
        ...
    }

I am not sure this is right. I think Richard can give you more suggestions. @Richard

Thanks.
Song Gao

Thanks.
Song Gao

For this series, I think we need to set the new config bits for the 'max' CPU, and change 'any' to 'max' in linux-user/target_elf.h, so that we can use these new instructions in linux-user mode. I will work on it.

Thanks
Song Gao

https://developer.arm.com/documentation/ddi0487/latest/
B2.9.5 Load-Exclusive and Store-Exclusive instruction usage restrictions

But you could do the same thing, aligning and recording the entire 128-bit quantity, then extracting the ll.d result based on address bit 6. This would complicate the implementation of sc.d as well, but would perhaps bring us "close enough" to the actual architecture.

Note that our Arm store-exclusive implementation isn't quite in spec either. There is quite a large comment within translate-a64.c store_exclusive() about the ways things are not quite right. But it seems to be close enough for actual usage to succeed.

r~
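To make the scheme discussed above concrete, here is a hedged sketch — plain Python, not QEMU/TCG code — of the emulation state machine: ll.d records LLbit/LLaddr/LLval, and sc.q succeeds only if the LL state is intact and the recorded low half still matches memory when the 128-bit store is attempted (mirroring a cmpxchg_i128 whose high compare half is re-read at sc time, as in the gen_scq() draft). The `Env` class and its method names are illustrative assumptions, not QEMU API.

```python
# Toy model of the proposed QEMU emulation of LoongArch ll.d + sc.q.
# Memory is modeled as one 64-bit cell per 8-byte address; nothing here
# is real QEMU API.

class Env:
    def __init__(self):
        self.mem = {}        # addr -> 64-bit value
        self.llbit = 0       # LLbit
        self.lladdr = None   # LLaddr
        self.llval = 0       # LLval: the low 64 bits ll.d returned

    def ll_d(self, base):
        """ll.d rd, base, 0: load the low 64 bits and arm the LL state."""
        self.llbit, self.lladdr = 1, base
        self.llval = self.mem[base]
        return self.llval

    def ld_d(self, addr):
        """Plain ld.d: ordinary load, does not touch the LL state."""
        return self.mem[addr]

    def sc_q(self, low_new, high_new, base):
        """sc.q: store {high_new, low_new} only if the LL state is intact
        and the low half still equals what ll.d returned (the 128-bit
        compare with a freshly re-read high half reduces to this)."""
        ok = (self.llbit == 1 and self.lladdr == base
              and self.mem[base] == self.llval)
        if ok:
            self.mem[base] = low_new
            self.mem[base + 8] = high_new
        self.llbit = 0       # sc always clears LLbit
        return 1 if ok else 0

env = Env()
env.mem[0x1000], env.mem[0x1008] = 5, 7

# The thread's example: 128-bit update incrementing the high half.
lo = env.ll_d(0x1000)            # ll.d r1, base, 0
hi = env.ld_d(0x1008)            # dbar 0x700; ld.d r2, base, 8
assert env.sc_q(lo, hi + 1, 0x1000) == 1
assert (env.mem[0x1000], env.mem[0x1008]) == (5, 8)

# If another CPU changes the low half in between, sc.q must fail.
lo = env.ll_d(0x1000)
env.mem[0x1000] = 99             # interfering store from elsewhere
assert env.sc_q(lo, hi, 0x1000) == 0
```

The model also shows the hole Jiajie points out: a change to only the *high* half between the guest's ld.d and sc.q is invisible to this compare, which is why the high half of the cmpxchg_i128 compare value has to be re-read under a strong enough barrier.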
Re: [PATCH 0/5] Add LoongArch v1.1 instructions
On 2023/10/24 14:10, Jiajie Chen wrote: On 2023/10/24 07:26, Richard Henderson wrote: On 10/23/23 08:29, Jiajie Chen wrote:

This patch series implements the new instructions except sc.q, because I do not know how to match a pair of ll.d to sc.q.

There are a couple of examples within the tree. See target/arm/tcg/translate-a64.c, gen_store_exclusive, TCGv_i128 block. See target/ppc/translate.c, gen_stqcx_.

The situation here is slightly different: aarch64 and ppc64 both have 128-bit ll and sc; however, LoongArch v1.1 only has 64-bit ll and 128-bit sc. I guess the intended usage of sc.q is:

    ll.d lo, base, 0
    ll.d hi, base, 4
    # do some computation
    sc.q lo, hi, base
    # try again if sc failed

Possibly use the combination of ll.d and ld.d:

    ll.d lo, base, 0
    ld.d hi, base, 4
    # do some computation
    sc.q lo, hi, base
    # try again if sc failed

Then a possible implementation of gen_ll() would be: align base to a 128-bit boundary, read 128 bits from memory, save the 64-bit part to rd and record the whole 128-bit data in llval. Then, in gen_sc_q(), it uses a 128-bit cmpxchg. But what about the reversed instruction pattern: ll.d hi, base, 4; ld.d lo, base, 0? Since there is no existing code utilizing the new sc.q instruction, I don't know what we should consider here.

r~
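The guessed usage pattern above — load 128 bits, compute, then sc.q and "try again if sc failed" — can be sketched as a retry loop over a 128-bit compare-and-swap. This is a hedged Python model of the guest-visible behavior only; `cas128` and `add_u128` are invented names standing in for whatever atomicity sc.q actually provides, not real instructions or APIs.

```python
# Toy model of a guest-side sc.q retry loop doing a 128-bit add.
# Memory is two adjacent 64-bit cells; cas128 stands in for sc.q.

MASK64 = 2**64 - 1

def cas128(mem, base, expect, new):
    """Compare-and-swap over the {low, high} pair at base/base+8."""
    if (mem[base], mem[base + 8]) == expect:
        mem[base], mem[base + 8] = new
        return True
    return False

def add_u128(mem, base, inc, interfere=None):
    """ll.d + ld.d, compute, sc.q -- retry until the sc succeeds."""
    while True:
        lo, hi = mem[base], mem[base + 8]        # ll.d lo; ld.d hi
        total = ((hi << 64) | lo) + inc          # do some computation
        new = (total & MASK64, (total >> 64) & MASK64)
        if interfere:
            interfere()                          # simulated concurrent store
            interfere = None                     # only interfere once
        if cas128(mem, base, (lo, hi), new):     # sc.q
            return                               # succeeded
        # sc failed: try again

mem = {0x0: MASK64, 0x8: 0}
add_u128(mem, 0x0, 1)                # carry propagates into the high half
assert (mem[0x0], mem[0x8]) == (0, 1)

# A store between the loads and the sc forces exactly one retry.
mem2 = {0x0: 10, 0x8: 0}
add_u128(mem2, 0x0, 5, interfere=lambda: mem2.__setitem__(0x0, 20))
assert (mem2[0x0], mem2[0x8]) == (25, 0)   # result reflects the new value
```

The carry case is why a 64-bit sc.d cannot substitute here: the update must hit both halves or neither.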
Re: [PATCH 1/5] include/exec/memop.h: Add MO_TESB
On 2023/10/23 23:49, David Hildenbrand wrote:

Why?

On 23.10.23 17:29, Jiajie Chen wrote:

Signed-off-by: Jiajie Chen
---
 include/exec/memop.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/exec/memop.h b/include/exec/memop.h
index a86dc6743a..834327c62d 100644
--- a/include/exec/memop.h
+++ b/include/exec/memop.h
@@ -140,6 +140,7 @@ typedef enum MemOp {
     MO_TEUL = MO_TE | MO_UL,
     MO_TEUQ = MO_TE | MO_UQ,
     MO_TEUO = MO_TE | MO_UO,
+    MO_TESB = MO_TE | MO_SB,
     MO_TESW = MO_TE | MO_SW,
     MO_TESL = MO_TE | MO_SL,
     MO_TESQ = MO_TE | MO_SQ,

I recall that the reason for not having this is that the target endianness doesn't matter for single bytes.

Thanks, you are right. I was copying some code using MO_TESW, only to find that MO_TESB is missing... I should simply use MO_SB then.
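A minimal demonstration of the point David recalls: endianness describes byte *order*, and a single byte has only one possible ordering, so a sign-extended byte load yields the same value whichever endianness MO_TE resolves to — plain MO_SB suffices.

```python
# One byte, most significant bit set: -128 as a signed value.
raw = bytes([0x80])

# Byte order is irrelevant for a 1-byte quantity...
le = int.from_bytes(raw, "little", signed=True)
be = int.from_bytes(raw, "big", signed=True)
assert le == be == -128

# ...and sign extension only widens, it never reorders bytes: the
# two's-complement pattern at any width still decodes to -128.
for width in (16, 32, 64):
    pattern = le & (2**width - 1)        # widened representation
    assert pattern - 2**width == -128    # decodes back to -128
```

This is why the enum defines MO_TESW/MO_TESL/MO_TESQ but deliberately omits MO_TESB.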
Re: [PATCH 3/5] target/loongarch: Add amcas[_db].{b/h/w/d}
On 2023/10/23 23:29, Jiajie Chen wrote:

The new instructions are introduced in LoongArch v1.1:

- amcas.b
- amcas.h
- amcas.w
- amcas.d
- amcas_db.b
- amcas_db.h
- amcas_db.w
- amcas_db.d

The new instructions are gated by CPUCFG2.LAMCAS.

Signed-off-by: Jiajie Chen
---
 target/loongarch/cpu.h                      |  1 +
 target/loongarch/disas.c                    |  8 +++
 .../loongarch/insn_trans/trans_atomic.c.inc | 24 +++
 target/loongarch/insns.decode               |  8 +++
 target/loongarch/translate.h                |  1 +
 5 files changed, 42 insertions(+)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 7166c07756..80a476c3f8 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -156,6 +156,7 @@ FIELD(CPUCFG2, LBT_MIPS, 20, 1)
 FIELD(CPUCFG2, LSPW, 21, 1)
 FIELD(CPUCFG2, LAM, 22, 1)
 FIELD(CPUCFG2, LAM_BH, 27, 1)
+FIELD(CPUCFG2, LAMCAS, 28, 1)
 
 /* cpucfg[3] bits */
 FIELD(CPUCFG3, CCDMA, 0, 1)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index d33aa8173a..4aa67749cf 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -575,6 +575,14 @@ INSN(fldx_s, frr)
 INSN(fldx_d, frr)
 INSN(fstx_s, frr)
 INSN(fstx_d, frr)
+INSN(amcas_b, rrr)
+INSN(amcas_h, rrr)
+INSN(amcas_w, rrr)
+INSN(amcas_d, rrr)
+INSN(amcas_db_b, rrr)
+INSN(amcas_db_h, rrr)
+INSN(amcas_db_w, rrr)
+INSN(amcas_db_d, rrr)
 INSN(amswap_b, rrr)
 INSN(amswap_h, rrr)
 INSN(amadd_b, rrr)

diff --git a/target/loongarch/insn_trans/trans_atomic.c.inc b/target/loongarch/insn_trans/trans_atomic.c.inc
index cd28e217ad..bea567fdaf 100644
--- a/target/loongarch/insn_trans/trans_atomic.c.inc
+++ b/target/loongarch/insn_trans/trans_atomic.c.inc
@@ -45,6 +45,22 @@ static bool gen_sc(DisasContext *ctx, arg_rr_i *a, MemOp mop)
     return true;
 }
 
+static bool gen_cas(DisasContext *ctx, arg_rrr *a,
+                    void (*func)(TCGv, TCGv, TCGv, TCGv, TCGArg, MemOp),
+                    MemOp mop)
+{
+    TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
+    TCGv addr = gpr_src(ctx, a->rj, EXT_NONE);
+    TCGv val = gpr_src(ctx, a->rk, EXT_NONE);
+
+    addr = make_address_i(ctx, addr, 0);
+

I'm unsure if I can use the same TCGv for the first and the third argument here. If it violates that assumption, a temporary register can be used.

+    func(dest, addr, dest, val, ctx->mem_idx, mop);
+    gen_set_gpr(a->rd, dest, EXT_NONE);
+
+    return true;
+}
+
 static bool gen_am(DisasContext *ctx, arg_rrr *a,
                    void (*func)(TCGv, TCGv, TCGv, TCGArg, MemOp),
                    MemOp mop)
@@ -73,6 +89,14 @@ TRANS(ll_w, ALL, gen_ll, MO_TESL)
 TRANS(sc_w, ALL, gen_sc, MO_TESL)
 TRANS(ll_d, 64, gen_ll, MO_TEUQ)
 TRANS(sc_d, 64, gen_sc, MO_TEUQ)
+TRANS(amcas_b, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESB)
+TRANS(amcas_h, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESW)
+TRANS(amcas_w, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESL)
+TRANS(amcas_d, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TEUQ)
+TRANS(amcas_db_b, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESB)
+TRANS(amcas_db_h, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESW)
+TRANS(amcas_db_w, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESL)
+TRANS(amcas_db_d, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TEUQ)
 TRANS(amswap_b, LAM_BH, gen_am, tcg_gen_atomic_xchg_tl, MO_TESB)
 TRANS(amswap_h, LAM_BH, gen_am, tcg_gen_atomic_xchg_tl, MO_TESW)
 TRANS(amadd_b, LAM_BH, gen_am, tcg_gen_atomic_fetch_add_tl, MO_TESB)

diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 678ce42038..cf4123cd46 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -261,6 +261,14 @@
 ll_w 0010 .. . . @rr_i14s2
 sc_w 0010 0001 .. . . @rr_i14s2
 ll_d 0010 0010 .. . . @rr_i14s2
 sc_d 0010 0011 .. . . @rr_i14s2
+amcas_b 0011 1101 10000 . . . @rrr
+amcas_h 0011 1101 10001 . . . @rrr
+amcas_w 0011 1101 10010 . . . @rrr
+amcas_d 0011 1101 10011 . . . @rrr
+amcas_db_b 0011 1101 10100 . . . @rrr
+amcas_db_h 0011 1101 10101 . . . @rrr
+amcas_db_w 0011 1101 10110 . . . @rrr
+amcas_db_d 0011 1101 10111 . . . @rrr
 amswap_b 0011 1101 11000 . . . @rrr
 amswap_h 0011 1101 11001 . . . @rrr
 amadd_b 0011 1101 11010 . . . @rrr

diff --git a/target/loongarch/translate.h b/target/loongarch/translate.h
index 0b230530e7..3affefdafc 100644
--- a/target/loongarch/translate.h
+++ b/target/loongarch/trans
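For reference, a hedged Python model of the amcas before/after semantics the patch wires up: the old memory value always lands in rd, and the store of rk happens only when the *incoming* rd matches memory. The function name and dict-based memory are illustrative only; whether tcg_gen_atomic_cmpxchg_tl tolerates `dest` aliasing the compare operand is exactly the reviewer's open question and is not settled by this model.

```python
# Toy model of amcas.{b,h,w,d}:
#   old = mem[addr]; if old == rd: mem[addr] = rk; rd = old

def amcas(mem, addr, rd, rk):
    """Return the new rd (the old memory value); update mem in place."""
    old = mem[addr]
    if old == rd:        # compare against the incoming rd ...
        mem[addr] = rk   # ... and swap in rk on a match
    return old           # rd always receives the old memory value

mem = {0x0: 42}

rd = amcas(mem, 0x0, 42, 7)      # compare matches: store happens
assert (rd, mem[0x0]) == (42, 7)

rd = amcas(mem, 0x0, 42, 9)      # stale compare value: store skipped
assert (rd, mem[0x0]) == (7, 7)  # rd still observes the current value
```

Note that rd is read (as the compare value) before it is overwritten (with the old memory value) — which is why passing the same TCGv as both return and compare argument is only safe if the cmpxchg helper reads all inputs before writing its output.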
[PATCH 4/5] target/loongarch: Add estimated reciprocal instructions
Add the following new instructions in LoongArch v1.1:

- frecipe.s
- frecipe.d
- frsqrte.s
- frsqrte.d
- vfrecipe.s
- vfrecipe.d
- vfrsqrte.s
- vfrsqrte.d
- xvfrecipe.s
- xvfrecipe.d
- xvfrsqrte.s
- xvfrsqrte.d

They are guarded by CPUCFG2.FRECIPE. Although the instructions allow an implementation to improve performance by reducing precision, we use the existing softfloat implementation.

Signed-off-by: Jiajie Chen
---
 target/loongarch/cpu.h                         |  1 +
 target/loongarch/disas.c                       | 12
 target/loongarch/insn_trans/trans_farith.c.inc |  4
 target/loongarch/insn_trans/trans_vec.c.inc    |  8
 target/loongarch/insns.decode                  | 12
 target/loongarch/translate.h                   |  6 ++
 6 files changed, 43 insertions(+)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 80a476c3f8..8f938effa8 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -155,6 +155,7 @@ FIELD(CPUCFG2, LBT_ARM, 19, 1)
 FIELD(CPUCFG2, LBT_MIPS, 20, 1)
 FIELD(CPUCFG2, LSPW, 21, 1)
 FIELD(CPUCFG2, LAM, 22, 1)
+FIELD(CPUCFG2, FRECIPE, 25, 1)
 FIELD(CPUCFG2, LAM_BH, 27, 1)
 FIELD(CPUCFG2, LAMCAS, 28, 1)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 4aa67749cf..9eb49fb5e3 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -473,6 +473,10 @@ INSN(frecip_s, ff)
 INSN(frecip_d, ff)
 INSN(frsqrt_s, ff)
 INSN(frsqrt_d, ff)
+INSN(frecipe_s, ff)
+INSN(frecipe_d, ff)
+INSN(frsqrte_s, ff)
+INSN(frsqrte_d, ff)
 INSN(fmov_s, ff)
 INSN(fmov_d, ff)
 INSN(movgr2fr_w, fr)
@@ -1424,6 +1428,10 @@ INSN_LSX(vfrecip_s, vv)
 INSN_LSX(vfrecip_d, vv)
 INSN_LSX(vfrsqrt_s, vv)
 INSN_LSX(vfrsqrt_d, vv)
+INSN_LSX(vfrecipe_s, vv)
+INSN_LSX(vfrecipe_d, vv)
+INSN_LSX(vfrsqrte_s, vv)
+INSN_LSX(vfrsqrte_d, vv)
 INSN_LSX(vfcvtl_s_h, vv)
 INSN_LSX(vfcvth_s_h, vv)
@@ -2338,6 +2346,10 @@ INSN_LASX(xvfrecip_s, vv)
 INSN_LASX(xvfrecip_d, vv)
 INSN_LASX(xvfrsqrt_s, vv)
 INSN_LASX(xvfrsqrt_d, vv)
+INSN_LASX(xvfrecipe_s, vv)
+INSN_LASX(xvfrecipe_d, vv)
+INSN_LASX(xvfrsqrte_s, vv)
+INSN_LASX(xvfrsqrte_d, vv)
 INSN_LASX(xvfcvtl_s_h, vv)
 INSN_LASX(xvfcvth_s_h, vv)

diff --git a/target/loongarch/insn_trans/trans_farith.c.inc b/target/loongarch/insn_trans/trans_farith.c.inc
index f4a0dea727..356cdf99b7 100644
--- a/target/loongarch/insn_trans/trans_farith.c.inc
+++ b/target/loongarch/insn_trans/trans_farith.c.inc
@@ -191,6 +191,10 @@ TRANS(frecip_s, FP_SP, gen_ff, gen_helper_frecip_s)
 TRANS(frecip_d, FP_DP, gen_ff, gen_helper_frecip_d)
 TRANS(frsqrt_s, FP_SP, gen_ff, gen_helper_frsqrt_s)
 TRANS(frsqrt_d, FP_DP, gen_ff, gen_helper_frsqrt_d)
+TRANS(frecipe_s, FRECIPE_FP_SP, gen_ff, gen_helper_frecip_s)
+TRANS(frecipe_d, FRECIPE_FP_DP, gen_ff, gen_helper_frecip_d)
+TRANS(frsqrte_s, FRECIPE_FP_SP, gen_ff, gen_helper_frsqrt_s)
+TRANS(frsqrte_d, FRECIPE_FP_DP, gen_ff, gen_helper_frsqrt_d)
 TRANS(flogb_s, FP_SP, gen_ff, gen_helper_flogb_s)
 TRANS(flogb_d, FP_DP, gen_ff, gen_helper_flogb_d)
 TRANS(fclass_s, FP_SP, gen_ff, gen_helper_fclass_s)

diff --git a/target/loongarch/insn_trans/trans_vec.c.inc b/target/loongarch/insn_trans/trans_vec.c.inc
index 98f856bb29..1c93e19ac4 100644
--- a/target/loongarch/insn_trans/trans_vec.c.inc
+++ b/target/loongarch/insn_trans/trans_vec.c.inc
@@ -4409,12 +4409,20 @@ TRANS(vfrecip_s, LSX, gen_vv_ptr, gen_helper_vfrecip_s)
 TRANS(vfrecip_d, LSX, gen_vv_ptr, gen_helper_vfrecip_d)
 TRANS(vfrsqrt_s, LSX, gen_vv_ptr, gen_helper_vfrsqrt_s)
 TRANS(vfrsqrt_d, LSX, gen_vv_ptr, gen_helper_vfrsqrt_d)
+TRANS(vfrecipe_s, FRECIPE_LSX, gen_vv_ptr, gen_helper_vfrecip_s)
+TRANS(vfrecipe_d, FRECIPE_LSX, gen_vv_ptr, gen_helper_vfrecip_d)
+TRANS(vfrsqrte_s, FRECIPE_LSX, gen_vv_ptr, gen_helper_vfrsqrt_s)
+TRANS(vfrsqrte_d, FRECIPE_LSX, gen_vv_ptr, gen_helper_vfrsqrt_d)
 TRANS(xvfsqrt_s, LASX, gen_xx_ptr, gen_helper_vfsqrt_s)
 TRANS(xvfsqrt_d, LASX, gen_xx_ptr, gen_helper_vfsqrt_d)
 TRANS(xvfrecip_s, LASX, gen_xx_ptr, gen_helper_vfrecip_s)
 TRANS(xvfrecip_d, LASX, gen_xx_ptr, gen_helper_vfrecip_d)
 TRANS(xvfrsqrt_s, LASX, gen_xx_ptr, gen_helper_vfrsqrt_s)
 TRANS(xvfrsqrt_d, LASX, gen_xx_ptr, gen_helper_vfrsqrt_d)
+TRANS(xvfrecipe_s, FRECIPE_LASX, gen_xx_ptr, gen_helper_vfrecip_s)
+TRANS(xvfrecipe_d, FRECIPE_LASX, gen_xx_ptr, gen_helper_vfrecip_d)
+TRANS(xvfrsqrte_s, FRECIPE_LASX, gen_xx_ptr, gen_helper_vfrsqrt_s)
+TRANS(xvfrsqrte_d, FRECIPE_LASX, gen_xx_ptr, gen_helper_vfrsqrt_d)
 TRANS(vfcvtl_s_h, LSX, gen_vv_ptr, gen_helper_vfcvtl_s_h)
 TRANS(vfcvth_s_h, LSX, gen_vv_ptr, gen_helper_vfcvth_s_h)

diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index cf4123cd46..92078f0f9f 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -371,6 +371,10 @@ frecip_s 00010001 01000
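A sketch of the precision point made in the commit message: reusing the exact frecip/frsqrt helpers is a legal implementation of an "estimate" instruction, because an estimate is only permitted to be *less* precise than the exact result. The contrast below truncates a double's mantissa to illustrate what a genuinely low-precision hardware estimate might look like; the 14-bit figure is an illustrative assumption, not taken from the LoongArch manual.

```python
import struct

def truncate_mantissa(x, bits):
    """Keep only the top `bits` of a double's 52-bit mantissa."""
    (i,) = struct.unpack("<Q", struct.pack("<d", x))
    i &= ~((1 << (52 - bits)) - 1)           # clear the low mantissa bits
    (y,) = struct.unpack("<d", struct.pack("<Q", i))
    return y

def frecip(x):
    """Exact reciprocal, as the reused softfloat helper behaves."""
    return 1.0 / x

def frecipe(x, bits=14):
    """Hypothetical reduced-precision estimate (illustrative only)."""
    return truncate_mantissa(1.0 / x, bits)

# The estimate tracks the exact value to within its truncated precision...
for v in (3.0, 0.7, 1234.5):
    exact, est = frecip(v), frecipe(v)
    assert abs(est - exact) / exact < 2 ** -13

# ...and agrees exactly when the reciprocal is exactly representable,
# so an exact implementation trivially satisfies the estimate contract.
assert frecip(4.0) == frecipe(4.0) == 0.25
```

This is why mapping frecipe_s straight onto gen_helper_frecip_s is correct, if slower than dedicated estimate hardware.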
[PATCH 3/5] target/loongarch: Add amcas[_db].{b/h/w/d}
The new instructions are introduced in LoongArch v1.1:

- amcas.b
- amcas.h
- amcas.w
- amcas.d
- amcas_db.b
- amcas_db.h
- amcas_db.w
- amcas_db.d

The new instructions are gated by CPUCFG2.LAMCAS.

Signed-off-by: Jiajie Chen
---
 target/loongarch/cpu.h                      |  1 +
 target/loongarch/disas.c                    |  8 +++
 .../loongarch/insn_trans/trans_atomic.c.inc | 24 +++
 target/loongarch/insns.decode               |  8 +++
 target/loongarch/translate.h                |  1 +
 5 files changed, 42 insertions(+)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 7166c07756..80a476c3f8 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -156,6 +156,7 @@ FIELD(CPUCFG2, LBT_MIPS, 20, 1)
 FIELD(CPUCFG2, LSPW, 21, 1)
 FIELD(CPUCFG2, LAM, 22, 1)
 FIELD(CPUCFG2, LAM_BH, 27, 1)
+FIELD(CPUCFG2, LAMCAS, 28, 1)
 
 /* cpucfg[3] bits */
 FIELD(CPUCFG3, CCDMA, 0, 1)

diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index d33aa8173a..4aa67749cf 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -575,6 +575,14 @@ INSN(fldx_s, frr)
 INSN(fldx_d, frr)
 INSN(fstx_s, frr)
 INSN(fstx_d, frr)
+INSN(amcas_b, rrr)
+INSN(amcas_h, rrr)
+INSN(amcas_w, rrr)
+INSN(amcas_d, rrr)
+INSN(amcas_db_b, rrr)
+INSN(amcas_db_h, rrr)
+INSN(amcas_db_w, rrr)
+INSN(amcas_db_d, rrr)
 INSN(amswap_b, rrr)
 INSN(amswap_h, rrr)
 INSN(amadd_b, rrr)

diff --git a/target/loongarch/insn_trans/trans_atomic.c.inc b/target/loongarch/insn_trans/trans_atomic.c.inc
index cd28e217ad..bea567fdaf 100644
--- a/target/loongarch/insn_trans/trans_atomic.c.inc
+++ b/target/loongarch/insn_trans/trans_atomic.c.inc
@@ -45,6 +45,22 @@ static bool gen_sc(DisasContext *ctx, arg_rr_i *a, MemOp mop)
     return true;
 }
 
+static bool gen_cas(DisasContext *ctx, arg_rrr *a,
+                    void (*func)(TCGv, TCGv, TCGv, TCGv, TCGArg, MemOp),
+                    MemOp mop)
+{
+    TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
+    TCGv addr = gpr_src(ctx, a->rj, EXT_NONE);
+    TCGv val = gpr_src(ctx, a->rk, EXT_NONE);
+
+    addr = make_address_i(ctx, addr, 0);
+
+    func(dest, addr, dest, val, ctx->mem_idx, mop);
+    gen_set_gpr(a->rd, dest, EXT_NONE);
+
+    return true;
+}
+
 static bool gen_am(DisasContext *ctx, arg_rrr *a,
                    void (*func)(TCGv, TCGv, TCGv, TCGArg, MemOp),
                    MemOp mop)
@@ -73,6 +89,14 @@ TRANS(ll_w, ALL, gen_ll, MO_TESL)
 TRANS(sc_w, ALL, gen_sc, MO_TESL)
 TRANS(ll_d, 64, gen_ll, MO_TEUQ)
 TRANS(sc_d, 64, gen_sc, MO_TEUQ)
+TRANS(amcas_b, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESB)
+TRANS(amcas_h, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESW)
+TRANS(amcas_w, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESL)
+TRANS(amcas_d, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TEUQ)
+TRANS(amcas_db_b, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESB)
+TRANS(amcas_db_h, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESW)
+TRANS(amcas_db_w, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESL)
+TRANS(amcas_db_d, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TEUQ)
 TRANS(amswap_b, LAM_BH, gen_am, tcg_gen_atomic_xchg_tl, MO_TESB)
 TRANS(amswap_h, LAM_BH, gen_am, tcg_gen_atomic_xchg_tl, MO_TESW)
 TRANS(amadd_b, LAM_BH, gen_am, tcg_gen_atomic_fetch_add_tl, MO_TESB)

diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 678ce42038..cf4123cd46 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -261,6 +261,14 @@
 ll_w 0010 .. . . @rr_i14s2
 sc_w 0010 0001 .. . . @rr_i14s2
 ll_d 0010 0010 .. . . @rr_i14s2
 sc_d 0010 0011 .. . . @rr_i14s2
+amcas_b 0011 1101 10000 . . . @rrr
+amcas_h 0011 1101 10001 . . . @rrr
+amcas_w 0011 1101 10010 . . . @rrr
+amcas_d 0011 1101 10011 . . . @rrr
+amcas_db_b 0011 1101 10100 . . . @rrr
+amcas_db_h 0011 1101 10101 . . . @rrr
+amcas_db_w 0011 1101 10110 . . . @rrr
+amcas_db_d 0011 1101 10111 . . . @rrr
 amswap_b 0011 1101 11000 . . . @rrr
 amswap_h 0011 1101 11001 . . . @rrr
 amadd_b 0011 1101 11010 . . . @rrr

diff --git a/target/loongarch/translate.h b/target/loongarch/translate.h
index 0b230530e7..3affefdafc 100644
--- a/target/loongarch/translate.h
+++ b/target/loongarch/translate.h
@@ -23,6 +23,7 @@
 #define avail_LSPW(C)   (FIELD_EX32((C)->cpucfg2, CPUCFG2, LSPW))
 #define avail_LAM(C)    (FIELD_EX32((C)->cpucfg2, CPUCFG2, LAM))
 #define avail_LAM_BH(C) (FIELD_EX32((C)->cpucfg2, CPUCFG2, LAM_BH))
[PATCH 0/5] Add LoongArch v1.1 instructions
The latest revision of the LoongArch ISA manual is out at https://www.loongson.cn/uploads/images/2023102309132647981.%E9%BE%99%E8%8A%AF%E6%9E%B6%E6%9E%84%E5%8F%82%E8%80%83%E6%89%8B%E5%86%8C%E5%8D%B7%E4%B8%80_r1p10.pdf (Chinese only). The revision includes the following updates:

- estimated fp reciprocal instructions: frecip -> frecipe, frsqrt -> frsqrte
- 128-bit store-conditional instruction: sc.q
- ll.w/d with acquire semantics: llacq.w/d; sc.w/d with release semantics: screl.w/d
- compare-and-swap instructions: amcas[_db].b/h/w/d
- byte- and halfword-wide amswap/amadd instructions: am{swap/add}[_db].{b/h}
- new definitions for dbar hints
- clarified 32-bit division instruction behavior
- clarified load ordering when accessing the same address
- message signaled interrupts
- hardware page table walker

The new revision is implemented in the soon-to-be-released Loongson 3A6000 processor. This patch series implements the new instructions except sc.q, because I do not know how to match a pair of ll.d instructions to one sc.q.

Jiajie Chen (5):
  include/exec/memop.h: Add MO_TESB
  target/loongarch: Add am{swap/add}[_db].{b/h}
  target/loongarch: Add amcas[_db].{b/h/w/d}
  target/loongarch: Add estimated reciprocal instructions
  target/loongarch: Add llacq/screl instructions

 include/exec/memop.h                          |  1 +
 target/loongarch/cpu.h                        |  4 ++
 target/loongarch/disas.c                      | 32
 .../loongarch/insn_trans/trans_atomic.c.inc   | 52 +++
 .../loongarch/insn_trans/trans_farith.c.inc   |  4 ++
 target/loongarch/insn_trans/trans_vec.c.inc   |  8 +++
 target/loongarch/insns.decode                 | 32
 target/loongarch/translate.h                  | 27 +++---
 8 files changed, 152 insertions(+), 8 deletions(-)

-- 
2.42.0
[PATCH 1/5] include/exec/memop.h: Add MO_TESB
Signed-off-by: Jiajie Chen --- include/exec/memop.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/exec/memop.h b/include/exec/memop.h index a86dc6743a..834327c62d 100644 --- a/include/exec/memop.h +++ b/include/exec/memop.h @@ -140,6 +140,7 @@ typedef enum MemOp { MO_TEUL = MO_TE | MO_UL, MO_TEUQ = MO_TE | MO_UQ, MO_TEUO = MO_TE | MO_UO, +MO_TESB = MO_TE | MO_SB, MO_TESW = MO_TE | MO_SW, MO_TESL = MO_TE | MO_SL, MO_TESQ = MO_TE | MO_SQ, -- 2.42.0
[PATCH 2/5] target/loongarch: Add am{swap/add}[_db].{b/h}
The new instructions are introduced in LoongArch v1.1: - amswap.b - amswap.h - amadd.b - amadd.h - amswap_db.b - amswap_db.h - amadd_db.b - amadd_db.h The instructions are gated by CPUCFG2.LAM_BH. Signed-off-by: Jiajie Chen --- target/loongarch/cpu.h | 1 + target/loongarch/disas.c | 8 target/loongarch/insn_trans/trans_atomic.c.inc | 8 target/loongarch/insns.decode | 8 target/loongarch/translate.h | 17 + 5 files changed, 34 insertions(+), 8 deletions(-) diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h index 8b54cf109c..7166c07756 100644 --- a/target/loongarch/cpu.h +++ b/target/loongarch/cpu.h @@ -155,6 +155,7 @@ FIELD(CPUCFG2, LBT_ARM, 19, 1) FIELD(CPUCFG2, LBT_MIPS, 20, 1) FIELD(CPUCFG2, LSPW, 21, 1) FIELD(CPUCFG2, LAM, 22, 1) +FIELD(CPUCFG2, LAM_BH, 27, 1) /* cpucfg[3] bits */ FIELD(CPUCFG3, CCDMA, 0, 1) diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c index 2040f3e44d..d33aa8173a 100644 --- a/target/loongarch/disas.c +++ b/target/loongarch/disas.c @@ -575,6 +575,14 @@ INSN(fldx_s, frr) INSN(fldx_d, frr) INSN(fstx_s, frr) INSN(fstx_d, frr) +INSN(amswap_b, rrr) +INSN(amswap_h, rrr) +INSN(amadd_b, rrr) +INSN(amadd_h, rrr) +INSN(amswap_db_b, rrr) +INSN(amswap_db_h, rrr) +INSN(amadd_db_b, rrr) +INSN(amadd_db_h, rrr) INSN(amswap_w, rrr) INSN(amswap_d, rrr) INSN(amadd_w, rrr) diff --git a/target/loongarch/insn_trans/trans_atomic.c.inc b/target/loongarch/insn_trans/trans_atomic.c.inc index 80c2e286fd..cd28e217ad 100644 --- a/target/loongarch/insn_trans/trans_atomic.c.inc +++ b/target/loongarch/insn_trans/trans_atomic.c.inc @@ -73,6 +73,14 @@ TRANS(ll_w, ALL, gen_ll, MO_TESL) TRANS(sc_w, ALL, gen_sc, MO_TESL) TRANS(ll_d, 64, gen_ll, MO_TEUQ) TRANS(sc_d, 64, gen_sc, MO_TEUQ) +TRANS(amswap_b, LAM_BH, gen_am, tcg_gen_atomic_xchg_tl, MO_TESB) +TRANS(amswap_h, LAM_BH, gen_am, tcg_gen_atomic_xchg_tl, MO_TESW) +TRANS(amadd_b, LAM_BH, gen_am, tcg_gen_atomic_fetch_add_tl, MO_TESB) +TRANS(amadd_h, LAM_BH, gen_am, tcg_gen_atomic_fetch_add_tl, MO_TESW) 
+TRANS(amswap_db_b, LAM_BH, gen_am, tcg_gen_atomic_xchg_tl, MO_TESB) +TRANS(amswap_db_h, LAM_BH, gen_am, tcg_gen_atomic_xchg_tl, MO_TESW) +TRANS(amadd_db_b, LAM_BH, gen_am, tcg_gen_atomic_fetch_add_tl, MO_TESB) +TRANS(amadd_db_h, LAM_BH, gen_am, tcg_gen_atomic_fetch_add_tl, MO_TESW) TRANS(amswap_w, LAM, gen_am, tcg_gen_atomic_xchg_tl, MO_TESL) TRANS(amswap_d, LAM, gen_am, tcg_gen_atomic_xchg_tl, MO_TEUQ) TRANS(amadd_w, LAM, gen_am, tcg_gen_atomic_fetch_add_tl, MO_TESL) diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode index 62f58cc541..678ce42038 100644 --- a/target/loongarch/insns.decode +++ b/target/loongarch/insns.decode @@ -261,6 +261,14 @@ ll_w0010 .. . . @rr_i14s2 sc_w0010 0001 .. . . @rr_i14s2 ll_d0010 0010 .. . . @rr_i14s2 sc_d0010 0011 .. . . @rr_i14s2 +amswap_b0011 1101 11000 . . .@rrr +amswap_h0011 1101 11001 . . .@rrr +amadd_b 0011 1101 11010 . . .@rrr +amadd_h 0011 1101 11011 . . .@rrr +amswap_db_b 0011 1101 11100 . . .@rrr +amswap_db_h 0011 1101 11101 . . .@rrr +amadd_db_b 0011 1101 0 . . .@rrr +amadd_db_h 0011 1101 1 . . .@rrr amswap_w0011 1110 0 . . .@rrr amswap_d0011 1110 1 . . .@rrr amadd_w 0011 1110 00010 . . 
.@rrr diff --git a/target/loongarch/translate.h b/target/loongarch/translate.h index 195f53573a..0b230530e7 100644 --- a/target/loongarch/translate.h +++ b/target/loongarch/translate.h @@ -17,14 +17,15 @@ #define avail_ALL(C) true #define avail_64(C)(FIELD_EX32((C)->cpucfg1, CPUCFG1, ARCH) == \ CPUCFG1_ARCH_LA64) -#define avail_FP(C)(FIELD_EX32((C)->cpucfg2, CPUCFG2, FP)) -#define avail_FP_SP(C) (FIELD_EX32((C)->cpucfg2, CPUCFG2, FP_SP)) -#define avail_FP_DP(C) (FIELD_EX32((C)->cpucfg2, CPUCFG2, FP_DP)) -#define avail_LSPW(C) (FIELD_EX32((C)->cpucfg2, CPUCFG2, LSPW)) -#define avail_LAM(C) (FIELD_EX32((C)->cpucfg2, CPUCFG2, LAM)) -#define avail_LSX(C) (FIELD_EX32((C)->cpucfg2, CPUCFG2, LSX)) -#define avail_LASX(C) (FIELD_EX32((C)->cpucfg2, CPUCFG2, LASX)) -#define avail_IOCSR(C) (FIELD_EX32((C)->cpucfg1, CPUCFG1, IOCSR)) +#define avail_FP(C) (FIELD_EX32((C)->cpucfg2, CPUCFG2, FP)) +#define avail_FP_SP(C) (FIELD_EX32((C)->cpucfg2, CPUCFG2, FP_SP)) +#define avail_FP_DP(C) (FIELD_EX32((C)->cpucfg2, CPUCFG2, FP_DP)) +#define avail_LSPW(C) (FI
[PATCH 5/5] target/loongarch: Add llacq/screl instructions
Add the following instructions in LoongArch v1.1:

- llacq.w
- screl.w
- llacq.d
- screl.d

They are guarded by CPUCFG2.LLACQ_SCREL.

Signed-off-by: Jiajie Chen
---
 target/loongarch/cpu.h                        | 1 +
 target/loongarch/disas.c                      | 4
 .../loongarch/insn_trans/trans_atomic.c.inc   | 20 +++
 target/loongarch/insns.decode                 | 4
 target/loongarch/translate.h                  | 3 +++
 5 files changed, 32 insertions(+)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 8f938effa8..f0a63d5484 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -158,6 +158,7 @@ FIELD(CPUCFG2, LAM, 22, 1)
 FIELD(CPUCFG2, FRECIPE, 25, 1)
 FIELD(CPUCFG2, LAM_BH, 27, 1)
 FIELD(CPUCFG2, LAMCAS, 28, 1)
+FIELD(CPUCFG2, LLACQ_SCREL, 29, 1)
 
 /* cpucfg[3] bits */
 FIELD(CPUCFG3, CCDMA, 0, 1)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 9eb49fb5e3..8e02f51ddc 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -579,6 +579,10 @@ INSN(fldx_s, frr)
 INSN(fldx_d, frr)
 INSN(fstx_s, frr)
 INSN(fstx_d, frr)
+INSN(llacq_w, rr)
+INSN(screl_w, rr)
+INSN(llacq_d, rr)
+INSN(screl_d, rr)
 INSN(amcas_b, rrr)
 INSN(amcas_h, rrr)
 INSN(amcas_w, rrr)
diff --git a/target/loongarch/insn_trans/trans_atomic.c.inc b/target/loongarch/insn_trans/trans_atomic.c.inc
index bea567fdaf..0c81fbd745 100644
--- a/target/loongarch/insn_trans/trans_atomic.c.inc
+++ b/target/loongarch/insn_trans/trans_atomic.c.inc
@@ -17,6 +17,14 @@ static bool gen_ll(DisasContext *ctx, arg_rr_i *a, MemOp mop)
     return true;
 }
 
+static bool gen_llacq(DisasContext *ctx, arg_rr *a, MemOp mop)
+{
+    arg_rr_i tmp_a = {
+        .rd = a->rd, .rj = a->rj, .imm = 0
+    };
+    return gen_ll(ctx, &tmp_a, mop);
+}
+
 static bool gen_sc(DisasContext *ctx, arg_rr_i *a, MemOp mop)
 {
     TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
@@ -45,6 +53,14 @@ static bool gen_sc(DisasContext *ctx, arg_rr_i *a, MemOp mop)
     return true;
 }
 
+static bool gen_screl(DisasContext *ctx, arg_rr *a, MemOp mop)
+{
+    arg_rr_i tmp_a = {
+        .rd = a->rd, .rj = a->rj, .imm = 0
+    };
+    return gen_sc(ctx, &tmp_a, mop);
+}
+
 static bool gen_cas(DisasContext *ctx, arg_rrr *a,
                     void (*func)(TCGv, TCGv, TCGv, TCGv, TCGArg, MemOp),
                     MemOp mop)
@@ -89,6 +105,10 @@ TRANS(ll_w, ALL, gen_ll, MO_TESL)
 TRANS(sc_w, ALL, gen_sc, MO_TESL)
 TRANS(ll_d, 64, gen_ll, MO_TEUQ)
 TRANS(sc_d, 64, gen_sc, MO_TEUQ)
+TRANS(llacq_w, LLACQ_SCREL, gen_llacq, MO_TESL)
+TRANS(screl_w, LLACQ_SCREL, gen_screl, MO_TESL)
+TRANS(llacq_d, LLACQ_SCREL_64, gen_llacq, MO_TEUQ)
+TRANS(screl_d, LLACQ_SCREL_64, gen_screl, MO_TEUQ)
 TRANS(amcas_b, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESB)
 TRANS(amcas_h, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESW)
 TRANS(amcas_w, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESL)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 92078f0f9f..e056d492d3 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -261,6 +261,10 @@ ll_w 0010 .. . . @rr_i14s2
 sc_w 0010 0001 .. . . @rr_i14s2
 ll_d 0010 0010 .. . . @rr_i14s2
 sc_d 0010 0011 .. . . @rr_i14s2
+llacq_w 0011 1101 0 0 . . @rr
+screl_w 0011 1101 0 1 . . @rr
+llacq_d 0011 1101 0 00010 . . @rr
+screl_d 0011 1101 0 00011 . . @rr
 amcas_b 0011 1101 1 . . . @rrr
 amcas_h 0011 1101 10001 . . . @rrr
 amcas_w 0011 1101 10010 . . . @rrr
diff --git a/target/loongarch/translate.h b/target/loongarch/translate.h
index 651c5796ca..3d13d40ca6 100644
--- a/target/loongarch/translate.h
+++ b/target/loongarch/translate.h
@@ -34,6 +34,9 @@
 #define avail_FRECIPE_LSX(C)  (avail_FRECIPE(C) && avail_LSX(C))
 #define avail_FRECIPE_LASX(C) (avail_FRECIPE(C) && avail_LASX(C))
 
+#define avail_LLACQ_SCREL(C)    (FIELD_EX32((C)->cpucfg2, CPUCFG2, LLACQ_SCREL))
+#define avail_LLACQ_SCREL_64(C) (avail_64(C) && avail_LLACQ_SCREL(C))
+
 /*
  * If an operation is being performed on less than TARGET_LONG_BITS,
  * it may require the inputs to be sign- or zero-extended; which will
-- 
2.42.0
[PATCH] linux-user/elfload: Enable LSX/LASX in HWCAP for LoongArch
Since support for LSX and LASX has recently landed in QEMU, we can update HWCAP accordingly.

Signed-off-by: Jiajie Chen
---
 linux-user/elfload.c | 8
 1 file changed, 8 insertions(+)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index db75cd4b33..f11f25309e 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -1237,6 +1237,14 @@ static uint32_t get_elf_hwcap(void)
         hwcaps |= HWCAP_LOONGARCH_LAM;
     }
 
+    if (FIELD_EX32(cpu->env.cpucfg[2], CPUCFG2, LSX)) {
+        hwcaps |= HWCAP_LOONGARCH_LSX;
+    }
+
+    if (FIELD_EX32(cpu->env.cpucfg[2], CPUCFG2, LASX)) {
+        hwcaps |= HWCAP_LOONGARCH_LASX;
+    }
+
     return hwcaps;
 }
-- 
2.41.0
Re: [PATCH 4/7] tcg/loongarch64: Use cpuinfo.h
On 2023/9/17 06:01, Richard Henderson wrote: Signed-off-by: Richard Henderson --- tcg/loongarch64/tcg-target.h | 8 tcg/loongarch64/tcg-target.c.inc | 8 +--- 2 files changed, 5 insertions(+), 11 deletions(-) diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index 03017672f6..1bea15b02e 100644 --- a/tcg/loongarch64/tcg-target.h +++ b/tcg/loongarch64/tcg-target.h @@ -29,6 +29,8 @@ #ifndef LOONGARCH_TCG_TARGET_H #define LOONGARCH_TCG_TARGET_H +#include "host/cpuinfo.h" + #define TCG_TARGET_INSN_UNIT_SIZE 4 #define TCG_TARGET_NB_REGS 64 @@ -85,8 +87,6 @@ typedef enum { TCG_VEC_TMP0 = TCG_REG_V23, } TCGReg; -extern bool use_lsx_instructions; - /* used for function call generation */ #define TCG_REG_CALL_STACK TCG_REG_SP #define TCG_TARGET_STACK_ALIGN 16 @@ -171,10 +171,10 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_muluh_i641 #define TCG_TARGET_HAS_mulsh_i641 -#define TCG_TARGET_HAS_qemu_ldst_i128 use_lsx_instructions +#define TCG_TARGET_HAS_qemu_ldst_i128 (cpuinfo & CPUINFO_LSX) #define TCG_TARGET_HAS_v64 0 -#define TCG_TARGET_HAS_v128 use_lsx_instructions +#define TCG_TARGET_HAS_v128 (cpuinfo & CPUINFO_LSX) #define TCG_TARGET_HAS_v256 0 #define TCG_TARGET_HAS_not_vec 1 diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 40074c46b8..52f2c26ce1 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -32,8 +32,6 @@ #include "../tcg-ldst.c.inc" #include -bool use_lsx_instructions; - #ifdef CONFIG_DEBUG_TCG static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = { "zero", @@ -2316,10 +2314,6 @@ static void tcg_target_init(TCGContext *s) exit(EXIT_FAILURE); } -if (hwcap & HWCAP_LOONGARCH_LSX) { -use_lsx_instructions = 1; -} - tcg_target_available_regs[TCG_TYPE_I32] = ALL_GENERAL_REGS; tcg_target_available_regs[TCG_TYPE_I64] = ALL_GENERAL_REGS; @@ -2335,7 +2329,7 @@ static void tcg_target_init(TCGContext *s) tcg_regset_reset_reg(tcg_target_call_clobber_regs, 
TCG_REG_S8); tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S9); -if (use_lsx_instructions) { +if (cpuinfo & CPUINFO_LSX) { tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS; tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V24); tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V25); Reviewed-by: Jiajie Chen
Re: [PATCH 2/7] tcg/loongarch64: Use C_N2_I1 for INDEX_op_qemu_ld_a*_i128
On 2023/9/17 06:01, Richard Henderson wrote: Use new registers for the output, so that we never overlap the input address, which could happen for user-only. This avoids a "tmp = addr + 0" in that case. Signed-off-by: Richard Henderson --- tcg/loongarch64/tcg-target-con-set.h | 2 +- tcg/loongarch64/tcg-target.c.inc | 17 +++-- 2 files changed, 12 insertions(+), 7 deletions(-) diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h index 77d62e38e7..cae6c2aad6 100644 --- a/tcg/loongarch64/tcg-target-con-set.h +++ b/tcg/loongarch64/tcg-target-con-set.h @@ -38,4 +38,4 @@ C_O1_I2(w, w, wM) C_O1_I2(w, w, wA) C_O1_I3(w, w, w, w) C_O1_I4(r, rZ, rJ, rZ, rZ) -C_O2_I1(r, r, r) +C_N2_I1(r, r, r) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index b701df50db..40074c46b8 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1105,13 +1105,18 @@ static void tcg_out_qemu_ldst_i128(TCGContext *s, TCGReg data_lo, TCGReg data_hi } } else { /* Otherwise use a pair of LD/ST. */ -tcg_out_opc_add_d(s, TCG_REG_TMP0, h.base, h.index); +TCGReg base = h.base; +if (h.index != TCG_REG_ZERO) { +base = TCG_REG_TMP0; +tcg_out_opc_add_d(s, base, h.base, h.index); +} if (is_ld) { -tcg_out_opc_ld_d(s, data_lo, TCG_REG_TMP0, 0); -tcg_out_opc_ld_d(s, data_hi, TCG_REG_TMP0, 8); +tcg_debug_assert(base != data_lo); +tcg_out_opc_ld_d(s, data_lo, base, 0); +tcg_out_opc_ld_d(s, data_hi, base, 8); } else { -tcg_out_opc_st_d(s, data_lo, TCG_REG_TMP0, 0); -tcg_out_opc_st_d(s, data_hi, TCG_REG_TMP0, 8); +tcg_out_opc_st_d(s, data_lo, base, 0); +tcg_out_opc_st_d(s, data_hi, base, 8); } } @@ -2049,7 +2054,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_qemu_ld_a32_i128: case INDEX_op_qemu_ld_a64_i128: -return C_O2_I1(r, r, r); +return C_N2_I1(r, r, r); case INDEX_op_qemu_st_a32_i128: case INDEX_op_qemu_st_a64_i128: Reviewed-by: Jiajie Chen
Re: [PATCH 3/7] util: Add cpuinfo for loongarch64
On 2023/9/17 06:01, Richard Henderson wrote: Signed-off-by: Richard Henderson --- host/include/loongarch64/host/cpuinfo.h | 21 +++ util/cpuinfo-loongarch.c| 35 + util/meson.build| 2 ++ 3 files changed, 58 insertions(+) create mode 100644 host/include/loongarch64/host/cpuinfo.h create mode 100644 util/cpuinfo-loongarch.c diff --git a/host/include/loongarch64/host/cpuinfo.h b/host/include/loongarch64/host/cpuinfo.h new file mode 100644 index 00..fab664a10b --- /dev/null +++ b/host/include/loongarch64/host/cpuinfo.h @@ -0,0 +1,21 @@ +/* + * SPDX-License-Identifier: GPL-2.0-or-later + * Host specific cpu identification for LoongArch + */ + +#ifndef HOST_CPUINFO_H +#define HOST_CPUINFO_H + +#define CPUINFO_ALWAYS (1u << 0) /* so cpuinfo is nonzero */ +#define CPUINFO_LSX (1u << 1) + +/* Initialized with a constructor. */ +extern unsigned cpuinfo; + +/* + * We cannot rely on constructor ordering, so other constructors must + * use the function interface rather than the variable above. + */ +unsigned cpuinfo_init(void); + +#endif /* HOST_CPUINFO_H */ diff --git a/util/cpuinfo-loongarch.c b/util/cpuinfo-loongarch.c new file mode 100644 index 00..08b6d7460c --- /dev/null +++ b/util/cpuinfo-loongarch.c @@ -0,0 +1,35 @@ +/* + * SPDX-License-Identifier: GPL-2.0-or-later + * Host specific cpu identification for LoongArch. + */ + +#include "qemu/osdep.h" +#include "host/cpuinfo.h" + +#ifdef CONFIG_GETAUXVAL +# include +#else +# include "elf.h" +#endif +#include + +unsigned cpuinfo; + +/* Called both as constructor and (possibly) via other constructors. */ +unsigned __attribute__((constructor)) cpuinfo_init(void) +{ +unsigned info = cpuinfo; +unsigned long hwcap; + +if (info) { +return info; +} + +hwcap = qemu_getauxval(AT_HWCAP); + +info = CPUINFO_ALWAYS; +info |= (hwcap & HWCAP_LOONGARCH_LSX ? 
CPUINFO_LSX : 0); + +cpuinfo = info; +return info; +} diff --git a/util/meson.build b/util/meson.build index c4827fd70a..b136f02aa0 100644 --- a/util/meson.build +++ b/util/meson.build @@ -112,6 +112,8 @@ if cpu == 'aarch64' util_ss.add(files('cpuinfo-aarch64.c')) elif cpu in ['x86', 'x86_64'] util_ss.add(files('cpuinfo-i386.c')) +elif cpu == 'loongarch64' + util_ss.add(files('cpuinfo-loongarch.c')) elif cpu in ['ppc', 'ppc64'] util_ss.add(files('cpuinfo-ppc.c')) endif Reviewed-by: Jiajie Chen
Re: [PATCH 1/7] tcg: Add C_N2_I1
On 2023/9/17 06:01, Richard Henderson wrote: Constraint with two outputs, both in new registers. Signed-off-by: Richard Henderson --- tcg/tcg.c | 5 + 1 file changed, 5 insertions(+) diff --git a/tcg/tcg.c b/tcg/tcg.c index 604fa9bf3e..fdbf79689a 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -644,6 +644,7 @@ static void tcg_out_movext3(TCGContext *s, const TCGMovExtend *i1, #define C_O1_I4(O1, I1, I2, I3, I4) C_PFX5(c_o1_i4_, O1, I1, I2, I3, I4), #define C_N1_I2(O1, I1, I2) C_PFX3(c_n1_i2_, O1, I1, I2), +#define C_N2_I1(O1, O2, I1) C_PFX3(c_n2_i1_, O1, O2, I1), #define C_O2_I1(O1, O2, I1) C_PFX3(c_o2_i1_, O1, O2, I1), #define C_O2_I2(O1, O2, I1, I2) C_PFX4(c_o2_i2_, O1, O2, I1, I2), @@ -666,6 +667,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode); #undef C_O1_I3 #undef C_O1_I4 #undef C_N1_I2 +#undef C_N2_I1 #undef C_O2_I1 #undef C_O2_I2 #undef C_O2_I3 @@ -685,6 +687,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode); #define C_O1_I4(O1, I1, I2, I3, I4) { .args_ct_str = { #O1, #I1, #I2, #I3, #I4 } }, #define C_N1_I2(O1, I1, I2) { .args_ct_str = { "&" #O1, #I1, #I2 } }, +#define C_N2_I1(O1, O2, I1) { .args_ct_str = { "&" #O1, "&" #O2, #I1 } }, #define C_O2_I1(O1, O2, I1) { .args_ct_str = { #O1, #O2, #I1 } }, #define C_O2_I2(O1, O2, I1, I2) { .args_ct_str = { #O1, #O2, #I1, #I2 } }, @@ -706,6 +709,7 @@ static const TCGTargetOpDef constraint_sets[] = { #undef C_O1_I3 #undef C_O1_I4 #undef C_N1_I2 +#undef C_N2_I1 #undef C_O2_I1 #undef C_O2_I2 #undef C_O2_I3 @@ -725,6 +729,7 @@ static const TCGTargetOpDef constraint_sets[] = { #define C_O1_I4(O1, I1, I2, I3, I4) C_PFX5(c_o1_i4_, O1, I1, I2, I3, I4) #define C_N1_I2(O1, I1, I2) C_PFX3(c_n1_i2_, O1, I1, I2) +#define C_N2_I1(O1, O2, I1) C_PFX3(c_n2_i1_, O1, O2, I1) #define C_O2_I1(O1, O2, I1) C_PFX3(c_o2_i1_, O1, O2, I1) #define C_O2_I2(O1, O2, I1, I2) C_PFX4(c_o2_i2_, O1, O2, I1, I2) Reviewed-by: Jiajie Chen
[PATCH] target/loongarch: fix ASXE flag conflict
HW_FLAGS_EUEN_ASXE accidentally conflicts with HW_FLAGS_CRMD_PG, enabling LASX instructions even when CSR_EUEN.ASXE=0.

Closes: https://gitlab.com/qemu-project/qemu/-/issues/1907
Signed-off-by: Jiajie Chen
---
 target/loongarch/cpu.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index f125a8e49b..79ad79a289 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -462,7 +462,7 @@ static inline void set_pc(CPULoongArchState *env, uint64_t value)
 #define HW_FLAGS_CRMD_PG    R_CSR_CRMD_PG_MASK /* 0x10 */
 #define HW_FLAGS_EUEN_FPE   0x04
 #define HW_FLAGS_EUEN_SXE   0x08
-#define HW_FLAGS_EUEN_ASXE  0x10
+#define HW_FLAGS_EUEN_ASXE  0x40
 #define HW_FLAGS_VA32       0x20
 
 static inline void cpu_get_tb_cpu_state(CPULoongArchState *env, vaddr *pc,
-- 
2.41.0
[PATCH v4 02/16] tcg/loongarch64: Lower basic tcg vec ops to LSX
LSX support on host cpu is detected via hwcap. Lower the following ops to LSX: - dup_vec - dupi_vec - dupm_vec - ld_vec - st_vec Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- tcg/loongarch64/tcg-target-con-set.h | 2 + tcg/loongarch64/tcg-target-con-str.h | 1 + tcg/loongarch64/tcg-target.c.inc | 219 ++- tcg/loongarch64/tcg-target.h | 38 - tcg/loongarch64/tcg-target.opc.h | 12 ++ 5 files changed, 270 insertions(+), 2 deletions(-) create mode 100644 tcg/loongarch64/tcg-target.opc.h diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h index c2bde44613..37b3f80bf9 100644 --- a/tcg/loongarch64/tcg-target-con-set.h +++ b/tcg/loongarch64/tcg-target-con-set.h @@ -17,7 +17,9 @@ C_O0_I1(r) C_O0_I2(rZ, r) C_O0_I2(rZ, rZ) +C_O0_I2(w, r) C_O1_I1(r, r) +C_O1_I1(w, r) C_O1_I2(r, r, rC) C_O1_I2(r, r, ri) C_O1_I2(r, r, rI) diff --git a/tcg/loongarch64/tcg-target-con-str.h b/tcg/loongarch64/tcg-target-con-str.h index 6e9ccca3ad..81b8d40278 100644 --- a/tcg/loongarch64/tcg-target-con-str.h +++ b/tcg/loongarch64/tcg-target-con-str.h @@ -14,6 +14,7 @@ * REGS(letter, register_mask) */ REGS('r', ALL_GENERAL_REGS) +REGS('w', ALL_VECTOR_REGS) /* * Define constraint letters for constants: diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index baf5fc3819..150278e112 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -32,6 +32,8 @@ #include "../tcg-ldst.c.inc" #include +bool use_lsx_instructions; + #ifdef CONFIG_DEBUG_TCG static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = { "zero", @@ -65,7 +67,39 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = { "s5", "s6", "s7", -"s8" +"s8", +"vr0", +"vr1", +"vr2", +"vr3", +"vr4", +"vr5", +"vr6", +"vr7", +"vr8", +"vr9", +"vr10", +"vr11", +"vr12", +"vr13", +"vr14", +"vr15", +"vr16", +"vr17", +"vr18", +"vr19", +"vr20", +"vr21", +"vr22", +"vr23", +"vr24", +"vr25", +"vr26", +"vr27", +"vr28", +"vr29", 
+"vr30", +"vr31", }; #endif @@ -102,6 +136,15 @@ static const int tcg_target_reg_alloc_order[] = { TCG_REG_A2, TCG_REG_A1, TCG_REG_A0, + +/* Vector registers */ +TCG_REG_V0, TCG_REG_V1, TCG_REG_V2, TCG_REG_V3, +TCG_REG_V4, TCG_REG_V5, TCG_REG_V6, TCG_REG_V7, +TCG_REG_V8, TCG_REG_V9, TCG_REG_V10, TCG_REG_V11, +TCG_REG_V12, TCG_REG_V13, TCG_REG_V14, TCG_REG_V15, +TCG_REG_V16, TCG_REG_V17, TCG_REG_V18, TCG_REG_V19, +TCG_REG_V20, TCG_REG_V21, TCG_REG_V22, TCG_REG_V23, +/* V24 - V31 are caller-saved, and skipped. */ }; static const int tcg_target_call_iarg_regs[] = { @@ -135,6 +178,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot) #define TCG_CT_CONST_WSZ 0x2000 #define ALL_GENERAL_REGS MAKE_64BIT_MASK(0, 32) +#define ALL_VECTOR_REGSMAKE_64BIT_MASK(32, 32) static inline tcg_target_long sextreg(tcg_target_long val, int pos, int len) { @@ -1486,6 +1530,154 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, } } +static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece, +TCGReg rd, TCGReg rs) +{ +switch (vece) { +case MO_8: +tcg_out_opc_vreplgr2vr_b(s, rd, rs); +break; +case MO_16: +tcg_out_opc_vreplgr2vr_h(s, rd, rs); +break; +case MO_32: +tcg_out_opc_vreplgr2vr_w(s, rd, rs); +break; +case MO_64: +tcg_out_opc_vreplgr2vr_d(s, rd, rs); +break; +default: +g_assert_not_reached(); +} +return true; +} + +static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece, + TCGReg r, TCGReg base, intptr_t offset) +{ +/* Handle imm overflow and division (vldrepl.d imm is divided by 8) */ +if (offset < -0x800 || offset > 0x7ff || \ +(offset & ((1 << vece) - 1)) != 0) { +tcg_out_addi(s, TCG_TYPE_I64, TCG_REG_TMP0, base, offset); +base = TCG_REG_TMP0; +offset = 0; +} +offset >>= vece; + +switch (vece) { +case MO_8: +tcg_out_opc_vldrepl_b(s, r, base, offset); +break; +case MO_16: +tcg_out_opc_vldrepl_h(s, r, base, offset); +break; +case MO_32: +tcg_out_opc_vldrepl_w(s, r, base, offset); +break; +case
[PATCH v4 10/16] tcg/loongarch64: Lower vector saturated ops
Lower the following ops: - ssadd_vec - usadd_vec - sssub_vec - ussub_vec Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- tcg/loongarch64/tcg-target.c.inc | 32 tcg/loongarch64/tcg-target.h | 2 +- 2 files changed, 33 insertions(+), 1 deletion(-) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index bdf22d8807..90c52c38cf 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1713,6 +1713,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, static const LoongArchInsn umax_vec_insn[4] = { OPC_VMAX_BU, OPC_VMAX_HU, OPC_VMAX_WU, OPC_VMAX_DU }; +static const LoongArchInsn ssadd_vec_insn[4] = { +OPC_VSADD_B, OPC_VSADD_H, OPC_VSADD_W, OPC_VSADD_D +}; +static const LoongArchInsn usadd_vec_insn[4] = { +OPC_VSADD_BU, OPC_VSADD_HU, OPC_VSADD_WU, OPC_VSADD_DU +}; +static const LoongArchInsn sssub_vec_insn[4] = { +OPC_VSSUB_B, OPC_VSSUB_H, OPC_VSSUB_W, OPC_VSSUB_D +}; +static const LoongArchInsn ussub_vec_insn[4] = { +OPC_VSSUB_BU, OPC_VSSUB_HU, OPC_VSSUB_WU, OPC_VSSUB_DU +}; a0 = args[0]; a1 = args[1]; @@ -1829,6 +1841,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, case INDEX_op_umax_vec: tcg_out32(s, encode_vdvjvk_insn(umax_vec_insn[vece], a0, a1, a2)); break; +case INDEX_op_ssadd_vec: +tcg_out32(s, encode_vdvjvk_insn(ssadd_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_usadd_vec: +tcg_out32(s, encode_vdvjvk_insn(usadd_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_sssub_vec: +tcg_out32(s, encode_vdvjvk_insn(sssub_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_ussub_vec: +tcg_out32(s, encode_vdvjvk_insn(ussub_vec_insn[vece], a0, a1, a2)); +break; case INDEX_op_dupm_vec: tcg_out_dupm_vec(s, type, vece, a0, a1, a2); break; @@ -1860,6 +1884,10 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_smax_vec: case INDEX_op_umin_vec: case INDEX_op_umax_vec: +case INDEX_op_ssadd_vec: +case INDEX_op_usadd_vec: +case 
INDEX_op_sssub_vec: +case INDEX_op_ussub_vec: return 1; default: return 0; @@ -2039,6 +2067,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_smax_vec: case INDEX_op_umin_vec: case INDEX_op_umax_vec: +case INDEX_op_ssadd_vec: +case INDEX_op_usadd_vec: +case INDEX_op_sssub_vec: +case INDEX_op_ussub_vec: return C_O1_I2(w, w, w); case INDEX_op_not_vec: diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index ec725aaeaa..fa14558275 100644 --- a/tcg/loongarch64/tcg-target.h +++ b/tcg/loongarch64/tcg-target.h @@ -192,7 +192,7 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_roti_vec 0 #define TCG_TARGET_HAS_rots_vec 0 #define TCG_TARGET_HAS_rotv_vec 0 -#define TCG_TARGET_HAS_sat_vec 0 +#define TCG_TARGET_HAS_sat_vec 1 #define TCG_TARGET_HAS_minmax_vec 1 #define TCG_TARGET_HAS_bitsel_vec 0 #define TCG_TARGET_HAS_cmpsel_vec 0 -- 2.42.0
[PATCH v4 12/16] tcg/loongarch64: Lower bitsel_vec to vbitsel
Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- tcg/loongarch64/tcg-target-con-set.h | 1 + tcg/loongarch64/tcg-target.c.inc | 11 ++- tcg/loongarch64/tcg-target.h | 2 +- 3 files changed, 12 insertions(+), 2 deletions(-) diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h index 3f530ad4d8..914572d21b 100644 --- a/tcg/loongarch64/tcg-target-con-set.h +++ b/tcg/loongarch64/tcg-target-con-set.h @@ -35,4 +35,5 @@ C_O1_I2(r, rZ, rZ) C_O1_I2(w, w, w) C_O1_I2(w, w, wM) C_O1_I2(w, w, wA) +C_O1_I3(w, w, w, w) C_O1_I4(r, rZ, rJ, rZ, rZ) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 6958fd219c..a33ec594ee 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1676,7 +1676,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, const int const_args[TCG_MAX_OP_ARGS]) { TCGType type = vecl + TCG_TYPE_V64; -TCGArg a0, a1, a2; +TCGArg a0, a1, a2, a3; TCGReg temp = TCG_REG_TMP0; TCGReg temp_vec = TCG_VEC_TMP0; @@ -1738,6 +1738,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, a0 = args[0]; a1 = args[1]; a2 = args[2]; +a3 = args[3]; /* Currently only supports V128 */ tcg_debug_assert(type == TCG_TYPE_V128); @@ -1871,6 +1872,10 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, case INDEX_op_sarv_vec: tcg_out32(s, encode_vdvjvk_insn(sarv_vec_insn[vece], a0, a1, a2)); break; +case INDEX_op_bitsel_vec: +/* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */ +tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1); +break; case INDEX_op_dupm_vec: tcg_out_dupm_vec(s, type, vece, a0, a1, a2); break; @@ -1909,6 +1914,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_shlv_vec: case INDEX_op_shrv_vec: case INDEX_op_sarv_vec: +case INDEX_op_bitsel_vec: return 1; default: return 0; @@ -2101,6 +2107,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_neg_vec: return C_O1_I1(w, w); +case 
INDEX_op_bitsel_vec: +return C_O1_I3(w, w, w, w); + default: g_assert_not_reached(); } diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index 7e9fb61c47..bc56939a57 100644 --- a/tcg/loongarch64/tcg-target.h +++ b/tcg/loongarch64/tcg-target.h @@ -194,7 +194,7 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_rotv_vec 0 #define TCG_TARGET_HAS_sat_vec 1 #define TCG_TARGET_HAS_minmax_vec 1 -#define TCG_TARGET_HAS_bitsel_vec 0 +#define TCG_TARGET_HAS_bitsel_vec 1 #define TCG_TARGET_HAS_cmpsel_vec 0 #define TCG_TARGET_DEFAULT_MO (0) -- 2.42.0
[PATCH v4 03/16] tcg: pass vece to tcg_target_const_match()
Pass vece to tcg_target_const_match() to allow correct interpretation of const args of vector ops. Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- tcg/aarch64/tcg-target.c.inc | 2 +- tcg/arm/tcg-target.c.inc | 2 +- tcg/i386/tcg-target.c.inc| 2 +- tcg/loongarch64/tcg-target.c.inc | 2 +- tcg/mips/tcg-target.c.inc| 2 +- tcg/ppc/tcg-target.c.inc | 2 +- tcg/riscv/tcg-target.c.inc | 2 +- tcg/s390x/tcg-target.c.inc | 2 +- tcg/sparc64/tcg-target.c.inc | 2 +- tcg/tcg.c| 4 ++-- tcg/tci/tcg-target.c.inc | 2 +- 11 files changed, 12 insertions(+), 12 deletions(-) diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc index 0931a69448..a1e2b6be16 100644 --- a/tcg/aarch64/tcg-target.c.inc +++ b/tcg/aarch64/tcg-target.c.inc @@ -272,7 +272,7 @@ static bool is_shimm1632(uint32_t v32, int *cmode, int *imm8) } } -static bool tcg_target_const_match(int64_t val, TCGType type, int ct) +static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece) { if (ct & TCG_CT_CONST) { return 1; diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc index acb5f23b54..76f1345002 100644 --- a/tcg/arm/tcg-target.c.inc +++ b/tcg/arm/tcg-target.c.inc @@ -509,7 +509,7 @@ static bool is_shimm1632(uint32_t v32, int *cmode, int *imm8) * mov operand2: values represented with x << (2 * y), x < 0x100 * add, sub, eor...: ditto */ -static bool tcg_target_const_match(int64_t val, TCGType type, int ct) +static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece) { if (ct & TCG_CT_CONST) { return 1; diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc index 0c3d1e4cef..aed91e515e 100644 --- a/tcg/i386/tcg-target.c.inc +++ b/tcg/i386/tcg-target.c.inc @@ -198,7 +198,7 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type, } /* test if a constant matches the constraint */ -static bool tcg_target_const_match(int64_t val, TCGType type, int ct) +static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece) 
{ if (ct & TCG_CT_CONST) { return 1; diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 150278e112..07a0326e5d 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -186,7 +186,7 @@ static inline tcg_target_long sextreg(tcg_target_long val, int pos, int len) } /* test if a constant matches the constraint */ -static bool tcg_target_const_match(int64_t val, TCGType type, int ct) +static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece) { if (ct & TCG_CT_CONST) { return true; diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc index 9faa8bdf0b..c6662889f0 100644 --- a/tcg/mips/tcg-target.c.inc +++ b/tcg/mips/tcg-target.c.inc @@ -190,7 +190,7 @@ static bool is_p2m1(tcg_target_long val) } /* test if a constant matches the constraint */ -static bool tcg_target_const_match(int64_t val, TCGType type, int ct) +static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece) { if (ct & TCG_CT_CONST) { return 1; diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc index 090f11e71c..ccf245191d 100644 --- a/tcg/ppc/tcg-target.c.inc +++ b/tcg/ppc/tcg-target.c.inc @@ -261,7 +261,7 @@ static bool reloc_pc14(tcg_insn_unit *src_rw, const tcg_insn_unit *target) } /* test if a constant matches the constraint */ -static bool tcg_target_const_match(int64_t val, TCGType type, int ct) +static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece) { if (ct & TCG_CT_CONST) { return 1; diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc index 9be81c1b7b..3bd7959e7e 100644 --- a/tcg/riscv/tcg-target.c.inc +++ b/tcg/riscv/tcg-target.c.inc @@ -145,7 +145,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot) #define sextreg sextract64 /* test if a constant matches the constraint */ -static bool tcg_target_const_match(int64_t val, TCGType type, int ct) +static bool tcg_target_const_match(int64_t val, 
TCGType type, int ct, int vece) { if (ct & TCG_CT_CONST) { return 1; diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc index ecd8aaf2a1..f4d3abcb71 100644 --- a/tcg/s390x/tcg-target.c.inc +++ b/tcg/s390x/tcg-target.c.inc @@ -540,7 +540,7 @@ static bool risbg_mask(uint64_t c) } /* Test if a constant matches the constraint. */ -static bool tcg_target_const_match(int64_t val, TCGType type, int ct) +static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece) { if (ct & TCG_CT_CONST) { return 1; diff --git a/tcg/sparc64/tcg-target.c.inc b/tcg/sparc64/tcg-target.c.inc index 81a08bb6c5..6b9be4c520 100644 --- a/tcg/sparc64/tcg-target.c.inc +++ b/tcg/
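Why the constraint matcher needs `vece` can be shown with a minimal scalar sketch (hypothetical helper names, not QEMU's actual code): a vector-op constant is replicated per element, so the same 64-bit bit pattern must be sign-extracted to the element width `8 << vece` before any immediate-range check — `0xff` is -1 for byte elements but 255 for anything wider.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of QEMU-style sextract64(): sign-extract LEN bits at bit POS. */
static int64_t sextract64_sketch(uint64_t value, int pos, int len)
{
    return ((int64_t)(value << (64 - len - pos))) >> (64 - len);
}

/* With vece available, a backend can interpret a constant per element
 * size before matching it against e.g. a signed 5-bit immediate field. */
static int64_t vec_const_for_vece(uint64_t val, int vece)
{
    return sextract64_sketch(val, 0, 8 << vece);
}
```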
[PATCH v4 15/16] tcg/loongarch64: Lower rotli_vec to vrotri
Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- tcg/loongarch64/tcg-target.c.inc | 21 + tcg/loongarch64/tcg-target.h | 2 +- 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 8f448823b0..82901d678a 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1902,6 +1902,26 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, tcg_out32(s, encode_vdvjvk_insn(rotrv_vec_insn[vece], a0, a1, temp_vec)); break; +case INDEX_op_rotli_vec: +/* rotli_vec a1, a2 = rotri_vec a1, -a2 */ +a2 = extract32(-a2, 0, 3 + vece); +switch (vece) { +case MO_8: +tcg_out_opc_vrotri_b(s, a0, a1, a2); +break; +case MO_16: +tcg_out_opc_vrotri_h(s, a0, a1, a2); +break; +case MO_32: +tcg_out_opc_vrotri_w(s, a0, a1, a2); +break; +case MO_64: +tcg_out_opc_vrotri_d(s, a0, a1, a2); +break; +default: +g_assert_not_reached(); +} +break; case INDEX_op_bitsel_vec: /* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */ tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1); @@ -2140,6 +2160,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_shli_vec: case INDEX_op_shri_vec: case INDEX_op_sari_vec: +case INDEX_op_rotli_vec: return C_O1_I1(w, w); case INDEX_op_bitsel_vec: diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index d5c69bc192..67b0a95532 100644 --- a/tcg/loongarch64/tcg-target.h +++ b/tcg/loongarch64/tcg-target.h @@ -189,7 +189,7 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_shi_vec 1 #define TCG_TARGET_HAS_shs_vec 0 #define TCG_TARGET_HAS_shv_vec 1 -#define TCG_TARGET_HAS_roti_vec 0 +#define TCG_TARGET_HAS_roti_vec 1 #define TCG_TARGET_HAS_rots_vec 0 #define TCG_TARGET_HAS_rotv_vec 1 #define TCG_TARGET_HAS_sat_vec 1 -- 2.42.0
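The identity behind this lowering can be sketched on a 32-bit scalar (illustrative helpers, not the backend code itself): rotate-left by `n` equals rotate-right by `-n` reduced modulo the element width, which is what `extract32(-a2, 0, 3 + vece)` computes for an element of `8 << vece` bits (a `3 + vece`-bit mask).

```c
#include <assert.h>
#include <stdint.h>

static uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x >> n) | (x << (32 - n)) : x;
}

/* rotl(x, n) == rotr(x, (-n) mod 32), mirroring rotli_vec -> vrotri. */
static uint32_t rotl32_via_rotr(uint32_t x, unsigned n)
{
    return rotr32(x, (unsigned)(-(int)n) & 31);
}
```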
[PATCH v4 13/16] tcg/loongarch64: Lower vector shift integer ops
Lower the following ops: - shli_vec - shri_vec - sari_vec Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- tcg/loongarch64/tcg-target.c.inc | 21 + tcg/loongarch64/tcg-target.h | 2 +- 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index a33ec594ee..c21c917083 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1734,6 +1734,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, static const LoongArchInsn sarv_vec_insn[4] = { OPC_VSRA_B, OPC_VSRA_H, OPC_VSRA_W, OPC_VSRA_D }; +static const LoongArchInsn shli_vec_insn[4] = { +OPC_VSLLI_B, OPC_VSLLI_H, OPC_VSLLI_W, OPC_VSLLI_D +}; +static const LoongArchInsn shri_vec_insn[4] = { +OPC_VSRLI_B, OPC_VSRLI_H, OPC_VSRLI_W, OPC_VSRLI_D +}; +static const LoongArchInsn sari_vec_insn[4] = { +OPC_VSRAI_B, OPC_VSRAI_H, OPC_VSRAI_W, OPC_VSRAI_D +}; a0 = args[0]; a1 = args[1]; @@ -1872,6 +1881,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, case INDEX_op_sarv_vec: tcg_out32(s, encode_vdvjvk_insn(sarv_vec_insn[vece], a0, a1, a2)); break; +case INDEX_op_shli_vec: +tcg_out32(s, encode_vdvjuk3_insn(shli_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_shri_vec: +tcg_out32(s, encode_vdvjuk3_insn(shri_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_sari_vec: +tcg_out32(s, encode_vdvjuk3_insn(sari_vec_insn[vece], a0, a1, a2)); +break; case INDEX_op_bitsel_vec: /* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */ tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1); @@ -2105,6 +2123,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_not_vec: case INDEX_op_neg_vec: +case INDEX_op_shli_vec: +case INDEX_op_shri_vec: +case INDEX_op_sari_vec: return C_O1_I1(w, w); case INDEX_op_bitsel_vec: diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index bc56939a57..d7b806e252 100644 --- a/tcg/loongarch64/tcg-target.h +++
b/tcg/loongarch64/tcg-target.h @@ -186,7 +186,7 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_nor_vec 1 #define TCG_TARGET_HAS_eqv_vec 0 #define TCG_TARGET_HAS_mul_vec 1 -#define TCG_TARGET_HAS_shi_vec 0 +#define TCG_TARGET_HAS_shi_vec 1 #define TCG_TARGET_HAS_shs_vec 0 #define TCG_TARGET_HAS_shv_vec 1 #define TCG_TARGET_HAS_roti_vec 0 -- 2.42.0
[PATCH v4 04/16] tcg/loongarch64: Lower cmp_vec to vseq/vsle/vslt
Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- tcg/loongarch64/tcg-target-con-set.h | 1 + tcg/loongarch64/tcg-target-con-str.h | 1 + tcg/loongarch64/tcg-target.c.inc | 65 3 files changed, 67 insertions(+) diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h index 37b3f80bf9..8c8ea5d919 100644 --- a/tcg/loongarch64/tcg-target-con-set.h +++ b/tcg/loongarch64/tcg-target-con-set.h @@ -31,4 +31,5 @@ C_O1_I2(r, 0, rZ) C_O1_I2(r, rZ, ri) C_O1_I2(r, rZ, rJ) C_O1_I2(r, rZ, rZ) +C_O1_I2(w, w, wM) C_O1_I4(r, rZ, rJ, rZ, rZ) diff --git a/tcg/loongarch64/tcg-target-con-str.h b/tcg/loongarch64/tcg-target-con-str.h index 81b8d40278..a8a1c44014 100644 --- a/tcg/loongarch64/tcg-target-con-str.h +++ b/tcg/loongarch64/tcg-target-con-str.h @@ -26,3 +26,4 @@ CONST('U', TCG_CT_CONST_U12) CONST('Z', TCG_CT_CONST_ZERO) CONST('C', TCG_CT_CONST_C12) CONST('W', TCG_CT_CONST_WSZ) +CONST('M', TCG_CT_CONST_VCMP) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 07a0326e5d..129dd92910 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -176,6 +176,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot) #define TCG_CT_CONST_U12 0x800 #define TCG_CT_CONST_C12 0x1000 #define TCG_CT_CONST_WSZ 0x2000 +#define TCG_CT_CONST_VCMP 0x4000 #define ALL_GENERAL_REGS MAKE_64BIT_MASK(0, 32) #define ALL_VECTOR_REGSMAKE_64BIT_MASK(32, 32) @@ -209,6 +210,10 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece) if ((ct & TCG_CT_CONST_WSZ) && val == (type == TCG_TYPE_I32 ? 
32 : 64)) { return true; } +int64_t vec_val = sextract64(val, 0, 8 << vece); +if ((ct & TCG_CT_CONST_VCMP) && -0x10 <= vec_val && vec_val <= 0x1f) { +return true; +} return false; } @@ -1624,6 +1629,23 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, TCGType type = vecl + TCG_TYPE_V64; TCGArg a0, a1, a2; TCGReg temp = TCG_REG_TMP0; +TCGReg temp_vec = TCG_VEC_TMP0; + +static const LoongArchInsn cmp_vec_insn[16][4] = { +[TCG_COND_EQ] = {OPC_VSEQ_B, OPC_VSEQ_H, OPC_VSEQ_W, OPC_VSEQ_D}, +[TCG_COND_LE] = {OPC_VSLE_B, OPC_VSLE_H, OPC_VSLE_W, OPC_VSLE_D}, +[TCG_COND_LEU] = {OPC_VSLE_BU, OPC_VSLE_HU, OPC_VSLE_WU, OPC_VSLE_DU}, +[TCG_COND_LT] = {OPC_VSLT_B, OPC_VSLT_H, OPC_VSLT_W, OPC_VSLT_D}, +[TCG_COND_LTU] = {OPC_VSLT_BU, OPC_VSLT_HU, OPC_VSLT_WU, OPC_VSLT_DU}, +}; +static const LoongArchInsn cmp_vec_imm_insn[16][4] = { +[TCG_COND_EQ] = {OPC_VSEQI_B, OPC_VSEQI_H, OPC_VSEQI_W, OPC_VSEQI_D}, +[TCG_COND_LE] = {OPC_VSLEI_B, OPC_VSLEI_H, OPC_VSLEI_W, OPC_VSLEI_D}, +[TCG_COND_LEU] = {OPC_VSLEI_BU, OPC_VSLEI_HU, OPC_VSLEI_WU, OPC_VSLEI_DU}, +[TCG_COND_LT] = {OPC_VSLTI_B, OPC_VSLTI_H, OPC_VSLTI_W, OPC_VSLTI_D}, +[TCG_COND_LTU] = {OPC_VSLTI_BU, OPC_VSLTI_HU, OPC_VSLTI_WU, OPC_VSLTI_DU}, +}; +LoongArchInsn insn; a0 = args[0]; a1 = args[1]; @@ -1651,6 +1673,45 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, tcg_out_opc_vldx(s, a0, a1, temp); } break; +case INDEX_op_cmp_vec: +TCGCond cond = args[3]; +if (const_args[2]) { +/* + * cmp_vec dest, src, value + * Try vseqi/vslei/vslti + */ +int64_t value = sextract64(a2, 0, 8 << vece); +if ((cond == TCG_COND_EQ || cond == TCG_COND_LE || \ + cond == TCG_COND_LT) && (-0x10 <= value && value <= 0x0f)) { +tcg_out32(s, encode_vdvjsk5_insn(cmp_vec_imm_insn[cond][vece], \ + a0, a1, value)); +break; +} else if ((cond == TCG_COND_LEU || cond == TCG_COND_LTU) && +(0x00 <= value && value <= 0x1f)) { +tcg_out32(s, encode_vdvjuk5_insn(cmp_vec_imm_insn[cond][vece], \ + a0, a1, value)); +break; +} + +/* + * Fallback to: + * 
dupi_vec temp, a2 + * cmp_vec a0, a1, temp, cond + */ +tcg_out_dupi_vec(s, type, vece, temp_vec, a2); +a2 = temp_vec; +} + +insn = cmp_vec_insn[cond][vece]; +if (insn == 0) { +TCGArg t; +t = a1, a1 = a2, a2 = t; +cond = tcg_swap_cond(cond); +insn = cmp_vec_insn[cond][vece]; +tcg_debug_assert(insn != 0); +} +tcg_out32(s, encode_vdvjvk_insn(insn, a0, a1, a2)); +break; case INDEX_op_dupm_vec: tcg_out_dupm_vec(s, type, vece, a0, a1, a2); break; @@ -1666,6 +1727,7 @@ int tcg_can
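Since LSX only provides EQ/LE/LT compare instructions, the zero entries of `cmp_vec_insn` (GT, GE and their unsigned variants) are reached by exchanging the operands and calling `tcg_swap_cond()`. A scalar sketch of the swap identity (hypothetical helper names, not the backend code):

```c
#include <assert.h>
#include <stdint.h>

/* LSX-style primitives: only LT/LE exist natively. */
static int cmp_lt(int64_t a, int64_t b) { return a < b; }
static int cmp_le(int64_t a, int64_t b) { return a <= b; }

/* GT(a, b) == LT(b, a) and GE(a, b) == LE(b, a): the operand swap plus
 * tcg_swap_cond() turns a missing condition into an available one. */
static int cmp_gt_via_lt(int64_t a, int64_t b) { return cmp_lt(b, a); }
static int cmp_ge_via_le(int64_t a, int64_t b) { return cmp_le(b, a); }
```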
[PATCH v4 08/16] tcg/loongarch64: Lower mul_vec to vmul
Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- tcg/loongarch64/tcg-target.c.inc | 8 tcg/loongarch64/tcg-target.h | 2 +- 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index b36b706e39..0814f62905 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1698,6 +1698,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, static const LoongArchInsn neg_vec_insn[4] = { OPC_VNEG_B, OPC_VNEG_H, OPC_VNEG_W, OPC_VNEG_D }; +static const LoongArchInsn mul_vec_insn[4] = { +OPC_VMUL_B, OPC_VMUL_H, OPC_VMUL_W, OPC_VMUL_D +}; a0 = args[0]; a1 = args[1]; @@ -1799,6 +1802,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, case INDEX_op_neg_vec: tcg_out32(s, encode_vdvj_insn(neg_vec_insn[vece], a0, a1)); break; +case INDEX_op_mul_vec: +tcg_out32(s, encode_vdvjvk_insn(mul_vec_insn[vece], a0, a1, a2)); +break; case INDEX_op_dupm_vec: tcg_out_dupm_vec(s, type, vece, a0, a1, a2); break; @@ -1825,6 +1831,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_nor_vec: case INDEX_op_not_vec: case INDEX_op_neg_vec: +case INDEX_op_mul_vec: return 1; default: return 0; @@ -1999,6 +2006,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_orc_vec: case INDEX_op_xor_vec: case INDEX_op_nor_vec: +case INDEX_op_mul_vec: return C_O1_I2(w, w, w); case INDEX_op_not_vec: diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index 64c72d0857..2c2266ed31 100644 --- a/tcg/loongarch64/tcg-target.h +++ b/tcg/loongarch64/tcg-target.h @@ -185,7 +185,7 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_nand_vec 0 #define TCG_TARGET_HAS_nor_vec 1 #define TCG_TARGET_HAS_eqv_vec 0 -#define TCG_TARGET_HAS_mul_vec 0 +#define TCG_TARGET_HAS_mul_vec 1 #define TCG_TARGET_HAS_shi_vec 0 #define TCG_TARGET_HAS_shs_vec 0 #define TCG_TARGET_HAS_shv_vec 0 -- 2.42.0
[PATCH v4 06/16] tcg/loongarch64: Lower vector bitwise operations
Lower the following ops: - and_vec - andc_vec - or_vec - orc_vec - xor_vec - nor_vec - not_vec Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- tcg/loongarch64/tcg-target-con-set.h | 2 ++ tcg/loongarch64/tcg-target.c.inc | 44 tcg/loongarch64/tcg-target.h | 8 ++--- 3 files changed, 50 insertions(+), 4 deletions(-) diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h index 2d5dce75c3..3f530ad4d8 100644 --- a/tcg/loongarch64/tcg-target-con-set.h +++ b/tcg/loongarch64/tcg-target-con-set.h @@ -20,6 +20,7 @@ C_O0_I2(rZ, rZ) C_O0_I2(w, r) C_O1_I1(r, r) C_O1_I1(w, r) +C_O1_I1(w, w) C_O1_I2(r, r, rC) C_O1_I2(r, r, ri) C_O1_I2(r, r, rI) @@ -31,6 +32,7 @@ C_O1_I2(r, 0, rZ) C_O1_I2(r, rZ, ri) C_O1_I2(r, rZ, rJ) C_O1_I2(r, rZ, rZ) +C_O1_I2(w, w, w) C_O1_I2(w, w, wM) C_O1_I2(w, w, wA) C_O1_I4(r, rZ, rJ, rZ, rZ) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 1a369b237c..d569e443dd 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1722,6 +1722,32 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, tcg_out_opc_vldx(s, a0, a1, temp); } break; +case INDEX_op_and_vec: +tcg_out_opc_vand_v(s, a0, a1, a2); +break; +case INDEX_op_andc_vec: +/* + * vandn vd, vj, vk: vd = vk & ~vj + * andc_vec vd, vj, vk: vd = vj & ~vk + * vk and vj are swapped + */ +tcg_out_opc_vandn_v(s, a0, a2, a1); +break; +case INDEX_op_or_vec: +tcg_out_opc_vor_v(s, a0, a1, a2); +break; +case INDEX_op_orc_vec: +tcg_out_opc_vorn_v(s, a0, a1, a2); +break; +case INDEX_op_xor_vec: +tcg_out_opc_vxor_v(s, a0, a1, a2); +break; +case INDEX_op_nor_vec: +tcg_out_opc_vnor_v(s, a0, a1, a2); +break; +case INDEX_op_not_vec: +tcg_out_opc_vnor_v(s, a0, a1, a1); +break; case INDEX_op_cmp_vec: TCGCond cond = args[3]; if (const_args[2]) { @@ -1785,6 +1811,13 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_cmp_vec: case INDEX_op_add_vec: case INDEX_op_sub_vec: +case
INDEX_op_and_vec: +case INDEX_op_andc_vec: +case INDEX_op_or_vec: +case INDEX_op_orc_vec: +case INDEX_op_xor_vec: +case INDEX_op_nor_vec: +case INDEX_op_not_vec: return 1; default: return 0; @@ -1953,6 +1986,17 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_sub_vec: return C_O1_I2(w, w, wA); +case INDEX_op_and_vec: +case INDEX_op_andc_vec: +case INDEX_op_or_vec: +case INDEX_op_orc_vec: +case INDEX_op_xor_vec: +case INDEX_op_nor_vec: +return C_O1_I2(w, w, w); + +case INDEX_op_not_vec: +return C_O1_I1(w, w); + default: g_assert_not_reached(); } diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index daaf38ee31..f9c5cb12ca 100644 --- a/tcg/loongarch64/tcg-target.h +++ b/tcg/loongarch64/tcg-target.h @@ -177,13 +177,13 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_v128 use_lsx_instructions #define TCG_TARGET_HAS_v256 0 -#define TCG_TARGET_HAS_not_vec 0 +#define TCG_TARGET_HAS_not_vec 1 #define TCG_TARGET_HAS_neg_vec 0 #define TCG_TARGET_HAS_abs_vec 0 -#define TCG_TARGET_HAS_andc_vec 0 -#define TCG_TARGET_HAS_orc_vec 0 +#define TCG_TARGET_HAS_andc_vec 1 +#define TCG_TARGET_HAS_orc_vec 1 #define TCG_TARGET_HAS_nand_vec 0 -#define TCG_TARGET_HAS_nor_vec 0 +#define TCG_TARGET_HAS_nor_vec 1 #define TCG_TARGET_HAS_eqv_vec 0 #define TCG_TARGET_HAS_mul_vec 0 #define TCG_TARGET_HAS_shi_vec 0 -- 2.42.0
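Two of these lowerings rely on bitwise identities rather than a 1:1 instruction match, sketched here on scalars (stand-ins for the 128-bit LSX ops, not the backend code): VANDN computes `vk & ~vj` while TCG's andc is `a1 & ~a2`, hence the swapped operands in `tcg_out_opc_vandn_v(s, a0, a2, a1)`; and not_vec reuses VNOR with both inputs equal, since `~(x | x) == ~x`.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t vandn(uint64_t vj, uint64_t vk) { return vk & ~vj; }
static uint64_t vnor(uint64_t vj, uint64_t vk)  { return ~(vj | vk); }

/* andc_vec a0, a1, a2 = vandn a0, a2, a1 -- operands swapped on purpose. */
static uint64_t tcg_andc(uint64_t a1, uint64_t a2) { return vandn(a2, a1); }

/* not_vec a0, a1 = vnor a0, a1, a1. */
static uint64_t tcg_not(uint64_t a1) { return vnor(a1, a1); }
```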
[PATCH v4 11/16] tcg/loongarch64: Lower vector shift vector ops
Lower the following ops: - shlv_vec - shrv_vec - sarv_vec Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- tcg/loongarch64/tcg-target.c.inc | 24 tcg/loongarch64/tcg-target.h | 2 +- 2 files changed, 25 insertions(+), 1 deletion(-) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 90c52c38cf..6958fd219c 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1725,6 +1725,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, static const LoongArchInsn ussub_vec_insn[4] = { OPC_VSSUB_BU, OPC_VSSUB_HU, OPC_VSSUB_WU, OPC_VSSUB_DU }; +static const LoongArchInsn shlv_vec_insn[4] = { +OPC_VSLL_B, OPC_VSLL_H, OPC_VSLL_W, OPC_VSLL_D +}; +static const LoongArchInsn shrv_vec_insn[4] = { +OPC_VSRL_B, OPC_VSRL_H, OPC_VSRL_W, OPC_VSRL_D +}; +static const LoongArchInsn sarv_vec_insn[4] = { +OPC_VSRA_B, OPC_VSRA_H, OPC_VSRA_W, OPC_VSRA_D +}; a0 = args[0]; a1 = args[1]; @@ -1853,6 +1862,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, case INDEX_op_ussub_vec: tcg_out32(s, encode_vdvjvk_insn(ussub_vec_insn[vece], a0, a1, a2)); break; +case INDEX_op_shlv_vec: +tcg_out32(s, encode_vdvjvk_insn(shlv_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_shrv_vec: +tcg_out32(s, encode_vdvjvk_insn(shrv_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_sarv_vec: +tcg_out32(s, encode_vdvjvk_insn(sarv_vec_insn[vece], a0, a1, a2)); +break; case INDEX_op_dupm_vec: tcg_out_dupm_vec(s, type, vece, a0, a1, a2); break; @@ -1888,6 +1906,9 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_usadd_vec: case INDEX_op_sssub_vec: case INDEX_op_ussub_vec: +case INDEX_op_shlv_vec: +case INDEX_op_shrv_vec: +case INDEX_op_sarv_vec: return 1; default: return 0; @@ -2071,6 +2092,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_usadd_vec: case INDEX_op_sssub_vec: case INDEX_op_ussub_vec: +case INDEX_op_shlv_vec: +case INDEX_op_shrv_vec: +case 
INDEX_op_sarv_vec: return C_O1_I2(w, w, w); case INDEX_op_not_vec: diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index fa14558275..7e9fb61c47 100644 --- a/tcg/loongarch64/tcg-target.h +++ b/tcg/loongarch64/tcg-target.h @@ -188,7 +188,7 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_mul_vec 1 #define TCG_TARGET_HAS_shi_vec 0 #define TCG_TARGET_HAS_shs_vec 0 -#define TCG_TARGET_HAS_shv_vec 0 +#define TCG_TARGET_HAS_shv_vec 1 #define TCG_TARGET_HAS_roti_vec 0 #define TCG_TARGET_HAS_rots_vec 0 #define TCG_TARGET_HAS_rotv_vec 0 -- 2.42.0
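The three flavors mapped here differ only in fill behavior, illustrated on one 8-bit lane (a scalar sketch, with the count masked to the lane width to keep the C expressions well-defined; not the backend code): VSLL shifts in zeros from the right, VSRL shifts in zeros from the left, and VSRA replicates the sign bit.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t vsll_b(uint8_t x, unsigned n) { return (uint8_t)(x << (n & 7)); }
static uint8_t vsrl_b(uint8_t x, unsigned n) { return (uint8_t)(x >> (n & 7)); }

/* Arithmetic right shift: convert to signed first so the sign extends. */
static uint8_t vsra_b(uint8_t x, unsigned n)
{
    return (uint8_t)((int8_t)x >> (n & 7));
}
```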
[PATCH v4 00/16] Lower TCG vector ops to LSX
This patch series allows qemu to utilize LSX instructions on LoongArch machines to execute TCG vector ops. Passed tcg tests with x86_64 and aarch64 cross compilers. Changes since v3: - Refactor add/sub_vec handling code to use a helper function - Only use vldx/vstx for MO_128 load/store, otherwise fallback to two ld/st Changes since v2: - Add vece argument to tcg_target_const_match() for const args of vector ops - Use custom constraint for cmp_vec/add_vec/sub_vec for better const arg handling - Implement 128-bit load & store using vldx/vstx Changes since v1: - Optimize dupi_vec/st_vec/ld_vec/cmp_vec/add_vec/sub_vec generation - Lower not_vec/shi_vec/roti_vec/rotv_vec Jiajie Chen (16): tcg/loongarch64: Import LSX instructions tcg/loongarch64: Lower basic tcg vec ops to LSX tcg: pass vece to tcg_target_const_match() tcg/loongarch64: Lower cmp_vec to vseq/vsle/vslt tcg/loongarch64: Lower add/sub_vec to vadd/vsub tcg/loongarch64: Lower vector bitwise operations tcg/loongarch64: Lower neg_vec to vneg tcg/loongarch64: Lower mul_vec to vmul tcg/loongarch64: Lower vector min max ops tcg/loongarch64: Lower vector saturated ops tcg/loongarch64: Lower vector shift vector ops tcg/loongarch64: Lower bitsel_vec to vbitsel tcg/loongarch64: Lower vector shift integer ops tcg/loongarch64: Lower rotv_vec ops to LSX tcg/loongarch64: Lower rotli_vec to vrotri tcg/loongarch64: Implement 128-bit load & store tcg/aarch64/tcg-target.c.inc |2 +- tcg/arm/tcg-target.c.inc |2 +- tcg/i386/tcg-target.c.inc|2 +- tcg/loongarch64/tcg-insn-defs.c.inc | 6251 +- tcg/loongarch64/tcg-target-con-set.h |9 + tcg/loongarch64/tcg-target-con-str.h |3 + tcg/loongarch64/tcg-target.c.inc | 619 ++- tcg/loongarch64/tcg-target.h | 40 +- tcg/loongarch64/tcg-target.opc.h | 12 + tcg/mips/tcg-target.c.inc|2 +- tcg/ppc/tcg-target.c.inc |2 +- tcg/riscv/tcg-target.c.inc |2 +- tcg/s390x/tcg-target.c.inc |2 +- tcg/sparc64/tcg-target.c.inc |2 +- tcg/tcg.c|4 +- tcg/tci/tcg-target.c.inc |2 +- 16 files changed, 6824 
insertions(+), 132 deletions(-) create mode 100644 tcg/loongarch64/tcg-target.opc.h -- 2.42.0
[PATCH v4 14/16] tcg/loongarch64: Lower rotv_vec ops to LSX
Lower the following ops: - rotrv_vec - rotlv_vec Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- tcg/loongarch64/tcg-target.c.inc | 14 ++ tcg/loongarch64/tcg-target.h | 2 +- 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index c21c917083..8f448823b0 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1743,6 +1743,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, static const LoongArchInsn sari_vec_insn[4] = { OPC_VSRAI_B, OPC_VSRAI_H, OPC_VSRAI_W, OPC_VSRAI_D }; +static const LoongArchInsn rotrv_vec_insn[4] = { +OPC_VROTR_B, OPC_VROTR_H, OPC_VROTR_W, OPC_VROTR_D +}; a0 = args[0]; a1 = args[1]; @@ -1890,6 +1893,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, case INDEX_op_sari_vec: tcg_out32(s, encode_vdvjuk3_insn(sari_vec_insn[vece], a0, a1, a2)); break; +case INDEX_op_rotrv_vec: +tcg_out32(s, encode_vdvjvk_insn(rotrv_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_rotlv_vec: +/* rotlv_vec a1, a2 = rotrv_vec a1, -a2 */ +tcg_out32(s, encode_vdvj_insn(neg_vec_insn[vece], temp_vec, a2)); +tcg_out32(s, encode_vdvjvk_insn(rotrv_vec_insn[vece], a0, a1, +temp_vec)); +break; case INDEX_op_bitsel_vec: /* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */ tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1); @@ -2119,6 +2131,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_shlv_vec: case INDEX_op_shrv_vec: case INDEX_op_sarv_vec: +case INDEX_op_rotrv_vec: +case INDEX_op_rotlv_vec: return C_O1_I2(w, w, w); case INDEX_op_not_vec: diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index d7b806e252..d5c69bc192 100644 --- a/tcg/loongarch64/tcg-target.h +++ b/tcg/loongarch64/tcg-target.h @@ -191,7 +191,7 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_shv_vec 1 #define TCG_TARGET_HAS_roti_vec 0 #define TCG_TARGET_HAS_rots_vec 0 -#define TCG_TARGET_HAS_rotv_vec 0 
+#define TCG_TARGET_HAS_rotv_vec 1 #define TCG_TARGET_HAS_sat_vec 1 #define TCG_TARGET_HAS_minmax_vec 1 #define TCG_TARGET_HAS_bitsel_vec 1 -- 2.42.0
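The rotlv_vec lowering spends one extra instruction (plus the temp vector register): negate the per-element counts with vneg, then rotate right, since `rotl(x, n) == rotr(x, -n mod w)`. A 64-bit scalar sketch of that two-step sequence (illustrative helpers, not the backend code):

```c
#include <assert.h>
#include <stdint.h>

static uint64_t rotr64(uint64_t x, uint64_t n)
{
    n &= 63;
    return n ? (x >> n) | (x << (64 - n)) : x;
}

/* Step 1: vneg the count; step 2: vrotr with the negated count. */
static uint64_t rotlv_via_neg_rotrv(uint64_t x, uint64_t n)
{
    return rotr64(x, (uint64_t)-(int64_t)n);
}
```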
[PATCH v4 16/16] tcg/loongarch64: Implement 128-bit load & store
If LSX is available, use LSX instructions to implement 128-bit load & store when MO_128 is required, otherwise use two 64-bit loads & stores. Signed-off-by: Jiajie Chen --- tcg/loongarch64/tcg-target-con-set.h | 2 + tcg/loongarch64/tcg-target.c.inc | 59 tcg/loongarch64/tcg-target.h | 2 +- 3 files changed, 62 insertions(+), 1 deletion(-) diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h index 914572d21b..77d62e38e7 100644 --- a/tcg/loongarch64/tcg-target-con-set.h +++ b/tcg/loongarch64/tcg-target-con-set.h @@ -18,6 +18,7 @@ C_O0_I1(r) C_O0_I2(rZ, r) C_O0_I2(rZ, rZ) C_O0_I2(w, r) +C_O0_I3(r, r, r) C_O1_I1(r, r) C_O1_I1(w, r) C_O1_I1(w, w) @@ -37,3 +38,4 @@ C_O1_I2(w, w, wM) C_O1_I2(w, w, wA) C_O1_I3(w, w, w, w) C_O1_I4(r, rZ, rJ, rZ, rZ) +C_O2_I1(r, r, r) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 82901d678a..6e9f334fed 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1081,6 +1081,48 @@ static void tcg_out_qemu_st(TCGContext *s, TCGReg data_reg, TCGReg addr_reg, } } +static void tcg_out_qemu_ldst_i128(TCGContext *s, TCGReg data_lo, TCGReg data_hi, + TCGReg addr_reg, MemOpIdx oi, bool is_ld) +{ +TCGLabelQemuLdst *ldst; +HostAddress h; + +ldst = prepare_host_addr(s, &h, addr_reg, oi, true); + +if (h.aa.atom == MO_128) { +/* + * Use VLDX/VSTX when 128-bit atomicity is required. + * If address is aligned to 16-bytes, the 128-bit load/store is atomic.
+ */ +if (is_ld) { +tcg_out_opc_vldx(s, TCG_VEC_TMP0, h.base, h.index); +tcg_out_opc_vpickve2gr_d(s, data_lo, TCG_VEC_TMP0, 0); +tcg_out_opc_vpickve2gr_d(s, data_hi, TCG_VEC_TMP0, 1); +} else { +tcg_out_opc_vinsgr2vr_d(s, TCG_VEC_TMP0, data_lo, 0); +tcg_out_opc_vinsgr2vr_d(s, TCG_VEC_TMP0, data_hi, 1); +tcg_out_opc_vstx(s, TCG_VEC_TMP0, h.base, h.index); +} +} else { +/* otherwise use a pair of LD/ST */ +tcg_out_opc_add_d(s, TCG_REG_TMP0, h.base, h.index); +if (is_ld) { +tcg_out_opc_ld_d(s, data_lo, TCG_REG_TMP0, 0); +tcg_out_opc_ld_d(s, data_hi, TCG_REG_TMP0, 8); +} else { +tcg_out_opc_st_d(s, data_lo, TCG_REG_TMP0, 0); +tcg_out_opc_st_d(s, data_hi, TCG_REG_TMP0, 8); +} +} + +if (ldst) { +ldst->type = TCG_TYPE_I128; +ldst->datalo_reg = data_lo; +ldst->datahi_reg = data_hi; +ldst->raddr = tcg_splitwx_to_rx(s->code_ptr); +} +} + /* * Entry-points */ @@ -1145,6 +1187,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, TCGArg a0 = args[0]; TCGArg a1 = args[1]; TCGArg a2 = args[2]; +TCGArg a3 = args[3]; int c2 = const_args[2]; switch (opc) { @@ -1507,6 +1550,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, case INDEX_op_qemu_ld_a64_i64: tcg_out_qemu_ld(s, a0, a1, a2, TCG_TYPE_I64); break; +case INDEX_op_qemu_ld_a32_i128: +case INDEX_op_qemu_ld_a64_i128: +tcg_out_qemu_ldst_i128(s, a0, a1, a2, a3, true); +break; case INDEX_op_qemu_st_a32_i32: case INDEX_op_qemu_st_a64_i32: tcg_out_qemu_st(s, a0, a1, a2, TCG_TYPE_I32); @@ -1515,6 +1562,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, case INDEX_op_qemu_st_a64_i64: tcg_out_qemu_st(s, a0, a1, a2, TCG_TYPE_I64); break; +case INDEX_op_qemu_st_a32_i128: +case INDEX_op_qemu_st_a64_i128: +tcg_out_qemu_ldst_i128(s, a0, a1, a2, a3, false); +break; case INDEX_op_mov_i32: /* Always emitted via tcg_out_mov. 
*/ case INDEX_op_mov_i64: @@ -1996,6 +2047,14 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_qemu_st_a64_i64: return C_O0_I2(rZ, r); +case INDEX_op_qemu_ld_a32_i128: +case INDEX_op_qemu_ld_a64_i128: +return C_O2_I1(r, r, r); + +case INDEX_op_qemu_st_a32_i128: +case INDEX_op_qemu_st_a64_i128: +return C_O0_I3(r, r, r); + case INDEX_op_brcond_i32: case INDEX_op_brcond_i64: return C_O0_I2(rZ, rZ); diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index 67b0a95532..03017672f6 100644 --- a/tcg/loongarch64/tcg-target.h +++ b/tcg/loongarch64/tcg-target.h @@ -171,7 +171,7 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_muluh_i641 #define TCG_TARGET_HAS_mulsh_i641 -#define TCG_TARGET_HAS_qemu_ldst_i128 0 +#define TCG_TARGET_HAS_qemu_ldst_i128 use_lsx_instructions #define TCG_TARGET_HAS_v64 0 #define TCG_TARGET_HAS_v128 use_lsx_instructions -- 2.42.0
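Both paths use the same little-endian layout, sketched here with the GCC/Clang `unsigned __int128` extension (illustrative helpers, not the backend code): lane 0 / offset 0 holds the low 64 bits and lane 1 / offset 8 the high 64 bits, matching vpickve2gr.d 0/1 on loads and vinsgr2vr.d 0/1 on stores, or the ld.d/st.d pair at offsets 0 and 8 in the fallback.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* "Store" side: assemble the 128-bit value from two 64-bit halves. */
static u128 make128(uint64_t lo, uint64_t hi)
{
    return ((u128)hi << 64) | lo;
}

/* "Load" side: extract lane 0 (low) or lane 1 (high). */
static uint64_t pick_lane(u128 v, int lane)
{
    return (uint64_t)(lane ? v >> 64 : v);
}
```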
[PATCH v4 09/16] tcg/loongarch64: Lower vector min max ops
Lower the following ops: - smin_vec - smax_vec - umin_vec - umax_vec Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- tcg/loongarch64/tcg-target.c.inc | 32 tcg/loongarch64/tcg-target.h | 2 +- 2 files changed, 33 insertions(+), 1 deletion(-) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 0814f62905..bdf22d8807 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1701,6 +1701,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, static const LoongArchInsn mul_vec_insn[4] = { OPC_VMUL_B, OPC_VMUL_H, OPC_VMUL_W, OPC_VMUL_D }; +static const LoongArchInsn smin_vec_insn[4] = { +OPC_VMIN_B, OPC_VMIN_H, OPC_VMIN_W, OPC_VMIN_D +}; +static const LoongArchInsn umin_vec_insn[4] = { +OPC_VMIN_BU, OPC_VMIN_HU, OPC_VMIN_WU, OPC_VMIN_DU +}; +static const LoongArchInsn smax_vec_insn[4] = { +OPC_VMAX_B, OPC_VMAX_H, OPC_VMAX_W, OPC_VMAX_D +}; +static const LoongArchInsn umax_vec_insn[4] = { +OPC_VMAX_BU, OPC_VMAX_HU, OPC_VMAX_WU, OPC_VMAX_DU +}; a0 = args[0]; a1 = args[1]; @@ -1805,6 +1817,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, case INDEX_op_mul_vec: tcg_out32(s, encode_vdvjvk_insn(mul_vec_insn[vece], a0, a1, a2)); break; +case INDEX_op_smin_vec: +tcg_out32(s, encode_vdvjvk_insn(smin_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_smax_vec: +tcg_out32(s, encode_vdvjvk_insn(smax_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_umin_vec: +tcg_out32(s, encode_vdvjvk_insn(umin_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_umax_vec: +tcg_out32(s, encode_vdvjvk_insn(umax_vec_insn[vece], a0, a1, a2)); +break; case INDEX_op_dupm_vec: tcg_out_dupm_vec(s, type, vece, a0, a1, a2); break; @@ -1832,6 +1856,10 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_not_vec: case INDEX_op_neg_vec: case INDEX_op_mul_vec: +case INDEX_op_smin_vec: +case INDEX_op_smax_vec: +case INDEX_op_umin_vec: +case INDEX_op_umax_vec: return 1; 
default: return 0; @@ -2007,6 +2035,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_xor_vec: case INDEX_op_nor_vec: case INDEX_op_mul_vec: +case INDEX_op_smin_vec: +case INDEX_op_smax_vec: +case INDEX_op_umin_vec: +case INDEX_op_umax_vec: return C_O1_I2(w, w, w); case INDEX_op_not_vec: diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index 2c2266ed31..ec725aaeaa 100644 --- a/tcg/loongarch64/tcg-target.h +++ b/tcg/loongarch64/tcg-target.h @@ -193,7 +193,7 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_rots_vec 0 #define TCG_TARGET_HAS_rotv_vec 0 #define TCG_TARGET_HAS_sat_vec 0 -#define TCG_TARGET_HAS_minmax_vec 0 +#define TCG_TARGET_HAS_minmax_vec 1 #define TCG_TARGET_HAS_bitsel_vec 0 #define TCG_TARGET_HAS_cmpsel_vec 0 -- 2.42.0
[PATCH v4 05/16] tcg/loongarch64: Lower add/sub_vec to vadd/vsub
Lower the following ops: - add_vec - sub_vec Signed-off-by: Jiajie Chen --- tcg/loongarch64/tcg-target-con-set.h | 1 + tcg/loongarch64/tcg-target-con-str.h | 1 + tcg/loongarch64/tcg-target.c.inc | 61 3 files changed, 63 insertions(+) diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h index 8c8ea5d919..2d5dce75c3 100644 --- a/tcg/loongarch64/tcg-target-con-set.h +++ b/tcg/loongarch64/tcg-target-con-set.h @@ -32,4 +32,5 @@ C_O1_I2(r, rZ, ri) C_O1_I2(r, rZ, rJ) C_O1_I2(r, rZ, rZ) C_O1_I2(w, w, wM) +C_O1_I2(w, w, wA) C_O1_I4(r, rZ, rJ, rZ, rZ) diff --git a/tcg/loongarch64/tcg-target-con-str.h b/tcg/loongarch64/tcg-target-con-str.h index a8a1c44014..2ba9c135ac 100644 --- a/tcg/loongarch64/tcg-target-con-str.h +++ b/tcg/loongarch64/tcg-target-con-str.h @@ -27,3 +27,4 @@ CONST('Z', TCG_CT_CONST_ZERO) CONST('C', TCG_CT_CONST_C12) CONST('W', TCG_CT_CONST_WSZ) CONST('M', TCG_CT_CONST_VCMP) +CONST('A', TCG_CT_CONST_VADD) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 129dd92910..1a369b237c 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -177,6 +177,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot) #define TCG_CT_CONST_C12 0x1000 #define TCG_CT_CONST_WSZ 0x2000 #define TCG_CT_CONST_VCMP 0x4000 +#define TCG_CT_CONST_VADD 0x8000 #define ALL_GENERAL_REGS MAKE_64BIT_MASK(0, 32) #define ALL_VECTOR_REGSMAKE_64BIT_MASK(32, 32) @@ -214,6 +215,9 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece) if ((ct & TCG_CT_CONST_VCMP) && -0x10 <= vec_val && vec_val <= 0x1f) { return true; } +if ((ct & TCG_CT_CONST_VADD) && -0x1f <= vec_val && vec_val <= 0x1f) { +return true; +} return false; } @@ -1621,6 +1625,51 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type, unsigned vece, } } +static void tcg_out_addsub_vec(TCGContext *s, unsigned vece, const TCGArg a0, + const TCGArg a1, const TCGArg a2, + bool 
a2_is_const, bool is_add) +{ +static const LoongArchInsn add_vec_insn[4] = { +OPC_VADD_B, OPC_VADD_H, OPC_VADD_W, OPC_VADD_D +}; +static const LoongArchInsn add_vec_imm_insn[4] = { +OPC_VADDI_BU, OPC_VADDI_HU, OPC_VADDI_WU, OPC_VADDI_DU +}; +static const LoongArchInsn sub_vec_insn[4] = { +OPC_VSUB_B, OPC_VSUB_H, OPC_VSUB_W, OPC_VSUB_D +}; +static const LoongArchInsn sub_vec_imm_insn[4] = { +OPC_VSUBI_BU, OPC_VSUBI_HU, OPC_VSUBI_WU, OPC_VSUBI_DU +}; + +if (a2_is_const) { +int64_t value = sextract64(a2, 0, 8 << vece); +if (!is_add) { +value = -value; +} + +/* Try vaddi/vsubi */ +if (0 <= value && value <= 0x1f) { +tcg_out32(s, encode_vdvjuk5_insn(add_vec_imm_insn[vece], a0, \ + a1, value)); +return; +} else if (-0x1f <= value && value < 0) { +tcg_out32(s, encode_vdvjuk5_insn(sub_vec_imm_insn[vece], a0, \ + a1, -value)); +return; +} + +/* constraint TCG_CT_CONST_VADD ensures unreachable */ +g_assert_not_reached(); +} + +if (is_add) { +tcg_out32(s, encode_vdvjvk_insn(add_vec_insn[vece], a0, a1, a2)); +} else { +tcg_out32(s, encode_vdvjvk_insn(sub_vec_insn[vece], a0, a1, a2)); +} +} + static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, unsigned vecl, unsigned vece, const TCGArg args[TCG_MAX_OP_ARGS], @@ -1712,6 +1761,12 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, } tcg_out32(s, encode_vdvjvk_insn(insn, a0, a1, a2)); break; +case INDEX_op_add_vec: +tcg_out_addsub_vec(s, vece, a0, a1, a2, const_args[2], true); +break; +case INDEX_op_sub_vec: +tcg_out_addsub_vec(s, vece, a0, a1, a2, const_args[2], false); +break; case INDEX_op_dupm_vec: tcg_out_dupm_vec(s, type, vece, a0, a1, a2); break; @@ -1728,6 +1783,8 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_dup_vec: case INDEX_op_dupm_vec: case INDEX_op_cmp_vec: +case INDEX_op_add_vec: +case INDEX_op_sub_vec: return 1; default: return 0; @@ -1892,6 +1949,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_cmp_vec: return C_O1_I2(w, w, wM); 
+case INDEX_op_add_vec: +case INDEX_op_sub_vec: +return C_O1_I2(w, w, wA); + default: g_assert_not_reached(); } -- 2.42.0
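The constant path in `tcg_out_addsub_vec` folds a negative addend into vsubi and vice versa, with the `TCG_CT_CONST_VADD` constraint guaranteeing the value fits in [-0x1f, 0x1f]. A minimal scalar sketch of that selection logic (the helper names here are illustrative, not from the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the TCG_CT_CONST_VADD constraint: the value must fit a 5-bit
 * unsigned immediate after an optional sign flip. */
static bool vadd_const_ok(int64_t v)
{
    return -0x1f <= v && v <= 0x1f;
}

/* Pick between vaddi.*u and vsubi.*u for "dst = src + value".
 * Returns the unsigned 5-bit immediate; *use_sub says which opcode. */
static uint32_t pick_addi_imm(int64_t value, bool *use_sub)
{
    assert(vadd_const_ok(value));
    if (value >= 0) {
        *use_sub = false;
        return (uint32_t)value;      /* vaddi.*u dst, src, value */
    }
    *use_sub = true;
    return (uint32_t)(-value);       /* vsubi.*u dst, src, -value */
}
```

The patch handles sub_vec by negating `value` first and reusing the same add-direction logic, which this sketch models for the add case only.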
[PATCH v4 07/16] tcg/loongarch64: Lower neg_vec to vneg
Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- tcg/loongarch64/tcg-target.c.inc | 8 tcg/loongarch64/tcg-target.h | 2 +- 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index d569e443dd..b36b706e39 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1695,6 +1695,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, [TCG_COND_LTU] = {OPC_VSLTI_BU, OPC_VSLTI_HU, OPC_VSLTI_WU, OPC_VSLTI_DU}, }; LoongArchInsn insn; +static const LoongArchInsn neg_vec_insn[4] = { +OPC_VNEG_B, OPC_VNEG_H, OPC_VNEG_W, OPC_VNEG_D +}; a0 = args[0]; a1 = args[1]; @@ -1793,6 +1796,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, case INDEX_op_sub_vec: tcg_out_addsub_vec(s, vece, a0, a1, a2, const_args[2], false); break; +case INDEX_op_neg_vec: +tcg_out32(s, encode_vdvj_insn(neg_vec_insn[vece], a0, a1)); +break; case INDEX_op_dupm_vec: tcg_out_dupm_vec(s, type, vece, a0, a1, a2); break; @@ -1818,6 +1824,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_xor_vec: case INDEX_op_nor_vec: case INDEX_op_not_vec: +case INDEX_op_neg_vec: return 1; default: return 0; @@ -1995,6 +2002,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) return C_O1_I2(w, w, w); case INDEX_op_not_vec: +case INDEX_op_neg_vec: return C_O1_I1(w, w); default: diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index f9c5cb12ca..64c72d0857 100644 --- a/tcg/loongarch64/tcg-target.h +++ b/tcg/loongarch64/tcg-target.h @@ -178,7 +178,7 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_v256 0 #define TCG_TARGET_HAS_not_vec 1 -#define TCG_TARGET_HAS_neg_vec 0 +#define TCG_TARGET_HAS_neg_vec 1 #define TCG_TARGET_HAS_abs_vec 0 #define TCG_TARGET_HAS_andc_vec 1 #define TCG_TARGET_HAS_orc_vec 1 -- 2.42.0
Re: [PATCH v3 16/16] tcg/loongarch64: Implement 128-bit load & store
On 2023/9/3 09:06, Richard Henderson wrote: On 9/1/23 22:02, Jiajie Chen wrote: If LSX is available, use LSX instructions to implement 128-bit load & store. Is this really guaranteed to be an atomic 128-bit operation? Song Gao, please check this. Or, as for many vector processors, is this really two separate 64-bit memory operations under the hood? +static void tcg_out_qemu_ldst_i128(TCGContext *s, TCGReg data_lo, TCGReg data_hi, + TCGReg addr_reg, MemOpIdx oi, bool is_ld) +{ + TCGLabelQemuLdst *ldst; + HostAddress h; + + ldst = prepare_host_addr(s, &h, addr_reg, oi, true); + if (is_ld) { + tcg_out_opc_vldx(s, TCG_VEC_TMP0, h.base, h.index); + tcg_out_opc_vpickve2gr_d(s, data_lo, TCG_VEC_TMP0, 0); + tcg_out_opc_vpickve2gr_d(s, data_hi, TCG_VEC_TMP0, 1); + } else { + tcg_out_opc_vinsgr2vr_d(s, TCG_VEC_TMP0, data_lo, 0); + tcg_out_opc_vinsgr2vr_d(s, TCG_VEC_TMP0, data_hi, 1); + tcg_out_opc_vstx(s, TCG_VEC_TMP0, h.base, h.index); + } You should use h.aa.atom < MO_128 to determine if 128-bit atomicity, and therefore the vector operation, is required. I assume the gr<->vr moves have a cost and two integer operations are preferred when allowable. Compare the other implementations of this function. r~

[PATCH v3 08/16] tcg/loongarch64: Lower mul_vec to vmul
Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- tcg/loongarch64/tcg-target.c.inc | 8 tcg/loongarch64/tcg-target.h | 2 +- 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 1e196bb68f..6905775698 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1665,6 +1665,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, static const LoongArchInsn neg_vec_insn[4] = { OPC_VNEG_B, OPC_VNEG_H, OPC_VNEG_W, OPC_VNEG_D }; +static const LoongArchInsn mul_vec_insn[4] = { +OPC_VMUL_B, OPC_VMUL_H, OPC_VMUL_W, OPC_VMUL_D +}; a0 = args[0]; a1 = args[1]; @@ -1798,6 +1801,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, case INDEX_op_neg_vec: tcg_out32(s, encode_vdvj_insn(neg_vec_insn[vece], a0, a1)); break; +case INDEX_op_mul_vec: +tcg_out32(s, encode_vdvjvk_insn(mul_vec_insn[vece], a0, a1, a2)); +break; case INDEX_op_dupm_vec: tcg_out_dupm_vec(s, type, vece, a0, a1, a2); break; @@ -1824,6 +1830,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_nor_vec: case INDEX_op_not_vec: case INDEX_op_neg_vec: +case INDEX_op_mul_vec: return 1; default: return 0; @@ -1998,6 +2005,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_orc_vec: case INDEX_op_xor_vec: case INDEX_op_nor_vec: +case INDEX_op_mul_vec: return C_O1_I2(w, w, w); case INDEX_op_not_vec: diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index 64c72d0857..2c2266ed31 100644 --- a/tcg/loongarch64/tcg-target.h +++ b/tcg/loongarch64/tcg-target.h @@ -185,7 +185,7 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_nand_vec 0 #define TCG_TARGET_HAS_nor_vec 1 #define TCG_TARGET_HAS_eqv_vec 0 -#define TCG_TARGET_HAS_mul_vec 0 +#define TCG_TARGET_HAS_mul_vec 1 #define TCG_TARGET_HAS_shi_vec 0 #define TCG_TARGET_HAS_shs_vec 0 #define TCG_TARGET_HAS_shv_vec 0 -- 2.42.0
[PATCH v3 10/16] tcg/loongarch64: Lower vector saturated ops
Lower the following ops: - ssadd_vec - usadd_vec - sssub_vec - ussub_vec Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- tcg/loongarch64/tcg-target.c.inc | 32 tcg/loongarch64/tcg-target.h | 2 +- 2 files changed, 33 insertions(+), 1 deletion(-) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 3ffc1691cd..89db41002c 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1680,6 +1680,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, static const LoongArchInsn umax_vec_insn[4] = { OPC_VMAX_BU, OPC_VMAX_HU, OPC_VMAX_WU, OPC_VMAX_DU }; +static const LoongArchInsn ssadd_vec_insn[4] = { +OPC_VSADD_B, OPC_VSADD_H, OPC_VSADD_W, OPC_VSADD_D +}; +static const LoongArchInsn usadd_vec_insn[4] = { +OPC_VSADD_BU, OPC_VSADD_HU, OPC_VSADD_WU, OPC_VSADD_DU +}; +static const LoongArchInsn sssub_vec_insn[4] = { +OPC_VSSUB_B, OPC_VSSUB_H, OPC_VSSUB_W, OPC_VSSUB_D +}; +static const LoongArchInsn ussub_vec_insn[4] = { +OPC_VSSUB_BU, OPC_VSSUB_HU, OPC_VSSUB_WU, OPC_VSSUB_DU +}; a0 = args[0]; a1 = args[1]; @@ -1828,6 +1840,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, case INDEX_op_umax_vec: tcg_out32(s, encode_vdvjvk_insn(umax_vec_insn[vece], a0, a1, a2)); break; +case INDEX_op_ssadd_vec: +tcg_out32(s, encode_vdvjvk_insn(ssadd_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_usadd_vec: +tcg_out32(s, encode_vdvjvk_insn(usadd_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_sssub_vec: +tcg_out32(s, encode_vdvjvk_insn(sssub_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_ussub_vec: +tcg_out32(s, encode_vdvjvk_insn(ussub_vec_insn[vece], a0, a1, a2)); +break; case INDEX_op_dupm_vec: tcg_out_dupm_vec(s, type, vece, a0, a1, a2); break; @@ -1859,6 +1883,10 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_smax_vec: case INDEX_op_umin_vec: case INDEX_op_umax_vec: +case INDEX_op_ssadd_vec: +case INDEX_op_usadd_vec: +case 
INDEX_op_sssub_vec: +case INDEX_op_ussub_vec: return 1; default: return 0; @@ -2038,6 +2066,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_smax_vec: case INDEX_op_umin_vec: case INDEX_op_umax_vec: +case INDEX_op_ssadd_vec: +case INDEX_op_usadd_vec: +case INDEX_op_sssub_vec: +case INDEX_op_ussub_vec: return C_O1_I2(w, w, w); case INDEX_op_not_vec: diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index ec725aaeaa..fa14558275 100644 --- a/tcg/loongarch64/tcg-target.h +++ b/tcg/loongarch64/tcg-target.h @@ -192,7 +192,7 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_roti_vec 0 #define TCG_TARGET_HAS_rots_vec 0 #define TCG_TARGET_HAS_rotv_vec 0 -#define TCG_TARGET_HAS_sat_vec 0 +#define TCG_TARGET_HAS_sat_vec 1 #define TCG_TARGET_HAS_minmax_vec 1 #define TCG_TARGET_HAS_bitsel_vec 0 #define TCG_TARGET_HAS_cmpsel_vec 0 -- 2.42.0
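The saturating ops clamp to the element's representable range instead of wrapping. A scalar model of the signed-add and unsigned-subtract byte cases, as vsadd.b and vssub.bu perform per element (my own helpers, not from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Signed saturating byte add: clamp to [INT8_MIN, INT8_MAX]. */
static int8_t ssadd8(int8_t a, int8_t b)
{
    int sum = (int)a + (int)b;       /* widen to avoid overflow UB */
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}

/* Unsigned saturating byte subtract: clamp at zero. */
static uint8_t ussub8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}
```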
[PATCH v3 05/16] tcg/loongarch64: Lower add/sub_vec to vadd/vsub
Lower the following ops: - add_vec - sub_vec Signed-off-by: Jiajie Chen --- tcg/loongarch64/tcg-target-con-set.h | 1 + tcg/loongarch64/tcg-target-con-str.h | 1 + tcg/loongarch64/tcg-target.c.inc | 60 3 files changed, 62 insertions(+) diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h index 8c8ea5d919..2d5dce75c3 100644 --- a/tcg/loongarch64/tcg-target-con-set.h +++ b/tcg/loongarch64/tcg-target-con-set.h @@ -32,4 +32,5 @@ C_O1_I2(r, rZ, ri) C_O1_I2(r, rZ, rJ) C_O1_I2(r, rZ, rZ) C_O1_I2(w, w, wM) +C_O1_I2(w, w, wA) C_O1_I4(r, rZ, rJ, rZ, rZ) diff --git a/tcg/loongarch64/tcg-target-con-str.h b/tcg/loongarch64/tcg-target-con-str.h index a8a1c44014..2ba9c135ac 100644 --- a/tcg/loongarch64/tcg-target-con-str.h +++ b/tcg/loongarch64/tcg-target-con-str.h @@ -27,3 +27,4 @@ CONST('Z', TCG_CT_CONST_ZERO) CONST('C', TCG_CT_CONST_C12) CONST('W', TCG_CT_CONST_WSZ) CONST('M', TCG_CT_CONST_VCMP) +CONST('A', TCG_CT_CONST_VADD) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 129dd92910..0edcf5be35 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -177,6 +177,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot) #define TCG_CT_CONST_C12 0x1000 #define TCG_CT_CONST_WSZ 0x2000 #define TCG_CT_CONST_VCMP 0x4000 +#define TCG_CT_CONST_VADD 0x8000 #define ALL_GENERAL_REGS MAKE_64BIT_MASK(0, 32) #define ALL_VECTOR_REGSMAKE_64BIT_MASK(32, 32) @@ -214,6 +215,9 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece) if ((ct & TCG_CT_CONST_VCMP) && -0x10 <= vec_val && vec_val <= 0x1f) { return true; } +if ((ct & TCG_CT_CONST_VADD) && -0x1f <= vec_val && vec_val <= 0x1f) { +return true; +} return false; } @@ -1646,6 +1650,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, [TCG_COND_LTU] = {OPC_VSLTI_BU, OPC_VSLTI_HU, OPC_VSLTI_WU, OPC_VSLTI_DU}, }; LoongArchInsn insn; +static const LoongArchInsn add_vec_insn[4] = { 
+OPC_VADD_B, OPC_VADD_H, OPC_VADD_W, OPC_VADD_D +}; +static const LoongArchInsn add_vec_imm_insn[4] = { +OPC_VADDI_BU, OPC_VADDI_HU, OPC_VADDI_WU, OPC_VADDI_DU +}; +static const LoongArchInsn sub_vec_insn[4] = { +OPC_VSUB_B, OPC_VSUB_H, OPC_VSUB_W, OPC_VSUB_D +}; +static const LoongArchInsn sub_vec_imm_insn[4] = { +OPC_VSUBI_BU, OPC_VSUBI_HU, OPC_VSUBI_WU, OPC_VSUBI_DU +}; a0 = args[0]; a1 = args[1]; @@ -1712,6 +1728,44 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, } tcg_out32(s, encode_vdvjvk_insn(insn, a0, a1, a2)); break; +case INDEX_op_add_vec: +if (const_args[2]) { +int64_t value = sextract64(a2, 0, 8 << vece); +/* Try vaddi/vsubi */ +if (0 <= value && value <= 0x1f) { +tcg_out32(s, encode_vdvjuk5_insn(add_vec_imm_insn[vece], a0, \ + a1, value)); +break; +} else if (-0x1f <= value && value < 0) { +tcg_out32(s, encode_vdvjuk5_insn(sub_vec_imm_insn[vece], a0, \ + a1, -value)); +break; +} + +/* constraint TCG_CT_CONST_VADD ensures unreachable */ +g_assert_not_reached(); +} +tcg_out32(s, encode_vdvjvk_insn(add_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_sub_vec: +if (const_args[2]) { +int64_t value = sextract64(a2, 0, 8 << vece); +/* Try vaddi/vsubi */ +if (0 <= value && value <= 0x1f) { +tcg_out32(s, encode_vdvjuk5_insn(sub_vec_imm_insn[vece], a0, \ + a1, value)); +break; +} else if (-0x1f <= value && value < 0) { +tcg_out32(s, encode_vdvjuk5_insn(add_vec_imm_insn[vece], a0, \ + a1, -value)); +break; +} + +/* constraint TCG_CT_CONST_VADD ensures unreachable */ +g_assert_not_reached(); +} +tcg_out32(s, encode_vdvjvk_insn(sub_vec_insn[vece], a0, a1, a2)); +break; case INDEX_op_dupm_vec: tcg_out_dupm_vec(s, type, vece, a0, a1, a2); break; @@ -1728,6 +1782,8 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_dup_vec: case INDEX_op_dupm_vec: case INDEX_op_cmp_vec: +case INDEX_op_add_vec: +case INDEX_op_sub_vec: return 1; default: return 0; @@ -1892,6 +1948,10 @@ static TCGConstraintSetIndex 
tcg_target_op_def(TCGOpcode op)
[PATCH v3 07/16] tcg/loongarch64: Lower neg_vec to vneg
Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- tcg/loongarch64/tcg-target.c.inc | 8 tcg/loongarch64/tcg-target.h | 2 +- 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 133b0f7113..1e196bb68f 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1662,6 +1662,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, static const LoongArchInsn sub_vec_imm_insn[4] = { OPC_VSUBI_BU, OPC_VSUBI_HU, OPC_VSUBI_WU, OPC_VSUBI_DU }; +static const LoongArchInsn neg_vec_insn[4] = { +OPC_VNEG_B, OPC_VNEG_H, OPC_VNEG_W, OPC_VNEG_D +}; a0 = args[0]; a1 = args[1]; @@ -1792,6 +1795,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, } tcg_out32(s, encode_vdvjvk_insn(sub_vec_insn[vece], a0, a1, a2)); break; +case INDEX_op_neg_vec: +tcg_out32(s, encode_vdvj_insn(neg_vec_insn[vece], a0, a1)); +break; case INDEX_op_dupm_vec: tcg_out_dupm_vec(s, type, vece, a0, a1, a2); break; @@ -1817,6 +1823,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_xor_vec: case INDEX_op_nor_vec: case INDEX_op_not_vec: +case INDEX_op_neg_vec: return 1; default: return 0; @@ -1994,6 +2001,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) return C_O1_I2(w, w, w); case INDEX_op_not_vec: +case INDEX_op_neg_vec: return C_O1_I1(w, w); default: diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index f9c5cb12ca..64c72d0857 100644 --- a/tcg/loongarch64/tcg-target.h +++ b/tcg/loongarch64/tcg-target.h @@ -178,7 +178,7 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_v256 0 #define TCG_TARGET_HAS_not_vec 1 -#define TCG_TARGET_HAS_neg_vec 0 +#define TCG_TARGET_HAS_neg_vec 1 #define TCG_TARGET_HAS_abs_vec 0 #define TCG_TARGET_HAS_andc_vec 1 #define TCG_TARGET_HAS_orc_vec 1 -- 2.42.0
[PATCH v3 14/16] tcg/loongarch64: Lower rotv_vec ops to LSX
Lower the following ops: - rotrv_vec - rotlv_vec Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- tcg/loongarch64/tcg-target.c.inc | 14 ++ tcg/loongarch64/tcg-target.h | 2 +- 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 8ac008b907..95359b1757 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1710,6 +1710,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, static const LoongArchInsn sari_vec_insn[4] = { OPC_VSRAI_B, OPC_VSRAI_H, OPC_VSRAI_W, OPC_VSRAI_D }; +static const LoongArchInsn rotrv_vec_insn[4] = { +OPC_VROTR_B, OPC_VROTR_H, OPC_VROTR_W, OPC_VROTR_D +}; a0 = args[0]; a1 = args[1]; @@ -1889,6 +1892,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, case INDEX_op_sari_vec: tcg_out32(s, encode_vdvjuk3_insn(sari_vec_insn[vece], a0, a1, a2)); break; +case INDEX_op_rotrv_vec: +tcg_out32(s, encode_vdvjvk_insn(rotrv_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_rotlv_vec: +/* rotlv_vec a1, a2 = rotrv_vec a1, -a2 */ +tcg_out32(s, encode_vdvj_insn(neg_vec_insn[vece], temp_vec, a2)); +tcg_out32(s, encode_vdvjvk_insn(rotrv_vec_insn[vece], a0, a1, +temp_vec)); +break; case INDEX_op_bitsel_vec: /* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */ tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1); @@ -2118,6 +2130,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_shlv_vec: case INDEX_op_shrv_vec: case INDEX_op_sarv_vec: +case INDEX_op_rotrv_vec: +case INDEX_op_rotlv_vec: return C_O1_I2(w, w, w); case INDEX_op_not_vec: diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index d7b806e252..d5c69bc192 100644 --- a/tcg/loongarch64/tcg-target.h +++ b/tcg/loongarch64/tcg-target.h @@ -191,7 +191,7 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_shv_vec 1 #define TCG_TARGET_HAS_roti_vec 0 #define TCG_TARGET_HAS_rots_vec 0 -#define TCG_TARGET_HAS_rotv_vec 0 
+#define TCG_TARGET_HAS_rotv_vec 1 #define TCG_TARGET_HAS_sat_vec 1 #define TCG_TARGET_HAS_minmax_vec 1 #define TCG_TARGET_HAS_bitsel_vec 1 -- 2.42.0
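The patch synthesizes rotlv_vec from vrotr by negating the shift count, relying on the identity rotl(x, n) == rotr(x, -n mod width). A scalar check of that identity for 32-bit lanes (helper names are mine; I assume vrotr.w uses only the low 5 bits of the count, as rotate-by-register instructions conventionally do):

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right, count taken modulo the 32-bit lane width. */
static uint32_t rotr32(uint32_t x, uint32_t n)
{
    n &= 31;
    return n ? (x >> n) | (x << (32 - n)) : x;
}

/* rotlv_vec a1, a2 = rotrv_vec a1, -a2: rotate left via a negated count. */
static uint32_t rotl32_via_rotr(uint32_t x, uint32_t n)
{
    return rotr32(x, (uint32_t)(-(int32_t)n));
}
```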
[PATCH v3 02/16] tcg/loongarch64: Lower basic tcg vec ops to LSX
LSX support on host cpu is detected via hwcap. Lower the following ops to LSX: - dup_vec - dupi_vec - dupm_vec - ld_vec - st_vec Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- tcg/loongarch64/tcg-target-con-set.h | 2 + tcg/loongarch64/tcg-target-con-str.h | 1 + tcg/loongarch64/tcg-target.c.inc | 219 ++- tcg/loongarch64/tcg-target.h | 38 - tcg/loongarch64/tcg-target.opc.h | 12 ++ 5 files changed, 270 insertions(+), 2 deletions(-) create mode 100644 tcg/loongarch64/tcg-target.opc.h diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h index c2bde44613..37b3f80bf9 100644 --- a/tcg/loongarch64/tcg-target-con-set.h +++ b/tcg/loongarch64/tcg-target-con-set.h @@ -17,7 +17,9 @@ C_O0_I1(r) C_O0_I2(rZ, r) C_O0_I2(rZ, rZ) +C_O0_I2(w, r) C_O1_I1(r, r) +C_O1_I1(w, r) C_O1_I2(r, r, rC) C_O1_I2(r, r, ri) C_O1_I2(r, r, rI) diff --git a/tcg/loongarch64/tcg-target-con-str.h b/tcg/loongarch64/tcg-target-con-str.h index 6e9ccca3ad..81b8d40278 100644 --- a/tcg/loongarch64/tcg-target-con-str.h +++ b/tcg/loongarch64/tcg-target-con-str.h @@ -14,6 +14,7 @@ * REGS(letter, register_mask) */ REGS('r', ALL_GENERAL_REGS) +REGS('w', ALL_VECTOR_REGS) /* * Define constraint letters for constants: diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index baf5fc3819..150278e112 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -32,6 +32,8 @@ #include "../tcg-ldst.c.inc" #include +bool use_lsx_instructions; + #ifdef CONFIG_DEBUG_TCG static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = { "zero", @@ -65,7 +67,39 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = { "s5", "s6", "s7", -"s8" +"s8", +"vr0", +"vr1", +"vr2", +"vr3", +"vr4", +"vr5", +"vr6", +"vr7", +"vr8", +"vr9", +"vr10", +"vr11", +"vr12", +"vr13", +"vr14", +"vr15", +"vr16", +"vr17", +"vr18", +"vr19", +"vr20", +"vr21", +"vr22", +"vr23", +"vr24", +"vr25", +"vr26", +"vr27", +"vr28", +"vr29", 
+"vr30", +"vr31", }; #endif @@ -102,6 +136,15 @@ static const int tcg_target_reg_alloc_order[] = { TCG_REG_A2, TCG_REG_A1, TCG_REG_A0, + +/* Vector registers */ +TCG_REG_V0, TCG_REG_V1, TCG_REG_V2, TCG_REG_V3, +TCG_REG_V4, TCG_REG_V5, TCG_REG_V6, TCG_REG_V7, +TCG_REG_V8, TCG_REG_V9, TCG_REG_V10, TCG_REG_V11, +TCG_REG_V12, TCG_REG_V13, TCG_REG_V14, TCG_REG_V15, +TCG_REG_V16, TCG_REG_V17, TCG_REG_V18, TCG_REG_V19, +TCG_REG_V20, TCG_REG_V21, TCG_REG_V22, TCG_REG_V23, +/* V24 - V31 are caller-saved, and skipped. */ }; static const int tcg_target_call_iarg_regs[] = { @@ -135,6 +178,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot) #define TCG_CT_CONST_WSZ 0x2000 #define ALL_GENERAL_REGS MAKE_64BIT_MASK(0, 32) +#define ALL_VECTOR_REGSMAKE_64BIT_MASK(32, 32) static inline tcg_target_long sextreg(tcg_target_long val, int pos, int len) { @@ -1486,6 +1530,154 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, } } +static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece, +TCGReg rd, TCGReg rs) +{ +switch (vece) { +case MO_8: +tcg_out_opc_vreplgr2vr_b(s, rd, rs); +break; +case MO_16: +tcg_out_opc_vreplgr2vr_h(s, rd, rs); +break; +case MO_32: +tcg_out_opc_vreplgr2vr_w(s, rd, rs); +break; +case MO_64: +tcg_out_opc_vreplgr2vr_d(s, rd, rs); +break; +default: +g_assert_not_reached(); +} +return true; +} + +static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece, + TCGReg r, TCGReg base, intptr_t offset) +{ +/* Handle imm overflow and division (vldrepl.d imm is divided by 8) */ +if (offset < -0x800 || offset > 0x7ff || \ +(offset & ((1 << vece) - 1)) != 0) { +tcg_out_addi(s, TCG_TYPE_I64, TCG_REG_TMP0, base, offset); +base = TCG_REG_TMP0; +offset = 0; +} +offset >>= vece; + +switch (vece) { +case MO_8: +tcg_out_opc_vldrepl_b(s, r, base, offset); +break; +case MO_16: +tcg_out_opc_vldrepl_h(s, r, base, offset); +break; +case MO_32: +tcg_out_opc_vldrepl_w(s, r, base, offset); +break; +case
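The dupm_vec path above only uses the vldrepl immediate form when the byte offset fits the signed 12-bit field and is aligned to the element size; otherwise it materializes the address in TCG_REG_TMP0. The encoded immediate is the byte offset scaled down by the element size. A scalar sketch of that check (function names are mine, mirroring the patch's condition):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Can vldrepl.{b,h,w,d} encode this byte offset directly?
 * vece is log2 of the element size (0=B, 1=H, 2=W, 3=D). */
static bool vldrepl_offset_ok(intptr_t offset, unsigned vece)
{
    if (offset < -0x800 || offset > 0x7ff) {
        return false;                        /* outside si12 range */
    }
    return (offset & ((intptr_t)(1 << vece) - 1)) == 0;  /* element aligned */
}

/* Immediate actually encoded: byte offset scaled by element size
 * ("vldrepl.d imm is divided by 8" in the patch comment). */
static intptr_t vldrepl_encoded_imm(intptr_t offset, unsigned vece)
{
    assert(vldrepl_offset_ok(offset, vece));
    return offset >> vece;
}
```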
[PATCH v3 06/16] tcg/loongarch64: Lower vector bitwise operations
Lower the following ops: - and_vec - andc_vec - or_vec - orc_vec - xor_vec - nor_vec - not_vec Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- tcg/loongarch64/tcg-target-con-set.h | 2 ++ tcg/loongarch64/tcg-target.c.inc | 44 tcg/loongarch64/tcg-target.h | 8 ++--- 3 files changed, 50 insertions(+), 4 deletions(-) diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h index 2d5dce75c3..3f530ad4d8 100644 --- a/tcg/loongarch64/tcg-target-con-set.h +++ b/tcg/loongarch64/tcg-target-con-set.h @@ -20,6 +20,7 @@ C_O0_I2(rZ, rZ) C_O0_I2(w, r) C_O1_I1(r, r) C_O1_I1(w, r) +C_O1_I1(w, w) C_O1_I2(r, r, rC) C_O1_I2(r, r, ri) C_O1_I2(r, r, rI) @@ -31,6 +32,7 @@ C_O1_I2(r, 0, rZ) C_O1_I2(r, rZ, ri) C_O1_I2(r, rZ, rJ) C_O1_I2(r, rZ, rZ) +C_O1_I2(w, w, w) C_O1_I2(w, w, wM) C_O1_I2(w, w, wA) C_O1_I4(r, rZ, rJ, rZ, rZ) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 0edcf5be35..133b0f7113 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1689,6 +1689,32 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, tcg_out_opc_vldx(s, a0, a1, temp); } break; +case INDEX_op_and_vec: +tcg_out_opc_vand_v(s, a0, a1, a2); +break; +case INDEX_op_andc_vec: +/* + * vandn vd, vj, vk: vd = vk & ~vj + * andc_vec vd, vj, vk: vd = vj & ~vk + * vk and vk are swapped + */ +tcg_out_opc_vandn_v(s, a0, a2, a1); +break; +case INDEX_op_or_vec: +tcg_out_opc_vor_v(s, a0, a1, a2); +break; +case INDEX_op_orc_vec: +tcg_out_opc_vorn_v(s, a0, a1, a2); +break; +case INDEX_op_xor_vec: +tcg_out_opc_vxor_v(s, a0, a1, a2); +break; +case INDEX_op_nor_vec: +tcg_out_opc_vnor_v(s, a0, a1, a2); +break; +case INDEX_op_not_vec: +tcg_out_opc_vnor_v(s, a0, a1, a1); +break; case INDEX_op_cmp_vec: TCGCond cond = args[3]; if (const_args[2]) { @@ -1784,6 +1810,13 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_cmp_vec: case INDEX_op_add_vec: case INDEX_op_sub_vec: +case 
INDEX_op_and_vec: +case INDEX_op_andc_vec: +case INDEX_op_or_vec: +case INDEX_op_orc_vec: +case INDEX_op_xor_vec: +case INDEX_op_nor_vec: +case INDEX_op_not_vec: return 1; default: return 0; @@ -1952,6 +1985,17 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_sub_vec: return C_O1_I2(w, w, wA); +case INDEX_op_and_vec: +case INDEX_op_andc_vec: +case INDEX_op_or_vec: +case INDEX_op_orc_vec: +case INDEX_op_xor_vec: +case INDEX_op_nor_vec: +return C_O1_I2(w, w, w); + +case INDEX_op_not_vec: +return C_O1_I1(w, w); + default: g_assert_not_reached(); } diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index daaf38ee31..f9c5cb12ca 100644 --- a/tcg/loongarch64/tcg-target.h +++ b/tcg/loongarch64/tcg-target.h @@ -177,13 +177,13 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_v128 use_lsx_instructions #define TCG_TARGET_HAS_v256 0 -#define TCG_TARGET_HAS_not_vec 0 +#define TCG_TARGET_HAS_not_vec 1 #define TCG_TARGET_HAS_neg_vec 0 #define TCG_TARGET_HAS_abs_vec 0 -#define TCG_TARGET_HAS_andc_vec 0 -#define TCG_TARGET_HAS_orc_vec 0 +#define TCG_TARGET_HAS_andc_vec 1 +#define TCG_TARGET_HAS_orc_vec 1 #define TCG_TARGET_HAS_nand_vec 0 -#define TCG_TARGET_HAS_nor_vec 0 +#define TCG_TARGET_HAS_nor_vec 1 #define TCG_TARGET_HAS_eqv_vec 0 #define TCG_TARGET_HAS_mul_vec 0 #define TCG_TARGET_HAS_shi_vec 0 -- 2.42.0
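The andc_vec case is the one place where operands must be swapped: vandn.v negates its first source, while TCG's andc_vec negates its second. A scalar model of that mapping (helper names are mine):

```c
#include <assert.h>
#include <stdint.h>

/* vandn vd, vj, vk computes vk & ~vj: the complement applies to the
 * FIRST source operand. */
static uint64_t vandn(uint64_t vj, uint64_t vk)
{
    return vk & ~vj;
}

/* TCG andc_vec d, a, b computes a & ~b, so the sources are swapped
 * when lowering to vandn, as the patch does. */
static uint64_t andc_via_vandn(uint64_t a, uint64_t b)
{
    return vandn(b, a);
}
```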
[PATCH v3 13/16] tcg/loongarch64: Lower vector shift integer ops
Lower the following ops: - shli_vec - shrv_vec - sarv_vec Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- tcg/loongarch64/tcg-target.c.inc | 21 + tcg/loongarch64/tcg-target.h | 2 +- 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 2db4369a9e..8ac008b907 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1701,6 +1701,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, static const LoongArchInsn sarv_vec_insn[4] = { OPC_VSRA_B, OPC_VSRA_H, OPC_VSRA_W, OPC_VSRA_D }; +static const LoongArchInsn shli_vec_insn[4] = { +OPC_VSLLI_B, OPC_VSLLI_H, OPC_VSLLI_W, OPC_VSLLI_D +}; +static const LoongArchInsn shri_vec_insn[4] = { +OPC_VSRLI_B, OPC_VSRLI_H, OPC_VSRLI_W, OPC_VSRLI_D +}; +static const LoongArchInsn sari_vec_insn[4] = { +OPC_VSRAI_B, OPC_VSRAI_H, OPC_VSRAI_W, OPC_VSRAI_D +}; a0 = args[0]; a1 = args[1]; @@ -1871,6 +1880,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, case INDEX_op_sarv_vec: tcg_out32(s, encode_vdvjvk_insn(sarv_vec_insn[vece], a0, a1, a2)); break; +case INDEX_op_shli_vec: +tcg_out32(s, encode_vdvjuk3_insn(shli_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_shri_vec: +tcg_out32(s, encode_vdvjuk3_insn(shri_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_sari_vec: +tcg_out32(s, encode_vdvjuk3_insn(sari_vec_insn[vece], a0, a1, a2)); +break; case INDEX_op_bitsel_vec: /* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */ tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1); @@ -2104,6 +2122,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_not_vec: case INDEX_op_neg_vec: +case INDEX_op_shli_vec: +case INDEX_op_shri_vec: +case INDEX_op_sari_vec: return C_O1_I1(w, w); case INDEX_op_bitsel_vec: diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index bc56939a57..d7b806e252 100644 --- a/tcg/loongarch64/tcg-target.h +++ 
b/tcg/loongarch64/tcg-target.h @@ -186,7 +186,7 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_nor_vec 1 #define TCG_TARGET_HAS_eqv_vec 0 #define TCG_TARGET_HAS_mul_vec 1 -#define TCG_TARGET_HAS_shi_vec 0 +#define TCG_TARGET_HAS_shi_vec 1 #define TCG_TARGET_HAS_shs_vec 0 #define TCG_TARGET_HAS_shv_vec 1 #define TCG_TARGET_HAS_roti_vec 0 -- 2.42.0
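The uk3 immediate field limits byte-element shift counts to 0..7, and a per-element shift is not the same as shifting the whole register: bits never cross element boundaries. A scalar model of vslli.b acting on one 64-bit lane illustrates this (my own helper, assuming the usual per-byte semantics):

```c
#include <assert.h>
#include <stdint.h>

/* Model vslli.b on a 64-bit chunk: each byte shifts independently,
 * so bits shifted out of a byte are lost, not carried to the next.
 * imm must fit the uk3 field (0..7). */
static uint64_t vslli_b_lane(uint64_t v, unsigned imm)
{
    assert(imm < 8);
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t byte = (uint8_t)(v >> (8 * i));
        r |= (uint64_t)(uint8_t)(byte << imm) << (8 * i);
    }
    return r;
}
```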
[PATCH v3 16/16] tcg/loongarch64: Implement 128-bit load & store
If LSX is available, use LSX instructions to implement 128-bit load & store. Signed-off-by: Jiajie Chen --- tcg/loongarch64/tcg-target-con-set.h | 2 ++ tcg/loongarch64/tcg-target.c.inc | 42 tcg/loongarch64/tcg-target.h | 2 +- 3 files changed, 45 insertions(+), 1 deletion(-) diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h index 914572d21b..77d62e38e7 100644 --- a/tcg/loongarch64/tcg-target-con-set.h +++ b/tcg/loongarch64/tcg-target-con-set.h @@ -18,6 +18,7 @@ C_O0_I1(r) C_O0_I2(rZ, r) C_O0_I2(rZ, rZ) C_O0_I2(w, r) +C_O0_I3(r, r, r) C_O1_I1(r, r) C_O1_I1(w, r) C_O1_I1(w, w) @@ -37,3 +38,4 @@ C_O1_I2(w, w, wM) C_O1_I2(w, w, wA) C_O1_I3(w, w, w, w) C_O1_I4(r, rZ, rJ, rZ, rZ) +C_O2_I1(r, r, r) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 2b001598e2..9d999ef58c 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1081,6 +1081,31 @@ static void tcg_out_qemu_st(TCGContext *s, TCGReg data_reg, TCGReg addr_reg, } } +static void tcg_out_qemu_ldst_i128(TCGContext *s, TCGReg data_lo, TCGReg data_hi, + TCGReg addr_reg, MemOpIdx oi, bool is_ld) +{ +TCGLabelQemuLdst *ldst; +HostAddress h; + +ldst = prepare_host_addr(s, &h, addr_reg, oi, true); +if (is_ld) { +tcg_out_opc_vldx(s, TCG_VEC_TMP0, h.base, h.index); +tcg_out_opc_vpickve2gr_d(s, data_lo, TCG_VEC_TMP0, 0); +tcg_out_opc_vpickve2gr_d(s, data_hi, TCG_VEC_TMP0, 1); +} else { +tcg_out_opc_vinsgr2vr_d(s, TCG_VEC_TMP0, data_lo, 0); +tcg_out_opc_vinsgr2vr_d(s, TCG_VEC_TMP0, data_hi, 1); +tcg_out_opc_vstx(s, TCG_VEC_TMP0, h.base, h.index); +} + +if (ldst) { +ldst->type = TCG_TYPE_I128; +ldst->datalo_reg = data_lo; +ldst->datahi_reg = data_hi; +ldst->raddr = tcg_splitwx_to_rx(s->code_ptr); +} +} + /* * Entry-points */ @@ -1145,6 +1170,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, TCGArg a0 = args[0]; TCGArg a1 = args[1]; TCGArg a2 = args[2]; +TCGArg a3 = args[3]; int c2 = const_args[2]; switch (opc) { @@
-1507,6 +1533,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, case INDEX_op_qemu_ld_a64_i64: tcg_out_qemu_ld(s, a0, a1, a2, TCG_TYPE_I64); break; +case INDEX_op_qemu_ld_a32_i128: +case INDEX_op_qemu_ld_a64_i128: +tcg_out_qemu_ldst_i128(s, a0, a1, a2, a3, true); +break; case INDEX_op_qemu_st_a32_i32: case INDEX_op_qemu_st_a64_i32: tcg_out_qemu_st(s, a0, a1, a2, TCG_TYPE_I32); @@ -1515,6 +1545,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, case INDEX_op_qemu_st_a64_i64: tcg_out_qemu_st(s, a0, a1, a2, TCG_TYPE_I64); break; +case INDEX_op_qemu_st_a32_i128: +case INDEX_op_qemu_st_a64_i128: +tcg_out_qemu_ldst_i128(s, a0, a1, a2, a3, false); +break; case INDEX_op_mov_i32: /* Always emitted via tcg_out_mov. */ case INDEX_op_mov_i64: @@ -1995,6 +2029,14 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_qemu_st_a64_i64: return C_O0_I2(rZ, r); +case INDEX_op_qemu_ld_a32_i128: +case INDEX_op_qemu_ld_a64_i128: +return C_O2_I1(r, r, r); + +case INDEX_op_qemu_st_a32_i128: +case INDEX_op_qemu_st_a64_i128: +return C_O0_I3(r, r, r); + case INDEX_op_brcond_i32: case INDEX_op_brcond_i64: return C_O0_I2(rZ, rZ); diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index 67b0a95532..03017672f6 100644 --- a/tcg/loongarch64/tcg-target.h +++ b/tcg/loongarch64/tcg-target.h @@ -171,7 +171,7 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_muluh_i641 #define TCG_TARGET_HAS_mulsh_i641 -#define TCG_TARGET_HAS_qemu_ldst_i128 0 +#define TCG_TARGET_HAS_qemu_ldst_i128 use_lsx_instructions #define TCG_TARGET_HAS_v64 0 #define TCG_TARGET_HAS_v128 use_lsx_instructions -- 2.42.0
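The i128 path moves data through a vector temporary: vinsgr2vr.d packs the two 64-bit GPR halves into lanes, and vpickve2gr.d extracts them again after a single 128-bit vldx/vstx. A scalar model of that round trip, assuming lane 0 is the low half (type and helper names are mine, not QEMU's):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint64_t lane[2]; } V128;   /* stand-in for an LSX register */

/* vinsgr2vr.d vd, rj, idx: insert a 64-bit GPR into lane idx. */
static void vinsgr2vr_d(V128 *vd, uint64_t rj, int idx)
{
    vd->lane[idx] = rj;
}

/* vpickve2gr.d rd, vj, idx: extract lane idx into a GPR. */
static uint64_t vpickve2gr_d(const V128 *vj, int idx)
{
    return vj->lane[idx];
}

/* Model of the store+load round trip in tcg_out_qemu_ldst_i128. */
static void roundtrip_i128(uint64_t lo, uint64_t hi,
                           uint64_t *out_lo, uint64_t *out_hi)
{
    V128 v = {{0, 0}}, u;
    uint8_t mem[16];

    vinsgr2vr_d(&v, lo, 0);
    vinsgr2vr_d(&v, hi, 1);
    memcpy(mem, &v, 16);            /* vstx: one 128-bit store */
    memcpy(&u, mem, 16);            /* vldx: one 128-bit load */
    *out_lo = vpickve2gr_d(&u, 0);
    *out_hi = vpickve2gr_d(&u, 1);
}
```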
[PATCH v3 12/16] tcg/loongarch64: Lower bitsel_vec to vbitsel
Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- tcg/loongarch64/tcg-target-con-set.h | 1 + tcg/loongarch64/tcg-target.c.inc | 11 ++- tcg/loongarch64/tcg-target.h | 2 +- 3 files changed, 12 insertions(+), 2 deletions(-) diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h index 3f530ad4d8..914572d21b 100644 --- a/tcg/loongarch64/tcg-target-con-set.h +++ b/tcg/loongarch64/tcg-target-con-set.h @@ -35,4 +35,5 @@ C_O1_I2(r, rZ, rZ) C_O1_I2(w, w, w) C_O1_I2(w, w, wM) C_O1_I2(w, w, wA) +C_O1_I3(w, w, w, w) C_O1_I4(r, rZ, rJ, rZ, rZ) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index ef1cd7c621..2db4369a9e 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1631,7 +1631,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, const int const_args[TCG_MAX_OP_ARGS]) { TCGType type = vecl + TCG_TYPE_V64; -TCGArg a0, a1, a2; +TCGArg a0, a1, a2, a3; TCGReg temp = TCG_REG_TMP0; TCGReg temp_vec = TCG_VEC_TMP0; @@ -1705,6 +1705,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, a0 = args[0]; a1 = args[1]; a2 = args[2]; +a3 = args[3]; /* Currently only supports V128 */ tcg_debug_assert(type == TCG_TYPE_V128); @@ -1870,6 +1871,10 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, case INDEX_op_sarv_vec: tcg_out32(s, encode_vdvjvk_insn(sarv_vec_insn[vece], a0, a1, a2)); break; +case INDEX_op_bitsel_vec: +/* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */ +tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1); +break; case INDEX_op_dupm_vec: tcg_out_dupm_vec(s, type, vece, a0, a1, a2); break; @@ -1908,6 +1913,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_shlv_vec: case INDEX_op_shrv_vec: case INDEX_op_sarv_vec: +case INDEX_op_bitsel_vec: return 1; default: return 0; @@ -2100,6 +2106,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_neg_vec: return C_O1_I1(w, w); +case 
INDEX_op_bitsel_vec: +return C_O1_I3(w, w, w, w); + default: g_assert_not_reached(); } diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index 7e9fb61c47..bc56939a57 100644 --- a/tcg/loongarch64/tcg-target.h +++ b/tcg/loongarch64/tcg-target.h @@ -194,7 +194,7 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_rotv_vec 0 #define TCG_TARGET_HAS_sat_vec 1 #define TCG_TARGET_HAS_minmax_vec 1 -#define TCG_TARGET_HAS_bitsel_vec 0 +#define TCG_TARGET_HAS_bitsel_vec 1 #define TCG_TARGET_HAS_cmpsel_vec 0 #define TCG_TARGET_DEFAULT_MO (0) -- 2.42.0
[PATCH v3 11/16] tcg/loongarch64: Lower vector shift vector ops
Lower the following ops: - shlv_vec - shrv_vec - sarv_vec Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- tcg/loongarch64/tcg-target.c.inc | 24 tcg/loongarch64/tcg-target.h | 2 +- 2 files changed, 25 insertions(+), 1 deletion(-) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 89db41002c..ef1cd7c621 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1692,6 +1692,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, static const LoongArchInsn ussub_vec_insn[4] = { OPC_VSSUB_BU, OPC_VSSUB_HU, OPC_VSSUB_WU, OPC_VSSUB_DU }; +static const LoongArchInsn shlv_vec_insn[4] = { +OPC_VSLL_B, OPC_VSLL_H, OPC_VSLL_W, OPC_VSLL_D +}; +static const LoongArchInsn shrv_vec_insn[4] = { +OPC_VSRL_B, OPC_VSRL_H, OPC_VSRL_W, OPC_VSRL_D +}; +static const LoongArchInsn sarv_vec_insn[4] = { +OPC_VSRA_B, OPC_VSRA_H, OPC_VSRA_W, OPC_VSRA_D +}; a0 = args[0]; a1 = args[1]; @@ -1852,6 +1861,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, case INDEX_op_ussub_vec: tcg_out32(s, encode_vdvjvk_insn(ussub_vec_insn[vece], a0, a1, a2)); break; +case INDEX_op_shlv_vec: +tcg_out32(s, encode_vdvjvk_insn(shlv_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_shrv_vec: +tcg_out32(s, encode_vdvjvk_insn(shrv_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_sarv_vec: +tcg_out32(s, encode_vdvjvk_insn(sarv_vec_insn[vece], a0, a1, a2)); +break; case INDEX_op_dupm_vec: tcg_out_dupm_vec(s, type, vece, a0, a1, a2); break; @@ -1887,6 +1905,9 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_usadd_vec: case INDEX_op_sssub_vec: case INDEX_op_ussub_vec: +case INDEX_op_shlv_vec: +case INDEX_op_shrv_vec: +case INDEX_op_sarv_vec: return 1; default: return 0; @@ -2070,6 +2091,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_usadd_vec: case INDEX_op_sssub_vec: case INDEX_op_ussub_vec: +case INDEX_op_shlv_vec: +case INDEX_op_shrv_vec: +case 
INDEX_op_sarv_vec: return C_O1_I2(w, w, w); case INDEX_op_not_vec: diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index fa14558275..7e9fb61c47 100644 --- a/tcg/loongarch64/tcg-target.h +++ b/tcg/loongarch64/tcg-target.h @@ -188,7 +188,7 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_mul_vec 1 #define TCG_TARGET_HAS_shi_vec 0 #define TCG_TARGET_HAS_shs_vec 0 -#define TCG_TARGET_HAS_shv_vec 0 +#define TCG_TARGET_HAS_shv_vec 1 #define TCG_TARGET_HAS_roti_vec 0 #define TCG_TARGET_HAS_rots_vec 0 #define TCG_TARGET_HAS_rotv_vec 0 -- 2.42.0
[PATCH v3 04/16] tcg/loongarch64: Lower cmp_vec to vseq/vsle/vslt
Signed-off-by: Jiajie Chen --- tcg/loongarch64/tcg-target-con-set.h | 1 + tcg/loongarch64/tcg-target-con-str.h | 1 + tcg/loongarch64/tcg-target.c.inc | 65 3 files changed, 67 insertions(+) diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h index 37b3f80bf9..8c8ea5d919 100644 --- a/tcg/loongarch64/tcg-target-con-set.h +++ b/tcg/loongarch64/tcg-target-con-set.h @@ -31,4 +31,5 @@ C_O1_I2(r, 0, rZ) C_O1_I2(r, rZ, ri) C_O1_I2(r, rZ, rJ) C_O1_I2(r, rZ, rZ) +C_O1_I2(w, w, wM) C_O1_I4(r, rZ, rJ, rZ, rZ) diff --git a/tcg/loongarch64/tcg-target-con-str.h b/tcg/loongarch64/tcg-target-con-str.h index 81b8d40278..a8a1c44014 100644 --- a/tcg/loongarch64/tcg-target-con-str.h +++ b/tcg/loongarch64/tcg-target-con-str.h @@ -26,3 +26,4 @@ CONST('U', TCG_CT_CONST_U12) CONST('Z', TCG_CT_CONST_ZERO) CONST('C', TCG_CT_CONST_C12) CONST('W', TCG_CT_CONST_WSZ) +CONST('M', TCG_CT_CONST_VCMP) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 07a0326e5d..129dd92910 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -176,6 +176,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot) #define TCG_CT_CONST_U12 0x800 #define TCG_CT_CONST_C12 0x1000 #define TCG_CT_CONST_WSZ 0x2000 +#define TCG_CT_CONST_VCMP 0x4000 #define ALL_GENERAL_REGS MAKE_64BIT_MASK(0, 32) #define ALL_VECTOR_REGSMAKE_64BIT_MASK(32, 32) @@ -209,6 +210,10 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece) if ((ct & TCG_CT_CONST_WSZ) && val == (type == TCG_TYPE_I32 ? 
32 : 64)) { return true; } +int64_t vec_val = sextract64(val, 0, 8 << vece); +if ((ct & TCG_CT_CONST_VCMP) && -0x10 <= vec_val && vec_val <= 0x1f) { +return true; +} return false; } @@ -1624,6 +1629,23 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, TCGType type = vecl + TCG_TYPE_V64; TCGArg a0, a1, a2; TCGReg temp = TCG_REG_TMP0; +TCGReg temp_vec = TCG_VEC_TMP0; + +static const LoongArchInsn cmp_vec_insn[16][4] = { +[TCG_COND_EQ] = {OPC_VSEQ_B, OPC_VSEQ_H, OPC_VSEQ_W, OPC_VSEQ_D}, +[TCG_COND_LE] = {OPC_VSLE_B, OPC_VSLE_H, OPC_VSLE_W, OPC_VSLE_D}, +[TCG_COND_LEU] = {OPC_VSLE_BU, OPC_VSLE_HU, OPC_VSLE_WU, OPC_VSLE_DU}, +[TCG_COND_LT] = {OPC_VSLT_B, OPC_VSLT_H, OPC_VSLT_W, OPC_VSLT_D}, +[TCG_COND_LTU] = {OPC_VSLT_BU, OPC_VSLT_HU, OPC_VSLT_WU, OPC_VSLT_DU}, +}; +static const LoongArchInsn cmp_vec_imm_insn[16][4] = { +[TCG_COND_EQ] = {OPC_VSEQI_B, OPC_VSEQI_H, OPC_VSEQI_W, OPC_VSEQI_D}, +[TCG_COND_LE] = {OPC_VSLEI_B, OPC_VSLEI_H, OPC_VSLEI_W, OPC_VSLEI_D}, +[TCG_COND_LEU] = {OPC_VSLEI_BU, OPC_VSLEI_HU, OPC_VSLEI_WU, OPC_VSLEI_DU}, +[TCG_COND_LT] = {OPC_VSLTI_B, OPC_VSLTI_H, OPC_VSLTI_W, OPC_VSLTI_D}, +[TCG_COND_LTU] = {OPC_VSLTI_BU, OPC_VSLTI_HU, OPC_VSLTI_WU, OPC_VSLTI_DU}, +}; +LoongArchInsn insn; a0 = args[0]; a1 = args[1]; @@ -1651,6 +1673,45 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, tcg_out_opc_vldx(s, a0, a1, temp); } break; +case INDEX_op_cmp_vec: +TCGCond cond = args[3]; +if (const_args[2]) { +/* + * cmp_vec dest, src, value + * Try vseqi/vslei/vslti + */ +int64_t value = sextract64(a2, 0, 8 << vece); +if ((cond == TCG_COND_EQ || cond == TCG_COND_LE || \ + cond == TCG_COND_LT) && (-0x10 <= value && value <= 0x0f)) { +tcg_out32(s, encode_vdvjsk5_insn(cmp_vec_imm_insn[cond][vece], \ + a0, a1, value)); +break; +} else if ((cond == TCG_COND_LEU || cond == TCG_COND_LTU) && +(0x00 <= value && value <= 0x1f)) { +tcg_out32(s, encode_vdvjuk5_insn(cmp_vec_imm_insn[cond][vece], \ + a0, a1, value)); +break; +} + +/* + * Fallback to: + * 
dupi_vec temp, a2 + * cmp_vec a0, a1, temp, cond + */ +tcg_out_dupi_vec(s, type, vece, temp_vec, a2); +a2 = temp_vec; +} + +insn = cmp_vec_insn[cond][vece]; +if (insn == 0) { +TCGArg t; +t = a1, a1 = a2, a2 = t; +cond = tcg_swap_cond(cond); +insn = cmp_vec_insn[cond][vece]; +tcg_debug_assert(insn != 0); +} +tcg_out32(s, encode_vdvjvk_insn(insn, a0, a1, a2)); +break; case INDEX_op_dupm_vec: tcg_out_dupm_vec(s, type, vece, a0, a1, a2); break; @@ -1666,6 +1727,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc,
[PATCH v3 09/16] tcg/loongarch64: Lower vector min max ops
Lower the following ops: - smin_vec - smax_vec - umin_vec - umax_vec Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- tcg/loongarch64/tcg-target.c.inc | 32 tcg/loongarch64/tcg-target.h | 2 +- 2 files changed, 33 insertions(+), 1 deletion(-) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 6905775698..3ffc1691cd 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1668,6 +1668,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, static const LoongArchInsn mul_vec_insn[4] = { OPC_VMUL_B, OPC_VMUL_H, OPC_VMUL_W, OPC_VMUL_D }; +static const LoongArchInsn smin_vec_insn[4] = { +OPC_VMIN_B, OPC_VMIN_H, OPC_VMIN_W, OPC_VMIN_D +}; +static const LoongArchInsn umin_vec_insn[4] = { +OPC_VMIN_BU, OPC_VMIN_HU, OPC_VMIN_WU, OPC_VMIN_DU +}; +static const LoongArchInsn smax_vec_insn[4] = { +OPC_VMAX_B, OPC_VMAX_H, OPC_VMAX_W, OPC_VMAX_D +}; +static const LoongArchInsn umax_vec_insn[4] = { +OPC_VMAX_BU, OPC_VMAX_HU, OPC_VMAX_WU, OPC_VMAX_DU +}; a0 = args[0]; a1 = args[1]; @@ -1804,6 +1816,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, case INDEX_op_mul_vec: tcg_out32(s, encode_vdvjvk_insn(mul_vec_insn[vece], a0, a1, a2)); break; +case INDEX_op_smin_vec: +tcg_out32(s, encode_vdvjvk_insn(smin_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_smax_vec: +tcg_out32(s, encode_vdvjvk_insn(smax_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_umin_vec: +tcg_out32(s, encode_vdvjvk_insn(umin_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_umax_vec: +tcg_out32(s, encode_vdvjvk_insn(umax_vec_insn[vece], a0, a1, a2)); +break; case INDEX_op_dupm_vec: tcg_out_dupm_vec(s, type, vece, a0, a1, a2); break; @@ -1831,6 +1855,10 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_not_vec: case INDEX_op_neg_vec: case INDEX_op_mul_vec: +case INDEX_op_smin_vec: +case INDEX_op_smax_vec: +case INDEX_op_umin_vec: +case INDEX_op_umax_vec: return 1; 
default: return 0; @@ -2006,6 +2034,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_xor_vec: case INDEX_op_nor_vec: case INDEX_op_mul_vec: +case INDEX_op_smin_vec: +case INDEX_op_smax_vec: +case INDEX_op_umin_vec: +case INDEX_op_umax_vec: return C_O1_I2(w, w, w); case INDEX_op_not_vec: diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index 2c2266ed31..ec725aaeaa 100644 --- a/tcg/loongarch64/tcg-target.h +++ b/tcg/loongarch64/tcg-target.h @@ -193,7 +193,7 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_rots_vec 0 #define TCG_TARGET_HAS_rotv_vec 0 #define TCG_TARGET_HAS_sat_vec 0 -#define TCG_TARGET_HAS_minmax_vec 0 +#define TCG_TARGET_HAS_minmax_vec 1 #define TCG_TARGET_HAS_bitsel_vec 0 #define TCG_TARGET_HAS_cmpsel_vec 0 -- 2.42.0
[PATCH v3 15/16] tcg/loongarch64: Lower rotli_vec to vrotri
Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- tcg/loongarch64/tcg-target.c.inc | 21 + tcg/loongarch64/tcg-target.h | 2 +- 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 95359b1757..2b001598e2 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1901,6 +1901,26 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, tcg_out32(s, encode_vdvjvk_insn(rotrv_vec_insn[vece], a0, a1, temp_vec)); break; +case INDEX_op_rotli_vec: +/* rotli_vec a1, a2 = rotri_vec a1, -a2 */ +a2 = extract32(-a2, 0, 3 + vece); +switch (vece) { +case MO_8: +tcg_out_opc_vrotri_b(s, a0, a1, a2); +break; +case MO_16: +tcg_out_opc_vrotri_h(s, a0, a1, a2); +break; +case MO_32: +tcg_out_opc_vrotri_w(s, a0, a1, a2); +break; +case MO_64: +tcg_out_opc_vrotri_d(s, a0, a1, a2); +break; +default: +g_assert_not_reached(); +} +break; case INDEX_op_bitsel_vec: /* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */ tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1); @@ -2139,6 +2159,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_shli_vec: case INDEX_op_shri_vec: case INDEX_op_sari_vec: +case INDEX_op_rotli_vec: return C_O1_I1(w, w); case INDEX_op_bitsel_vec: diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index d5c69bc192..67b0a95532 100644 --- a/tcg/loongarch64/tcg-target.h +++ b/tcg/loongarch64/tcg-target.h @@ -189,7 +189,7 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_shi_vec 1 #define TCG_TARGET_HAS_shs_vec 0 #define TCG_TARGET_HAS_shv_vec 1 -#define TCG_TARGET_HAS_roti_vec 0 +#define TCG_TARGET_HAS_roti_vec 1 #define TCG_TARGET_HAS_rots_vec 0 #define TCG_TARGET_HAS_rotv_vec 1 #define TCG_TARGET_HAS_sat_vec 1 -- 2.42.0
[PATCH v3 00/16] Lower TCG vector ops to LSX
This patch series allows qemu to utilize LSX instructions on LoongArch machines to execute TCG vector ops.

Passed tcg tests with x86_64 and aarch64 cross compilers.

Changes since v2:

- Add vece argument to tcg_target_const_match() for const args of vector ops
- Use custom constraint for cmp_vec/add_vec/sub_vec for better const arg handling
- Implement 128-bit load & store using vldx/vstx

Changes since v1:

- Optimize dupi_vec/st_vec/ld_vec/cmp_vec/add_vec/sub_vec generation
- Lower not_vec/shi_vec/roti_vec/rotv_vec

Jiajie Chen (16):
  tcg/loongarch64: Import LSX instructions
  tcg/loongarch64: Lower basic tcg vec ops to LSX
  tcg: pass vece to tcg_target_const_match()
  tcg/loongarch64: Lower cmp_vec to vseq/vsle/vslt
  tcg/loongarch64: Lower add/sub_vec to vadd/vsub
  tcg/loongarch64: Lower vector bitwise operations
  tcg/loongarch64: Lower neg_vec to vneg
  tcg/loongarch64: Lower mul_vec to vmul
  tcg/loongarch64: Lower vector min max ops
  tcg/loongarch64: Lower vector saturated ops
  tcg/loongarch64: Lower vector shift vector ops
  tcg/loongarch64: Lower bitsel_vec to vbitsel
  tcg/loongarch64: Lower vector shift integer ops
  tcg/loongarch64: Lower rotv_vec ops to LSX
  tcg/loongarch64: Lower rotli_vec to vrotri
  tcg/loongarch64: Implement 128-bit load & store

 tcg/aarch64/tcg-target.c.inc         |    2 +-
 tcg/arm/tcg-target.c.inc             |    2 +-
 tcg/i386/tcg-target.c.inc            |    2 +-
 tcg/loongarch64/tcg-insn-defs.c.inc  | 6251 +-
 tcg/loongarch64/tcg-target-con-set.h |    9 +
 tcg/loongarch64/tcg-target-con-str.h |    3 +
 tcg/loongarch64/tcg-target.c.inc     |  601 ++-
 tcg/loongarch64/tcg-target.h         |   40 +-
 tcg/loongarch64/tcg-target.opc.h     |   12 +
 tcg/mips/tcg-target.c.inc            |    2 +-
 tcg/ppc/tcg-target.c.inc             |    2 +-
 tcg/riscv/tcg-target.c.inc           |    2 +-
 tcg/s390x/tcg-target.c.inc           |    2 +-
 tcg/sparc64/tcg-target.c.inc         |    2 +-
 tcg/tcg.c                            |    4 +-
 tcg/tci/tcg-target.c.inc             |    2 +-
 16 files changed, 6806 insertions(+), 132 deletions(-)
 create mode 100644 tcg/loongarch64/tcg-target.opc.h
--
2.42.0
[PATCH v3 03/16] tcg: pass vece to tcg_target_const_match()
Pass vece to tcg_target_const_match() to allow correct interpretation of const args of vector ops. Signed-off-by: Jiajie Chen --- tcg/aarch64/tcg-target.c.inc | 2 +- tcg/arm/tcg-target.c.inc | 2 +- tcg/i386/tcg-target.c.inc| 2 +- tcg/loongarch64/tcg-target.c.inc | 2 +- tcg/mips/tcg-target.c.inc| 2 +- tcg/ppc/tcg-target.c.inc | 2 +- tcg/riscv/tcg-target.c.inc | 2 +- tcg/s390x/tcg-target.c.inc | 2 +- tcg/sparc64/tcg-target.c.inc | 2 +- tcg/tcg.c| 4 ++-- tcg/tci/tcg-target.c.inc | 2 +- 11 files changed, 12 insertions(+), 12 deletions(-) diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc index 0931a69448..a1e2b6be16 100644 --- a/tcg/aarch64/tcg-target.c.inc +++ b/tcg/aarch64/tcg-target.c.inc @@ -272,7 +272,7 @@ static bool is_shimm1632(uint32_t v32, int *cmode, int *imm8) } } -static bool tcg_target_const_match(int64_t val, TCGType type, int ct) +static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece) { if (ct & TCG_CT_CONST) { return 1; diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc index acb5f23b54..76f1345002 100644 --- a/tcg/arm/tcg-target.c.inc +++ b/tcg/arm/tcg-target.c.inc @@ -509,7 +509,7 @@ static bool is_shimm1632(uint32_t v32, int *cmode, int *imm8) * mov operand2: values represented with x << (2 * y), x < 0x100 * add, sub, eor...: ditto */ -static bool tcg_target_const_match(int64_t val, TCGType type, int ct) +static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece) { if (ct & TCG_CT_CONST) { return 1; diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc index 0c3d1e4cef..aed91e515e 100644 --- a/tcg/i386/tcg-target.c.inc +++ b/tcg/i386/tcg-target.c.inc @@ -198,7 +198,7 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type, } /* test if a constant matches the constraint */ -static bool tcg_target_const_match(int64_t val, TCGType type, int ct) +static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece) { if (ct & TCG_CT_CONST) { 
return 1; diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 150278e112..07a0326e5d 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -186,7 +186,7 @@ static inline tcg_target_long sextreg(tcg_target_long val, int pos, int len) } /* test if a constant matches the constraint */ -static bool tcg_target_const_match(int64_t val, TCGType type, int ct) +static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece) { if (ct & TCG_CT_CONST) { return true; diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc index 9faa8bdf0b..c6662889f0 100644 --- a/tcg/mips/tcg-target.c.inc +++ b/tcg/mips/tcg-target.c.inc @@ -190,7 +190,7 @@ static bool is_p2m1(tcg_target_long val) } /* test if a constant matches the constraint */ -static bool tcg_target_const_match(int64_t val, TCGType type, int ct) +static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece) { if (ct & TCG_CT_CONST) { return 1; diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc index 090f11e71c..ccf245191d 100644 --- a/tcg/ppc/tcg-target.c.inc +++ b/tcg/ppc/tcg-target.c.inc @@ -261,7 +261,7 @@ static bool reloc_pc14(tcg_insn_unit *src_rw, const tcg_insn_unit *target) } /* test if a constant matches the constraint */ -static bool tcg_target_const_match(int64_t val, TCGType type, int ct) +static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece) { if (ct & TCG_CT_CONST) { return 1; diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc index 9be81c1b7b..3bd7959e7e 100644 --- a/tcg/riscv/tcg-target.c.inc +++ b/tcg/riscv/tcg-target.c.inc @@ -145,7 +145,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot) #define sextreg sextract64 /* test if a constant matches the constraint */ -static bool tcg_target_const_match(int64_t val, TCGType type, int ct) +static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece) 
{ if (ct & TCG_CT_CONST) { return 1; diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc index ecd8aaf2a1..f4d3abcb71 100644 --- a/tcg/s390x/tcg-target.c.inc +++ b/tcg/s390x/tcg-target.c.inc @@ -540,7 +540,7 @@ static bool risbg_mask(uint64_t c) } /* Test if a constant matches the constraint. */ -static bool tcg_target_const_match(int64_t val, TCGType type, int ct) +static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece) { if (ct & TCG_CT_CONST) { return 1; diff --git a/tcg/sparc64/tcg-target.c.inc b/tcg/sparc64/tcg-target.c.inc index 81a08bb6c5..6b9be4c520 100644 --- a/tcg/sparc64/tcg-target.c.inc +++ b/tcg/sparc64/tcg-target.c.inc @@
Re: [PATCH v2 03/14] tcg/loongarch64: Lower cmp_vec to vseq/vsle/vslt
On 2023/9/2 01:48, Richard Henderson wrote: On 9/1/23 10:28, Jiajie Chen wrote: On 2023/9/2 01:24, Richard Henderson wrote: On 9/1/23 02:30, Jiajie Chen wrote: Signed-off-by: Jiajie Chen --- tcg/loongarch64/tcg-target-con-set.h | 1 + tcg/loongarch64/tcg-target.c.inc | 60 2 files changed, 61 insertions(+) Reviewed-by: Richard Henderson diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h index 37b3f80bf9..d04916db25 100644 --- a/tcg/loongarch64/tcg-target-con-set.h +++ b/tcg/loongarch64/tcg-target-con-set.h @@ -31,4 +31,5 @@ C_O1_I2(r, 0, rZ) C_O1_I2(r, rZ, ri) C_O1_I2(r, rZ, rJ) C_O1_I2(r, rZ, rZ) +C_O1_I2(w, w, wJ) Notes for improvement: 'J' is a signed 32-bit immediate. I was wondering about the behavior of 'J' on i128 types: in tcg_target_const_match(), the argument type is int, so will the higher bits be truncated? The argument is int64_t val. The only constants that we allow for vectors are dupi, so all higher parts are the same as the lower part. Consider the following scenario: cmp_vec v128,e32,tmp4,tmp3,v128$0x cmp_vec v128,e32,tmp4,tmp3,v128$0xfffefffe cmp_vec v128,e8,tmp4,tmp3,v128$0xfefefefefefefefe When matching constant constraint, the vector element width is unknown, so it cannot decide whether 0xfefefefefefefefe means e8 0xfe or e16 0xfefe. Besides, tcg_target_const_match() does not know the vector element width. No, it hadn't been required so far -- there are very few vector instructions that allow immediates. r~
Re: [PATCH v2 03/14] tcg/loongarch64: Lower cmp_vec to vseq/vsle/vslt
On 2023/9/2 01:24, Richard Henderson wrote: On 9/1/23 02:30, Jiajie Chen wrote: Signed-off-by: Jiajie Chen --- tcg/loongarch64/tcg-target-con-set.h | 1 + tcg/loongarch64/tcg-target.c.inc | 60 2 files changed, 61 insertions(+) Reviewed-by: Richard Henderson diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h index 37b3f80bf9..d04916db25 100644 --- a/tcg/loongarch64/tcg-target-con-set.h +++ b/tcg/loongarch64/tcg-target-con-set.h @@ -31,4 +31,5 @@ C_O1_I2(r, 0, rZ) C_O1_I2(r, rZ, ri) C_O1_I2(r, rZ, rJ) C_O1_I2(r, rZ, rZ) +C_O1_I2(w, w, wJ) Notes for improvement: 'J' is a signed 32-bit immediate. I was wondering about the behavior of 'J' on i128 types: in tcg_target_const_match(), the argument type is int, so will the higher bits be truncated? Besides, tcg_target_const_match() does not know the vector element width. + if (const_args[2]) { + /* + * cmp_vec dest, src, value + * Try vseqi/vslei/vslti + */ + int64_t value = sextract64(a2, 0, 8 << vece); + if ((cond == TCG_COND_EQ || cond == TCG_COND_LE || \ + cond == TCG_COND_LT) && (-0x10 <= value && value <= 0x0f)) { + tcg_out32(s, encode_vdvjsk5_insn(cmp_vec_imm_insn[cond][vece], \ + a0, a1, value)); + break; + } else if ((cond == TCG_COND_LEU || cond == TCG_COND_LTU) && + (0x00 <= value && value <= 0x1f)) { + tcg_out32(s, encode_vdvjuk5_insn(cmp_vec_imm_insn[cond][vece], \ + a0, a1, value)); Better would be a new constraint that only matches -0x10 <= x <= 0x1f If the sign is wrong for the comparison, it can *always* be loaded with just vldi. Whereas at present, using J, + tcg_out_dupi_vec(s, type, vece, temp_vec, a2); + a2 = temp_vec; this may require 3 instructions (lu12i.w + ori + vreplgr2vr). By constraining the constants allowed, you allow the register allocator to see that a register is required, which may be reused for another instruction. r~
[PATCH v2 05/14] tcg/loongarch64: Lower vector bitwise operations
Lower the following ops:
- and_vec
- andc_vec
- or_vec
- orc_vec
- xor_vec
- nor_vec
- not_vec

Signed-off-by: Jiajie Chen
---
 tcg/loongarch64/tcg-target-con-set.h |  2 ++
 tcg/loongarch64/tcg-target.c.inc     | 44
 tcg/loongarch64/tcg-target.h         |  8 ++---
 3 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index eaa015e813..13a7f3b5e2 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -20,6 +20,7 @@ C_O0_I2(rZ, rZ)
 C_O0_I2(w, r)
 C_O1_I1(r, r)
 C_O1_I1(w, r)
+C_O1_I1(w, w)
 C_O1_I2(r, r, rC)
 C_O1_I2(r, r, ri)
 C_O1_I2(r, r, rI)
@@ -31,6 +32,7 @@ C_O1_I2(r, 0, rZ)
 C_O1_I2(r, rZ, ri)
 C_O1_I2(r, rZ, rJ)
 C_O1_I2(r, rZ, rZ)
+C_O1_I2(w, w, w)
 C_O1_I2(w, w, wi)
 C_O1_I2(w, w, wJ)
 C_O1_I4(r, rZ, rJ, rZ, rZ)
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 555080f2b0..20e25dc490 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1680,6 +1680,32 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
             tcg_out_opc_vldx(s, a0, a1, temp);
         }
         break;
+    case INDEX_op_and_vec:
+        tcg_out_opc_vand_v(s, a0, a1, a2);
+        break;
+    case INDEX_op_andc_vec:
+        /*
+         * vandn vd, vj, vk: vd = vk & ~vj
+         * andc_vec vd, vj, vk: vd = vj & ~vk
+         * vj and vk are swapped
+         */
+        tcg_out_opc_vandn_v(s, a0, a2, a1);
+        break;
+    case INDEX_op_or_vec:
+        tcg_out_opc_vor_v(s, a0, a1, a2);
+        break;
+    case INDEX_op_orc_vec:
+        tcg_out_opc_vorn_v(s, a0, a1, a2);
+        break;
+    case INDEX_op_xor_vec:
+        tcg_out_opc_vxor_v(s, a0, a1, a2);
+        break;
+    case INDEX_op_nor_vec:
+        tcg_out_opc_vnor_v(s, a0, a1, a2);
+        break;
+    case INDEX_op_not_vec:
+        tcg_out_opc_vnor_v(s, a0, a1, a1);
+        break;
     case INDEX_op_cmp_vec:
         TCGCond cond = args[3];
         if (const_args[2]) {
@@ -1777,6 +1803,13 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_cmp_vec:
     case INDEX_op_add_vec:
     case INDEX_op_sub_vec:
+    case INDEX_op_and_vec:
+    case INDEX_op_andc_vec:
+    case INDEX_op_or_vec:
+    case INDEX_op_orc_vec:
+    case INDEX_op_xor_vec:
+    case INDEX_op_nor_vec:
+    case INDEX_op_not_vec:
         return 1;
     default:
         return 0;
@@ -1945,6 +1978,17 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_sub_vec:
         return C_O1_I2(w, w, wi);

+    case INDEX_op_and_vec:
+    case INDEX_op_andc_vec:
+    case INDEX_op_or_vec:
+    case INDEX_op_orc_vec:
+    case INDEX_op_xor_vec:
+    case INDEX_op_nor_vec:
+        return C_O1_I2(w, w, w);
+
+    case INDEX_op_not_vec:
+        return C_O1_I1(w, w);
+
     default:
         g_assert_not_reached();
     }
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 2f27d05e0c..bf72b26ca2 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -175,13 +175,13 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_v128             use_lsx_instructions
 #define TCG_TARGET_HAS_v256             0

-#define TCG_TARGET_HAS_not_vec          0
+#define TCG_TARGET_HAS_not_vec          1
 #define TCG_TARGET_HAS_neg_vec          0
 #define TCG_TARGET_HAS_abs_vec          0
-#define TCG_TARGET_HAS_andc_vec         0
-#define TCG_TARGET_HAS_orc_vec          0
+#define TCG_TARGET_HAS_andc_vec         1
+#define TCG_TARGET_HAS_orc_vec          1
 #define TCG_TARGET_HAS_nand_vec         0
-#define TCG_TARGET_HAS_nor_vec          0
+#define TCG_TARGET_HAS_nor_vec          1
 #define TCG_TARGET_HAS_eqv_vec          0
 #define TCG_TARGET_HAS_mul_vec          0
 #define TCG_TARGET_HAS_shi_vec          0
--
2.42.0
[PATCH v2 02/14] tcg/loongarch64: Lower basic tcg vec ops to LSX
LSX support on host cpu is detected via hwcap. Lower the following ops to LSX: - dup_vec - dupi_vec - dupm_vec - ld_vec - st_vec Signed-off-by: Jiajie Chen --- tcg/loongarch64/tcg-target-con-set.h | 2 + tcg/loongarch64/tcg-target-con-str.h | 1 + tcg/loongarch64/tcg-target.c.inc | 219 ++- tcg/loongarch64/tcg-target.h | 38 - tcg/loongarch64/tcg-target.opc.h | 12 ++ 5 files changed, 270 insertions(+), 2 deletions(-) create mode 100644 tcg/loongarch64/tcg-target.opc.h diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h index c2bde44613..37b3f80bf9 100644 --- a/tcg/loongarch64/tcg-target-con-set.h +++ b/tcg/loongarch64/tcg-target-con-set.h @@ -17,7 +17,9 @@ C_O0_I1(r) C_O0_I2(rZ, r) C_O0_I2(rZ, rZ) +C_O0_I2(w, r) C_O1_I1(r, r) +C_O1_I1(w, r) C_O1_I2(r, r, rC) C_O1_I2(r, r, ri) C_O1_I2(r, r, rI) diff --git a/tcg/loongarch64/tcg-target-con-str.h b/tcg/loongarch64/tcg-target-con-str.h index 6e9ccca3ad..81b8d40278 100644 --- a/tcg/loongarch64/tcg-target-con-str.h +++ b/tcg/loongarch64/tcg-target-con-str.h @@ -14,6 +14,7 @@ * REGS(letter, register_mask) */ REGS('r', ALL_GENERAL_REGS) +REGS('w', ALL_VECTOR_REGS) /* * Define constraint letters for constants: diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index baf5fc3819..150278e112 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -32,6 +32,8 @@ #include "../tcg-ldst.c.inc" #include +bool use_lsx_instructions; + #ifdef CONFIG_DEBUG_TCG static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = { "zero", @@ -65,7 +67,39 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = { "s5", "s6", "s7", -"s8" +"s8", +"vr0", +"vr1", +"vr2", +"vr3", +"vr4", +"vr5", +"vr6", +"vr7", +"vr8", +"vr9", +"vr10", +"vr11", +"vr12", +"vr13", +"vr14", +"vr15", +"vr16", +"vr17", +"vr18", +"vr19", +"vr20", +"vr21", +"vr22", +"vr23", +"vr24", +"vr25", +"vr26", +"vr27", +"vr28", +"vr29", +"vr30", +"vr31", }; #endif @@ 
-102,6 +136,15 @@ static const int tcg_target_reg_alloc_order[] = { TCG_REG_A2, TCG_REG_A1, TCG_REG_A0, + +/* Vector registers */ +TCG_REG_V0, TCG_REG_V1, TCG_REG_V2, TCG_REG_V3, +TCG_REG_V4, TCG_REG_V5, TCG_REG_V6, TCG_REG_V7, +TCG_REG_V8, TCG_REG_V9, TCG_REG_V10, TCG_REG_V11, +TCG_REG_V12, TCG_REG_V13, TCG_REG_V14, TCG_REG_V15, +TCG_REG_V16, TCG_REG_V17, TCG_REG_V18, TCG_REG_V19, +TCG_REG_V20, TCG_REG_V21, TCG_REG_V22, TCG_REG_V23, +/* V24 - V31 are caller-saved, and skipped. */ }; static const int tcg_target_call_iarg_regs[] = { @@ -135,6 +178,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot) #define TCG_CT_CONST_WSZ 0x2000 #define ALL_GENERAL_REGS MAKE_64BIT_MASK(0, 32) +#define ALL_VECTOR_REGSMAKE_64BIT_MASK(32, 32) static inline tcg_target_long sextreg(tcg_target_long val, int pos, int len) { @@ -1486,6 +1530,154 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, } } +static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece, +TCGReg rd, TCGReg rs) +{ +switch (vece) { +case MO_8: +tcg_out_opc_vreplgr2vr_b(s, rd, rs); +break; +case MO_16: +tcg_out_opc_vreplgr2vr_h(s, rd, rs); +break; +case MO_32: +tcg_out_opc_vreplgr2vr_w(s, rd, rs); +break; +case MO_64: +tcg_out_opc_vreplgr2vr_d(s, rd, rs); +break; +default: +g_assert_not_reached(); +} +return true; +} + +static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece, + TCGReg r, TCGReg base, intptr_t offset) +{ +/* Handle imm overflow and division (vldrepl.d imm is divided by 8) */ +if (offset < -0x800 || offset > 0x7ff || \ +(offset & ((1 << vece) - 1)) != 0) { +tcg_out_addi(s, TCG_TYPE_I64, TCG_REG_TMP0, base, offset); +base = TCG_REG_TMP0; +offset = 0; +} +offset >>= vece; + +switch (vece) { +case MO_8: +tcg_out_opc_vldrepl_b(s, r, base, offset); +break; +case MO_16: +tcg_out_opc_vldrepl_h(s, r, base, offset); +break; +case MO_32: +tcg_out_opc_vldrepl_w(s, r, base, offset); +break; +case MO_64: +tcg_out_opc_vldrepl_d(s
[PATCH v2 08/14] tcg/loongarch64: Lower vector min max ops
Lower the following ops:
- smin_vec
- smax_vec
- umin_vec
- umax_vec

Signed-off-by: Jiajie Chen
Reviewed-by: Richard Henderson
---
 tcg/loongarch64/tcg-target.c.inc | 32
 tcg/loongarch64/tcg-target.h     |  2 +-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 07c030b262..ad1fbf0339 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1659,6 +1659,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     static const LoongArchInsn mul_vec_insn[4] = {
         OPC_VMUL_B, OPC_VMUL_H, OPC_VMUL_W, OPC_VMUL_D
     };
+    static const LoongArchInsn smin_vec_insn[4] = {
+        OPC_VMIN_B, OPC_VMIN_H, OPC_VMIN_W, OPC_VMIN_D
+    };
+    static const LoongArchInsn umin_vec_insn[4] = {
+        OPC_VMIN_BU, OPC_VMIN_HU, OPC_VMIN_WU, OPC_VMIN_DU
+    };
+    static const LoongArchInsn smax_vec_insn[4] = {
+        OPC_VMAX_B, OPC_VMAX_H, OPC_VMAX_W, OPC_VMAX_D
+    };
+    static const LoongArchInsn umax_vec_insn[4] = {
+        OPC_VMAX_BU, OPC_VMAX_HU, OPC_VMAX_WU, OPC_VMAX_DU
+    };

     a0 = args[0];
     a1 = args[1];
@@ -1797,6 +1809,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_mul_vec:
         tcg_out32(s, encode_vdvjvk_insn(mul_vec_insn[vece], a0, a1, a2));
         break;
+    case INDEX_op_smin_vec:
+        tcg_out32(s, encode_vdvjvk_insn(smin_vec_insn[vece], a0, a1, a2));
+        break;
+    case INDEX_op_smax_vec:
+        tcg_out32(s, encode_vdvjvk_insn(smax_vec_insn[vece], a0, a1, a2));
+        break;
+    case INDEX_op_umin_vec:
+        tcg_out32(s, encode_vdvjvk_insn(umin_vec_insn[vece], a0, a1, a2));
+        break;
+    case INDEX_op_umax_vec:
+        tcg_out32(s, encode_vdvjvk_insn(umax_vec_insn[vece], a0, a1, a2));
+        break;
     case INDEX_op_dupm_vec:
         tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
         break;
@@ -1824,6 +1848,10 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_not_vec:
     case INDEX_op_neg_vec:
     case INDEX_op_mul_vec:
+    case INDEX_op_smin_vec:
+    case INDEX_op_smax_vec:
+    case INDEX_op_umin_vec:
+    case INDEX_op_umax_vec:
         return 1;
     default:
         return 0;
@@ -1999,6 +2027,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_xor_vec:
     case INDEX_op_nor_vec:
     case INDEX_op_mul_vec:
+    case INDEX_op_smin_vec:
+    case INDEX_op_smax_vec:
+    case INDEX_op_umin_vec:
+    case INDEX_op_umax_vec:
         return C_O1_I2(w, w, w);

     case INDEX_op_not_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 0880a2903d..2b81a06c89 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -191,7 +191,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_rots_vec         0
 #define TCG_TARGET_HAS_rotv_vec         0
 #define TCG_TARGET_HAS_sat_vec          0
-#define TCG_TARGET_HAS_minmax_vec       0
+#define TCG_TARGET_HAS_minmax_vec       1
 #define TCG_TARGET_HAS_bitsel_vec       0
 #define TCG_TARGET_HAS_cmpsel_vec       0
--
2.42.0
[PATCH v2 04/14] tcg/loongarch64: Lower add/sub_vec to vadd/vsub
Lower the following ops:
- add_vec
- sub_vec

Signed-off-by: Jiajie Chen
---
 tcg/loongarch64/tcg-target-con-set.h |  1 +
 tcg/loongarch64/tcg-target.c.inc     | 58
 2 files changed, 59 insertions(+)

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index d04916db25..eaa015e813 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -31,5 +31,6 @@ C_O1_I2(r, 0, rZ)
 C_O1_I2(r, rZ, ri)
 C_O1_I2(r, rZ, rJ)
 C_O1_I2(r, rZ, rZ)
+C_O1_I2(w, w, wi)
 C_O1_I2(w, w, wJ)
 C_O1_I4(r, rZ, rJ, rZ, rZ)
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 18fe5fc148..555080f2b0 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1641,6 +1641,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         [TCG_COND_LTU] = {OPC_VSLTI_BU, OPC_VSLTI_HU, OPC_VSLTI_WU, OPC_VSLTI_DU},
     };
     LoongArchInsn insn;
+    static const LoongArchInsn add_vec_insn[4] = {
+        OPC_VADD_B, OPC_VADD_H, OPC_VADD_W, OPC_VADD_D
+    };
+    static const LoongArchInsn add_vec_imm_insn[4] = {
+        OPC_VADDI_BU, OPC_VADDI_HU, OPC_VADDI_WU, OPC_VADDI_DU
+    };
+    static const LoongArchInsn sub_vec_insn[4] = {
+        OPC_VSUB_B, OPC_VSUB_H, OPC_VSUB_W, OPC_VSUB_D
+    };
+    static const LoongArchInsn sub_vec_imm_insn[4] = {
+        OPC_VSUBI_BU, OPC_VSUBI_HU, OPC_VSUBI_WU, OPC_VSUBI_DU
+    };

     a0 = args[0];
     a1 = args[1];
@@ -1707,6 +1719,46 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         }
         tcg_out32(s, encode_vdvjvk_insn(insn, a0, a1, a2));
         break;
+    case INDEX_op_add_vec:
+        if (const_args[2]) {
+            int64_t value = sextract64(a2, 0, 8 << vece);
+            /* Try vaddi/vsubi */
+            if (0 <= value && value <= 0x1f) {
+                tcg_out32(s, encode_vdvjuk5_insn(add_vec_imm_insn[vece], a0, \
+                                                 a1, value));
+                break;
+            } else if (-0x1f <= value && value < 0) {
+                tcg_out32(s, encode_vdvjuk5_insn(sub_vec_imm_insn[vece], a0, \
+                                                 a1, -value));
+                break;
+            }
+
+            /* Fallback to dupi + vadd */
+            tcg_out_dupi_vec(s, type, vece, temp_vec, a2);
+            a2 = temp_vec;
+        }
+        tcg_out32(s, encode_vdvjvk_insn(add_vec_insn[vece], a0, a1, a2));
+        break;
+    case INDEX_op_sub_vec:
+        if (const_args[2]) {
+            int64_t value = sextract64(a2, 0, 8 << vece);
+            /* Try vaddi/vsubi */
+            if (0 <= value && value <= 0x1f) {
+                tcg_out32(s, encode_vdvjuk5_insn(sub_vec_imm_insn[vece], a0, \
+                                                 a1, value));
+                break;
+            } else if (-0x1f <= value && value < 0) {
+                tcg_out32(s, encode_vdvjuk5_insn(add_vec_imm_insn[vece], a0, \
+                                                 a1, -value));
+                break;
+            }
+
+            /* Fallback to dupi + vsub */
+            tcg_out_dupi_vec(s, type, vece, temp_vec, a2);
+            a2 = temp_vec;
+        }
+        tcg_out32(s, encode_vdvjvk_insn(sub_vec_insn[vece], a0, a1, a2));
+        break;
     case INDEX_op_dupm_vec:
         tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
         break;
@@ -1723,6 +1775,8 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_dup_vec:
     case INDEX_op_dupm_vec:
     case INDEX_op_cmp_vec:
+    case INDEX_op_add_vec:
+    case INDEX_op_sub_vec:
         return 1;
     default:
         return 0;
@@ -1887,6 +1941,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_cmp_vec:
         return C_O1_I2(w, w, wJ);

+    case INDEX_op_add_vec:
+    case INDEX_op_sub_vec:
+        return C_O1_I2(w, w, wi);
+
     default:
         g_assert_not_reached();
     }
--
2.42.0
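The constant-operand handling above picks between three strategies: `vaddi` when the sign-extended element value fits in an unsigned 5-bit immediate, `vsubi` of the negated value when it is a small negative, and a `dupi` + register-register fallback otherwise. A minimal scalar sketch of that decision (the helper names `sextract_bits` and `add_vec_imm_strategy` are mine, not QEMU's):

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extract the low `len` bits of x, mirroring QEMU's sextract64(). */
static int64_t sextract_bits(uint64_t x, unsigned len)
{
    return (int64_t)(x << (64 - len)) >> (64 - len);
}

/* Strategy for add_vec with a constant operand, given the log2 element
 * size vece (0=B, 1=H, 2=W, 3=D):
 *   0 = vaddi with the value as a uk5 immediate,
 *   1 = vsubi with the negated value,
 *   2 = fall back to dupi_vec + vadd. */
static int add_vec_imm_strategy(uint64_t a2, unsigned vece)
{
    int64_t value = sextract_bits(a2, 8u << vece);
    if (0 <= value && value <= 0x1f) {
        return 0;
    } else if (-0x1f <= value && value < 0) {
        return 1;
    }
    return 2;
}
```

For a byte-element vector, the replicated constant 0xff sign-extends to -1, so it is emitted as `vsubi ..., 1` rather than materialized in a temp register.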
[PATCH v2 07/14] tcg/loongarch64: Lower mul_vec to vmul
Signed-off-by: Jiajie Chen
Reviewed-by: Richard Henderson
---
 tcg/loongarch64/tcg-target.c.inc | 8
 tcg/loongarch64/tcg-target.h     | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 16bcc2cf1b..07c030b262 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1656,6 +1656,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     static const LoongArchInsn neg_vec_insn[4] = {
         OPC_VNEG_B, OPC_VNEG_H, OPC_VNEG_W, OPC_VNEG_D
     };
+    static const LoongArchInsn mul_vec_insn[4] = {
+        OPC_VMUL_B, OPC_VMUL_H, OPC_VMUL_W, OPC_VMUL_D
+    };

     a0 = args[0];
     a1 = args[1];
@@ -1791,6 +1794,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_neg_vec:
         tcg_out32(s, encode_vdvj_insn(neg_vec_insn[vece], a0, a1));
         break;
+    case INDEX_op_mul_vec:
+        tcg_out32(s, encode_vdvjvk_insn(mul_vec_insn[vece], a0, a1, a2));
+        break;
     case INDEX_op_dupm_vec:
         tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
         break;
@@ -1817,6 +1823,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_nor_vec:
     case INDEX_op_not_vec:
     case INDEX_op_neg_vec:
+    case INDEX_op_mul_vec:
         return 1;
     default:
         return 0;
@@ -1991,6 +1998,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_orc_vec:
     case INDEX_op_xor_vec:
     case INDEX_op_nor_vec:
+    case INDEX_op_mul_vec:
         return C_O1_I2(w, w, w);

     case INDEX_op_not_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index c992c4b297..0880a2903d 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -183,7 +183,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_nand_vec         0
 #define TCG_TARGET_HAS_nor_vec          1
 #define TCG_TARGET_HAS_eqv_vec          0
-#define TCG_TARGET_HAS_mul_vec          0
+#define TCG_TARGET_HAS_mul_vec          1
 #define TCG_TARGET_HAS_shi_vec          0
 #define TCG_TARGET_HAS_shs_vec          0
 #define TCG_TARGET_HAS_shv_vec          0
--
2.42.0
[PATCH v2 10/14] tcg/loongarch64: Lower vector shift vector ops
Lower the following ops:
- shlv_vec
- shrv_vec
- sarv_vec

Signed-off-by: Jiajie Chen
Reviewed-by: Richard Henderson
---
 tcg/loongarch64/tcg-target.c.inc | 24
 tcg/loongarch64/tcg-target.h     |  2 +-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 1e587a82b1..9f02805c4b 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1683,6 +1683,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     static const LoongArchInsn ussub_vec_insn[4] = {
         OPC_VSSUB_BU, OPC_VSSUB_HU, OPC_VSSUB_WU, OPC_VSSUB_DU
     };
+    static const LoongArchInsn shlv_vec_insn[4] = {
+        OPC_VSLL_B, OPC_VSLL_H, OPC_VSLL_W, OPC_VSLL_D
+    };
+    static const LoongArchInsn shrv_vec_insn[4] = {
+        OPC_VSRL_B, OPC_VSRL_H, OPC_VSRL_W, OPC_VSRL_D
+    };
+    static const LoongArchInsn sarv_vec_insn[4] = {
+        OPC_VSRA_B, OPC_VSRA_H, OPC_VSRA_W, OPC_VSRA_D
+    };

     a0 = args[0];
     a1 = args[1];
@@ -1845,6 +1854,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ussub_vec:
         tcg_out32(s, encode_vdvjvk_insn(ussub_vec_insn[vece], a0, a1, a2));
         break;
+    case INDEX_op_shlv_vec:
+        tcg_out32(s, encode_vdvjvk_insn(shlv_vec_insn[vece], a0, a1, a2));
+        break;
+    case INDEX_op_shrv_vec:
+        tcg_out32(s, encode_vdvjvk_insn(shrv_vec_insn[vece], a0, a1, a2));
+        break;
+    case INDEX_op_sarv_vec:
+        tcg_out32(s, encode_vdvjvk_insn(sarv_vec_insn[vece], a0, a1, a2));
+        break;
     case INDEX_op_dupm_vec:
         tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
         break;
@@ -1880,6 +1898,9 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_usadd_vec:
     case INDEX_op_sssub_vec:
     case INDEX_op_ussub_vec:
+    case INDEX_op_shlv_vec:
+    case INDEX_op_shrv_vec:
+    case INDEX_op_sarv_vec:
         return 1;
     default:
         return 0;
@@ -2063,6 +2084,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_usadd_vec:
     case INDEX_op_sssub_vec:
     case INDEX_op_ussub_vec:
+    case INDEX_op_shlv_vec:
+    case INDEX_op_shrv_vec:
+    case INDEX_op_sarv_vec:
         return C_O1_I2(w, w, w);

     case INDEX_op_not_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 72bfd0d440..d27f3737ad 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -186,7 +186,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_mul_vec          1
 #define TCG_TARGET_HAS_shi_vec          0
 #define TCG_TARGET_HAS_shs_vec          0
-#define TCG_TARGET_HAS_shv_vec          0
+#define TCG_TARGET_HAS_shv_vec          1
 #define TCG_TARGET_HAS_roti_vec         0
 #define TCG_TARGET_HAS_rots_vec         0
 #define TCG_TARGET_HAS_rotv_vec         0
--
2.42.0
[PATCH v2 00/14] Lower TCG vector ops to LSX
This patch series allows QEMU to utilize LSX instructions on LoongArch
machines to execute TCG vector ops.

Passed tcg tests with x86_64 and aarch64 cross compilers.

Changes since v1:

- Optimize dupi_vec/st_vec/ld_vec/cmp_vec/add_vec/sub_vec generation
- Lower not_vec/shi_vec/roti_vec/rotv_vec

Jiajie Chen (14):
  tcg/loongarch64: Import LSX instructions
  tcg/loongarch64: Lower basic tcg vec ops to LSX
  tcg/loongarch64: Lower cmp_vec to vseq/vsle/vslt
  tcg/loongarch64: Lower add/sub_vec to vadd/vsub
  tcg/loongarch64: Lower vector bitwise operations
  tcg/loongarch64: Lower neg_vec to vneg
  tcg/loongarch64: Lower mul_vec to vmul
  tcg/loongarch64: Lower vector min max ops
  tcg/loongarch64: Lower vector saturated ops
  tcg/loongarch64: Lower vector shift vector ops
  tcg/loongarch64: Lower bitsel_vec to vbitsel
  tcg/loongarch64: Lower vector shift integer ops
  tcg/loongarch64: Lower rotv_vec ops to LSX
  tcg/loongarch64: Lower rotli_vec to vrotri

 tcg/loongarch64/tcg-insn-defs.c.inc  | 6251 +-
 tcg/loongarch64/tcg-target-con-set.h |    7 +
 tcg/loongarch64/tcg-target-con-str.h |    1 +
 tcg/loongarch64/tcg-target.c.inc     |  550 ++-
 tcg/loongarch64/tcg-target.h         |   38 +-
 tcg/loongarch64/tcg-target.opc.h     |   12 +
 6 files changed, 6740 insertions(+), 119 deletions(-)
 create mode 100644 tcg/loongarch64/tcg-target.opc.h

--
2.42.0
[PATCH v2 06/14] tcg/loongarch64: Lower neg_vec to vneg
Signed-off-by: Jiajie Chen
Reviewed-by: Richard Henderson
---
 tcg/loongarch64/tcg-target.c.inc | 8
 tcg/loongarch64/tcg-target.h     | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 20e25dc490..16bcc2cf1b 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1653,6 +1653,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     static const LoongArchInsn sub_vec_imm_insn[4] = {
         OPC_VSUBI_BU, OPC_VSUBI_HU, OPC_VSUBI_WU, OPC_VSUBI_DU
     };
+    static const LoongArchInsn neg_vec_insn[4] = {
+        OPC_VNEG_B, OPC_VNEG_H, OPC_VNEG_W, OPC_VNEG_D
+    };

     a0 = args[0];
     a1 = args[1];
@@ -1785,6 +1788,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         }
         tcg_out32(s, encode_vdvjvk_insn(sub_vec_insn[vece], a0, a1, a2));
         break;
+    case INDEX_op_neg_vec:
+        tcg_out32(s, encode_vdvj_insn(neg_vec_insn[vece], a0, a1));
+        break;
     case INDEX_op_dupm_vec:
         tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
         break;
@@ -1810,6 +1816,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_xor_vec:
     case INDEX_op_nor_vec:
     case INDEX_op_not_vec:
+    case INDEX_op_neg_vec:
         return 1;
     default:
         return 0;
@@ -1987,6 +1994,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
         return C_O1_I2(w, w, w);

     case INDEX_op_not_vec:
+    case INDEX_op_neg_vec:
         return C_O1_I1(w, w);

     default:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index bf72b26ca2..c992c4b297 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -176,7 +176,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_v256             0

 #define TCG_TARGET_HAS_not_vec          1
-#define TCG_TARGET_HAS_neg_vec          0
+#define TCG_TARGET_HAS_neg_vec          1
 #define TCG_TARGET_HAS_abs_vec          0
 #define TCG_TARGET_HAS_andc_vec         1
 #define TCG_TARGET_HAS_orc_vec          1
--
2.42.0
[PATCH v2 09/14] tcg/loongarch64: Lower vector saturated ops
Lower the following ops:
- ssadd_vec
- usadd_vec
- sssub_vec
- ussub_vec

Signed-off-by: Jiajie Chen
Reviewed-by: Richard Henderson
---
 tcg/loongarch64/tcg-target.c.inc | 32
 tcg/loongarch64/tcg-target.h     |  2 +-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index ad1fbf0339..1e587a82b1 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1671,6 +1671,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     static const LoongArchInsn umax_vec_insn[4] = {
         OPC_VMAX_BU, OPC_VMAX_HU, OPC_VMAX_WU, OPC_VMAX_DU
     };
+    static const LoongArchInsn ssadd_vec_insn[4] = {
+        OPC_VSADD_B, OPC_VSADD_H, OPC_VSADD_W, OPC_VSADD_D
+    };
+    static const LoongArchInsn usadd_vec_insn[4] = {
+        OPC_VSADD_BU, OPC_VSADD_HU, OPC_VSADD_WU, OPC_VSADD_DU
+    };
+    static const LoongArchInsn sssub_vec_insn[4] = {
+        OPC_VSSUB_B, OPC_VSSUB_H, OPC_VSSUB_W, OPC_VSSUB_D
+    };
+    static const LoongArchInsn ussub_vec_insn[4] = {
+        OPC_VSSUB_BU, OPC_VSSUB_HU, OPC_VSSUB_WU, OPC_VSSUB_DU
+    };

     a0 = args[0];
     a1 = args[1];
@@ -1821,6 +1833,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_umax_vec:
         tcg_out32(s, encode_vdvjvk_insn(umax_vec_insn[vece], a0, a1, a2));
         break;
+    case INDEX_op_ssadd_vec:
+        tcg_out32(s, encode_vdvjvk_insn(ssadd_vec_insn[vece], a0, a1, a2));
+        break;
+    case INDEX_op_usadd_vec:
+        tcg_out32(s, encode_vdvjvk_insn(usadd_vec_insn[vece], a0, a1, a2));
+        break;
+    case INDEX_op_sssub_vec:
+        tcg_out32(s, encode_vdvjvk_insn(sssub_vec_insn[vece], a0, a1, a2));
+        break;
+    case INDEX_op_ussub_vec:
+        tcg_out32(s, encode_vdvjvk_insn(ussub_vec_insn[vece], a0, a1, a2));
+        break;
     case INDEX_op_dupm_vec:
         tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
         break;
@@ -1852,6 +1876,10 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_smax_vec:
     case INDEX_op_umin_vec:
     case INDEX_op_umax_vec:
+    case INDEX_op_ssadd_vec:
+    case INDEX_op_usadd_vec:
+    case INDEX_op_sssub_vec:
+    case INDEX_op_ussub_vec:
         return 1;
     default:
         return 0;
@@ -2031,6 +2059,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_smax_vec:
     case INDEX_op_umin_vec:
     case INDEX_op_umax_vec:
+    case INDEX_op_ssadd_vec:
+    case INDEX_op_usadd_vec:
+    case INDEX_op_sssub_vec:
+    case INDEX_op_ussub_vec:
         return C_O1_I2(w, w, w);

     case INDEX_op_not_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 2b81a06c89..72bfd0d440 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -190,7 +190,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_roti_vec         0
 #define TCG_TARGET_HAS_rots_vec         0
 #define TCG_TARGET_HAS_rotv_vec         0
-#define TCG_TARGET_HAS_sat_vec          0
+#define TCG_TARGET_HAS_sat_vec          1
 #define TCG_TARGET_HAS_minmax_vec       1
 #define TCG_TARGET_HAS_bitsel_vec       1
 #define TCG_TARGET_HAS_cmpsel_vec       0
--
2.42.0
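The saturated ops being lowered here clamp to the element range instead of wrapping on overflow. A per-lane scalar model of what one VSADD.B / VSSUB.BU lane computes (illustration only, not QEMU code):

```c
#include <assert.h>
#include <stdint.h>

/* Signed saturating byte add: clamp to [-128, 127] instead of wrapping. */
static int8_t ssadd8(int8_t a, int8_t b)
{
    int16_t r = (int16_t)a + b;           /* widen so the sum cannot overflow */
    if (r > INT8_MAX) return INT8_MAX;
    if (r < INT8_MIN) return INT8_MIN;
    return (int8_t)r;
}

/* Unsigned saturating byte subtract: clamp to 0 instead of wrapping. */
static uint8_t ussub8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}
```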
[PATCH v2 13/14] tcg/loongarch64: Lower rotv_vec ops to LSX
Lower the following ops:
- rotrv_vec
- rotlv_vec

Signed-off-by: Jiajie Chen
---
 tcg/loongarch64/tcg-target.c.inc | 14 ++
 tcg/loongarch64/tcg-target.h     |  2 +-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index ccb362205e..6fe319a77e 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1701,6 +1701,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     static const LoongArchInsn sari_vec_insn[4] = {
         OPC_VSRAI_B, OPC_VSRAI_H, OPC_VSRAI_W, OPC_VSRAI_D
     };
+    static const LoongArchInsn rotrv_vec_insn[4] = {
+        OPC_VROTR_B, OPC_VROTR_H, OPC_VROTR_W, OPC_VROTR_D
+    };

     a0 = args[0];
     a1 = args[1];
@@ -1882,6 +1885,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_sari_vec:
         tcg_out32(s, encode_vdvjuk3_insn(sari_vec_insn[vece], a0, a1, a2));
         break;
+    case INDEX_op_rotrv_vec:
+        tcg_out32(s, encode_vdvjvk_insn(rotrv_vec_insn[vece], a0, a1, a2));
+        break;
+    case INDEX_op_rotlv_vec:
+        /* rotlv_vec a1, a2 = rotrv_vec a1, -a2 */
+        tcg_out32(s, encode_vdvj_insn(neg_vec_insn[vece], temp_vec, a2));
+        tcg_out32(s, encode_vdvjvk_insn(rotrv_vec_insn[vece], a0, a1,
+                                        temp_vec));
+        break;
     case INDEX_op_bitsel_vec:
         /* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */
         tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1);
@@ -2111,6 +2123,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_shlv_vec:
     case INDEX_op_shrv_vec:
     case INDEX_op_sarv_vec:
+    case INDEX_op_rotrv_vec:
+    case INDEX_op_rotlv_vec:
         return C_O1_I2(w, w, w);

     case INDEX_op_not_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index b4dab03469..f6eb3cf7a6 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -189,7 +189,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_shv_vec          1
 #define TCG_TARGET_HAS_roti_vec         0
 #define TCG_TARGET_HAS_rots_vec         0
-#define TCG_TARGET_HAS_rotv_vec         0
+#define TCG_TARGET_HAS_rotv_vec         1
 #define TCG_TARGET_HAS_sat_vec          1
 #define TCG_TARGET_HAS_minmax_vec       1
 #define TCG_TARGET_HAS_bitsel_vec       1
--
2.42.0
[PATCH v2 14/14] tcg/loongarch64: Lower rotli_vec to vrotri
Signed-off-by: Jiajie Chen
---
 tcg/loongarch64/tcg-target.c.inc | 21 +
 tcg/loongarch64/tcg-target.h     |  2 +-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 6fe319a77e..c4e9e0309e 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1894,6 +1894,26 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         tcg_out32(s, encode_vdvjvk_insn(rotrv_vec_insn[vece], a0, a1,
                                         temp_vec));
         break;
+    case INDEX_op_rotli_vec:
+        /* rotli_vec a1, a2 = rotri_vec a1, -a2 */
+        a2 = extract32(-a2, 0, 3 + vece);
+        switch (vece) {
+        case MO_8:
+            tcg_out_opc_vrotri_b(s, a0, a1, a2);
+            break;
+        case MO_16:
+            tcg_out_opc_vrotri_h(s, a0, a1, a2);
+            break;
+        case MO_32:
+            tcg_out_opc_vrotri_w(s, a0, a1, a2);
+            break;
+        case MO_64:
+            tcg_out_opc_vrotri_d(s, a0, a1, a2);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        break;
     case INDEX_op_bitsel_vec:
         /* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */
         tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1);
@@ -2132,6 +2152,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_shli_vec:
     case INDEX_op_shri_vec:
     case INDEX_op_sari_vec:
+    case INDEX_op_rotli_vec:
         return C_O1_I1(w, w);

     case INDEX_op_bitsel_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index f6eb3cf7a6..3dc2dbf800 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -187,7 +187,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_shi_vec          1
 #define TCG_TARGET_HAS_shs_vec          0
 #define TCG_TARGET_HAS_shv_vec          1
-#define TCG_TARGET_HAS_roti_vec         0
+#define TCG_TARGET_HAS_roti_vec         1
 #define TCG_TARGET_HAS_rots_vec         0
 #define TCG_TARGET_HAS_rotv_vec         1
 #define TCG_TARGET_HAS_sat_vec          1
--
2.42.0
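The `extract32(-a2, 0, 3 + vece)` trick above converts a rotate-left count into the equivalent rotate-right count by negating it modulo the element width (8 << vece bits). A scalar sketch of the same computation (the helper name `rotl_to_rotr` is mine):

```c
#include <assert.h>
#include <stdint.h>

/* Negate the rotate count modulo the element width.  3 + vece is the
 * log2 of the element width in bits, so masking with (1 << (3 + vece)) - 1
 * is the same as reducing modulo 8 << vece. */
static uint32_t rotl_to_rotr(uint32_t a2, unsigned vece)
{
    unsigned len = 3 + vece;
    return (uint32_t)(-(int32_t)a2) & ((1u << len) - 1);
}
```

For byte elements (vece = 0), a rotate-left by 1 becomes a rotate-right by 7, and a rotate count of 0 stays 0.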
[PATCH v2 11/14] tcg/loongarch64: Lower bitsel_vec to vbitsel
Signed-off-by: Jiajie Chen
Reviewed-by: Richard Henderson
---
 tcg/loongarch64/tcg-target-con-set.h |  1 +
 tcg/loongarch64/tcg-target.c.inc     | 11 ++-
 tcg/loongarch64/tcg-target.h         |  2 +-
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index 13a7f3b5e2..fd2bd785e5 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -35,4 +35,5 @@ C_O1_I2(r, rZ, rZ)
 C_O1_I2(w, w, w)
 C_O1_I2(w, w, wi)
 C_O1_I2(w, w, wJ)
+C_O1_I3(w, w, w, w)
 C_O1_I4(r, rZ, rJ, rZ, rZ)
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 9f02805c4b..8de4c36396 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1622,7 +1622,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
                            const int const_args[TCG_MAX_OP_ARGS])
 {
     TCGType type = vecl + TCG_TYPE_V64;
-    TCGArg a0, a1, a2;
+    TCGArg a0, a1, a2, a3;
     TCGReg temp = TCG_REG_TMP0;
     TCGReg temp_vec = TCG_VEC_TMP0;

@@ -1696,6 +1696,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     a0 = args[0];
     a1 = args[1];
     a2 = args[2];
+    a3 = args[3];

     /* Currently only supports V128 */
     tcg_debug_assert(type == TCG_TYPE_V128);
@@ -1863,6 +1864,10 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_sarv_vec:
         tcg_out32(s, encode_vdvjvk_insn(sarv_vec_insn[vece], a0, a1, a2));
         break;
+    case INDEX_op_bitsel_vec:
+        /* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */
+        tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1);
+        break;
     case INDEX_op_dupm_vec:
         tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
         break;
@@ -1901,6 +1906,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_shlv_vec:
     case INDEX_op_shrv_vec:
     case INDEX_op_sarv_vec:
+    case INDEX_op_bitsel_vec:
         return 1;
     default:
         return 0;
@@ -2093,6 +2099,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_neg_vec:
         return C_O1_I1(w, w);

+    case INDEX_op_bitsel_vec:
+        return C_O1_I3(w, w, w, w);
+
     default:
         g_assert_not_reached();
     }
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index d27f3737ad..c77672d92c 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -192,7 +192,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_rotv_vec         0
 #define TCG_TARGET_HAS_sat_vec          1
 #define TCG_TARGET_HAS_minmax_vec       1
-#define TCG_TARGET_HAS_bitsel_vec       0
+#define TCG_TARGET_HAS_bitsel_vec       1
 #define TCG_TARGET_HAS_cmpsel_vec       0

 #define TCG_TARGET_DEFAULT_MO (0)
--
2.42.0
[PATCH v2 12/14] tcg/loongarch64: Lower vector shift integer ops
Lower the following ops:
- shli_vec
- shri_vec
- sari_vec

Signed-off-by: Jiajie Chen
---
 tcg/loongarch64/tcg-target.c.inc | 21 +
 tcg/loongarch64/tcg-target.h     |  2 +-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 8de4c36396..ccb362205e 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1692,6 +1692,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     static const LoongArchInsn sarv_vec_insn[4] = {
         OPC_VSRA_B, OPC_VSRA_H, OPC_VSRA_W, OPC_VSRA_D
     };
+    static const LoongArchInsn shli_vec_insn[4] = {
+        OPC_VSLLI_B, OPC_VSLLI_H, OPC_VSLLI_W, OPC_VSLLI_D
+    };
+    static const LoongArchInsn shri_vec_insn[4] = {
+        OPC_VSRLI_B, OPC_VSRLI_H, OPC_VSRLI_W, OPC_VSRLI_D
+    };
+    static const LoongArchInsn sari_vec_insn[4] = {
+        OPC_VSRAI_B, OPC_VSRAI_H, OPC_VSRAI_W, OPC_VSRAI_D
+    };

     a0 = args[0];
     a1 = args[1];
@@ -1864,6 +1873,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_sarv_vec:
         tcg_out32(s, encode_vdvjvk_insn(sarv_vec_insn[vece], a0, a1, a2));
         break;
+    case INDEX_op_shli_vec:
+        tcg_out32(s, encode_vdvjuk3_insn(shli_vec_insn[vece], a0, a1, a2));
+        break;
+    case INDEX_op_shri_vec:
+        tcg_out32(s, encode_vdvjuk3_insn(shri_vec_insn[vece], a0, a1, a2));
+        break;
+    case INDEX_op_sari_vec:
+        tcg_out32(s, encode_vdvjuk3_insn(sari_vec_insn[vece], a0, a1, a2));
+        break;
     case INDEX_op_bitsel_vec:
         /* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */
         tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1);
@@ -2097,6 +2115,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)

     case INDEX_op_not_vec:
     case INDEX_op_neg_vec:
+    case INDEX_op_shli_vec:
+    case INDEX_op_shri_vec:
+    case INDEX_op_sari_vec:
         return C_O1_I1(w, w);

     case INDEX_op_bitsel_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index c77672d92c..b4dab03469 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -184,7 +184,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_nor_vec          1
 #define TCG_TARGET_HAS_eqv_vec          0
 #define TCG_TARGET_HAS_mul_vec          1
-#define TCG_TARGET_HAS_shi_vec          0
+#define TCG_TARGET_HAS_shi_vec          1
 #define TCG_TARGET_HAS_shs_vec          0
 #define TCG_TARGET_HAS_shv_vec          1
 #define TCG_TARGET_HAS_roti_vec         0
--
2.42.0
[PATCH v2 03/14] tcg/loongarch64: Lower cmp_vec to vseq/vsle/vslt
Signed-off-by: Jiajie Chen
---
 tcg/loongarch64/tcg-target-con-set.h |  1 +
 tcg/loongarch64/tcg-target.c.inc     | 60
 2 files changed, 61 insertions(+)

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index 37b3f80bf9..d04916db25 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -31,4 +31,5 @@ C_O1_I2(r, 0, rZ)
 C_O1_I2(r, rZ, ri)
 C_O1_I2(r, rZ, rJ)
 C_O1_I2(r, rZ, rZ)
+C_O1_I2(w, w, wJ)
 C_O1_I4(r, rZ, rJ, rZ, rZ)
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 150278e112..18fe5fc148 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1624,6 +1624,23 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     TCGType type = vecl + TCG_TYPE_V64;
     TCGArg a0, a1, a2;
     TCGReg temp = TCG_REG_TMP0;
+    TCGReg temp_vec = TCG_VEC_TMP0;
+
+    static const LoongArchInsn cmp_vec_insn[16][4] = {
+        [TCG_COND_EQ] = {OPC_VSEQ_B, OPC_VSEQ_H, OPC_VSEQ_W, OPC_VSEQ_D},
+        [TCG_COND_LE] = {OPC_VSLE_B, OPC_VSLE_H, OPC_VSLE_W, OPC_VSLE_D},
+        [TCG_COND_LEU] = {OPC_VSLE_BU, OPC_VSLE_HU, OPC_VSLE_WU, OPC_VSLE_DU},
+        [TCG_COND_LT] = {OPC_VSLT_B, OPC_VSLT_H, OPC_VSLT_W, OPC_VSLT_D},
+        [TCG_COND_LTU] = {OPC_VSLT_BU, OPC_VSLT_HU, OPC_VSLT_WU, OPC_VSLT_DU},
+    };
+    static const LoongArchInsn cmp_vec_imm_insn[16][4] = {
+        [TCG_COND_EQ] = {OPC_VSEQI_B, OPC_VSEQI_H, OPC_VSEQI_W, OPC_VSEQI_D},
+        [TCG_COND_LE] = {OPC_VSLEI_B, OPC_VSLEI_H, OPC_VSLEI_W, OPC_VSLEI_D},
+        [TCG_COND_LEU] = {OPC_VSLEI_BU, OPC_VSLEI_HU, OPC_VSLEI_WU, OPC_VSLEI_DU},
+        [TCG_COND_LT] = {OPC_VSLTI_B, OPC_VSLTI_H, OPC_VSLTI_W, OPC_VSLTI_D},
+        [TCG_COND_LTU] = {OPC_VSLTI_BU, OPC_VSLTI_HU, OPC_VSLTI_WU, OPC_VSLTI_DU},
+    };
+    LoongArchInsn insn;

     a0 = args[0];
     a1 = args[1];
@@ -1651,6 +1668,45 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         tcg_out_opc_vldx(s, a0, a1, temp);
     }
     break;
+    case INDEX_op_cmp_vec:
+        TCGCond cond = args[3];
+        if (const_args[2]) {
+            /*
+             * cmp_vec dest, src, value
+             * Try vseqi/vslei/vslti
+             */
+            int64_t value = sextract64(a2, 0, 8 << vece);
+            if ((cond == TCG_COND_EQ || cond == TCG_COND_LE || \
+                 cond == TCG_COND_LT) && (-0x10 <= value && value <= 0x0f)) {
+                tcg_out32(s, encode_vdvjsk5_insn(cmp_vec_imm_insn[cond][vece], \
+                                                 a0, a1, value));
+                break;
+            } else if ((cond == TCG_COND_LEU || cond == TCG_COND_LTU) &&
+                       (0x00 <= value && value <= 0x1f)) {
+                tcg_out32(s, encode_vdvjuk5_insn(cmp_vec_imm_insn[cond][vece], \
+                                                 a0, a1, value));
+                break;
+            }
+
+            /*
+             * Fallback to:
+             * dupi_vec temp, a2
+             * cmp_vec a0, a1, temp, cond
+             */
+            tcg_out_dupi_vec(s, type, vece, temp_vec, a2);
+            a2 = temp_vec;
+        }
+
+        insn = cmp_vec_insn[cond][vece];
+        if (insn == 0) {
+            TCGArg t;
+            t = a1, a1 = a2, a2 = t;
+            cond = tcg_swap_cond(cond);
+            insn = cmp_vec_insn[cond][vece];
+            tcg_debug_assert(insn != 0);
+        }
+        tcg_out32(s, encode_vdvjvk_insn(insn, a0, a1, a2));
+        break;
     case INDEX_op_dupm_vec:
         tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
         break;
@@ -1666,6 +1722,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_st_vec:
     case INDEX_op_dup_vec:
     case INDEX_op_dupm_vec:
+    case INDEX_op_cmp_vec:
         return 1;
     default:
         return 0;
@@ -1827,6 +1884,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_st_vec:
         return C_O0_I2(w, r);

+    case INDEX_op_cmp_vec:
+        return C_O1_I2(w, w, wJ);
+
     default:
         g_assert_not_reached();
     }
--
2.42.0
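The immediate forms split by signedness above: vseqi/vslei/vslti take a signed 5-bit (si5) immediate, while the unsigned compares vslei.bu etc. take an unsigned 5-bit (ui5) immediate, so the constant must pass a different range check in each case. A scalar sketch of those two checks (the function names are mine, not QEMU's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* si5: signed 5-bit immediate used by vseqi/vslei/vslti. */
static bool fits_si5(int64_t value)
{
    return -0x10 <= value && value <= 0x0f;
}

/* ui5: unsigned 5-bit immediate used by the unsigned compare forms. */
static bool fits_ui5(int64_t value)
{
    return 0x00 <= value && value <= 0x1f;
}
```

A constant like 0x1f can only go through the unsigned form; -16 only through the signed one; anything outside both ranges falls back to `dupi_vec` plus a register-register compare.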
Re: [PATCH 02/11] tcg/loongarch64: Lower basic tcg vec ops to LSX
There seems to be some problem with the email server, so I am trying my
other email address to send this email.

On 2023/8/29 00:57, Richard Henderson wrote:
On 8/28/23 08:19, Jiajie Chen wrote:

+static void tcg_out_dupi_vec(TCGContext *s, TCGType type, unsigned vece,
+                             TCGReg rd, int64_t v64)
+{
+    /* Try vldi if imm can fit */
+    if (vece <= MO_32 && (-0x200 <= v64 && v64 <= 0x1FF)) {
+        uint32_t imm = (vece << 10) | ((uint32_t)v64 & 0x3FF);
+        tcg_out_opc_vldi(s, rd, imm);
+        return;
+    }

v64 has the value replicated across 64 bits. In order to do the
comparison above, you'll want

    int64_t vale = sextract64(v64, 0, 8 << vece);
    if (-0x200 <= vale && vale <= 0x1ff)
        ...

Since the only documentation for LSX is qemu's own translator code, why
are you testing vece <= MO_32? MO_64 should be available as well? Or is
there a bug in trans_vldi()?

Sorry, my mistake. I was mixing up MO_64 with bit 12 in the vldi imm.

It might be nice to leave a to-do for vldi imm bit 12 set, for the
patterns expanded by vldi_get_value(). In particular, mode == 9 is
likely to be useful, and modes {1,2,3,5} are easy to test for.

Sure, I was thinking about the complexity of pattern matching on those
modes, and decided to skip the hard part in the first patch series.

+
+    /* Fallback to vreplgr2vr */
+    tcg_out_movi(s, type, TCG_REG_TMP0, v64);

type is a vector type; you can't use it here. Correct would be
TCG_TYPE_I64. Better to load vale instead, since that will take fewer
insns in tcg_out_movi.

Sure.

+static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
+                           unsigned vecl, unsigned vece,
+                           const TCGArg args[TCG_MAX_OP_ARGS],
+                           const int const_args[TCG_MAX_OP_ARGS])
+{
+    TCGType type = vecl + TCG_TYPE_V64;
+    TCGArg a0, a1, a2;
+    TCGReg base;
+    TCGReg temp = TCG_REG_TMP0;
+    int32_t offset;
+
+    a0 = args[0];
+    a1 = args[1];
+    a2 = args[2];
+
+    /* Currently only supports V128 */
+    tcg_debug_assert(type == TCG_TYPE_V128);
+
+    switch (opc) {
+    case INDEX_op_st_vec:
+        /* Try to fit vst imm */
+        if (-0x800 <= a2 && a2 <= 0x7ff) {
+            base = a1;
+            offset = a2;
+        } else {
+            tcg_out_addi(s, TCG_TYPE_I64, temp, a1, a2);
+            base = temp;
+            offset = 0;
+        }
+        tcg_out_opc_vst(s, a0, base, offset);
+        break;
+    case INDEX_op_ld_vec:
+        /* Try to fit vld imm */
+        if (-0x800 <= a2 && a2 <= 0x7ff) {
+            base = a1;
+            offset = a2;
+        } else {
+            tcg_out_addi(s, TCG_TYPE_I64, temp, a1, a2);
+            base = temp;
+            offset = 0;
+        }
+        tcg_out_opc_vld(s, a0, base, offset);

tcg_out_addi has a hole in bits [15:12], and can take an extra insn if
those bits are set. Better to load the offset with tcg_out_movi and
then use VLDX/VSTX instead of VLD/VST.

Sure.

@@ -159,6 +170,30 @@ typedef enum {
 #define TCG_TARGET_HAS_mulsh_i64        1
 #define TCG_TARGET_HAS_qemu_ldst_i128   0

+#define TCG_TARGET_HAS_v64              0
+#define TCG_TARGET_HAS_v128             use_lsx_instructions
+#define TCG_TARGET_HAS_v256             0

Perhaps reserve for a follow-up, but TCG_TARGET_HAS_v64 can easily be
supported using the same instructions. The only difference is
load/store, where you could use FLD.D/FST.D to load the lower 64 bits
of the fp/vector register, or VLDREPL.D to load and initialize all bits
and VSTELM.D to store the lower 64 bits. I tend to think the float
insns are more flexible, having a larger displacement, and the
availability of FLDX/FSTX as well.

Sure.

r~
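The vldi discussion above hinges on first sign-extracting one element out of the replicated 64-bit constant, then checking that it fits the 10-bit payload of the immediate. A scalar sketch of the fixed check Richard suggests (helper names are mine; the encoding layout `(vece << 10) | (value & 0x3ff)` is the one quoted from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extract the low (8 << vece) bits of the replicated constant,
 * mirroring sextract64(v64, 0, 8 << vece). */
static int64_t sextract_elem(uint64_t v64, unsigned vece)
{
    unsigned len = 8u << vece;
    return (int64_t)(v64 << (64 - len)) >> (64 - len);
}

/* Returns 1 and fills *imm if the constant can be emitted with vldi
 * (vece <= MO_32 and element fits in a signed 10-bit immediate);
 * returns 0 when the vreplgr2vr fallback is needed instead. */
static int vldi_imm(uint64_t v64, unsigned vece, uint32_t *imm)
{
    int64_t vale = sextract_elem(v64, vece);
    if (vece <= 2 && -0x200 <= vale && vale <= 0x1ff) {
        *imm = ((uint32_t)vece << 10) | ((uint32_t)vale & 0x3ff);
        return 1;
    }
    return 0;
}
```

With the fix, an all-ones byte vector (each element -1) is accepted, whereas comparing the raw replicated v64 against the range, as in the original patch, would have rejected it.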
[PATCH 05/11] tcg/loongarch64: Lower vector bitwise operations
Lower the following ops:
- and_vec
- andc_vec
- or_vec
- orc_vec
- xor_vec
- nor_vec

Signed-off-by: Jiajie Chen
---
 tcg/loongarch64/tcg-target.c.inc | 35
 tcg/loongarch64/tcg-target.h     |  6 +++---
 2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index eb340a6493..fe741ef045 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1671,6 +1671,29 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         }
         tcg_out_opc_vld(s, a0, base, offset);
         break;
+    case INDEX_op_and_vec:
+        tcg_out_opc_vand_v(s, a0, a1, a2);
+        break;
+    case INDEX_op_andc_vec:
+        /*
+         * vandn vd, vj, vk: vd = vk & ~vj
+         * andc_vec vd, vj, vk: vd = vj & ~vk
+         * vj and vk are swapped
+         */
+        tcg_out_opc_vandn_v(s, a0, a2, a1);
+        break;
+    case INDEX_op_or_vec:
+        tcg_out_opc_vor_v(s, a0, a1, a2);
+        break;
+    case INDEX_op_orc_vec:
+        tcg_out_opc_vorn_v(s, a0, a1, a2);
+        break;
+    case INDEX_op_xor_vec:
+        tcg_out_opc_vxor_v(s, a0, a1, a2);
+        break;
+    case INDEX_op_nor_vec:
+        tcg_out_opc_vnor_v(s, a0, a1, a2);
+        break;
     case INDEX_op_cmp_vec:
         TCGCond cond = args[3];
         insn = cmp_vec_insn[cond][vece];
@@ -1707,6 +1730,12 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_cmp_vec:
     case INDEX_op_add_vec:
     case INDEX_op_sub_vec:
+    case INDEX_op_and_vec:
+    case INDEX_op_andc_vec:
+    case INDEX_op_or_vec:
+    case INDEX_op_orc_vec:
+    case INDEX_op_xor_vec:
+    case INDEX_op_nor_vec:
         return 1;
     default:
         return 0;
@@ -1871,6 +1900,12 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_cmp_vec:
     case INDEX_op_add_vec:
     case INDEX_op_sub_vec:
+    case INDEX_op_and_vec:
+    case INDEX_op_andc_vec:
+    case INDEX_op_or_vec:
+    case INDEX_op_orc_vec:
+    case INDEX_op_xor_vec:
+    case INDEX_op_nor_vec:
         return C_O1_I2(w, w, w);

     default:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index be9343ded9..4ca685e752 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -177,10 +177,10 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_not_vec          0
 #define TCG_TARGET_HAS_neg_vec          0
 #define TCG_TARGET_HAS_abs_vec          0
-#define TCG_TARGET_HAS_andc_vec         0
-#define TCG_TARGET_HAS_orc_vec          0
+#define TCG_TARGET_HAS_andc_vec         1
+#define TCG_TARGET_HAS_orc_vec          1
 #define TCG_TARGET_HAS_nand_vec         0
-#define TCG_TARGET_HAS_nor_vec          0
+#define TCG_TARGET_HAS_nor_vec          1
 #define TCG_TARGET_HAS_eqv_vec          0
 #define TCG_TARGET_HAS_mul_vec          0
 #define TCG_TARGET_HAS_shi_vec          0
--
2.42.0
[PATCH 09/11] tcg/loongarch64: Lower vector saturated ops
Lower the following ops: - ssadd_vec - usadd_vec - sssub_vec - ussub_vec Signed-off-by: Jiajie Chen --- tcg/loongarch64/tcg-target.c.inc | 32 tcg/loongarch64/tcg-target.h | 2 +- 2 files changed, 33 insertions(+), 1 deletion(-) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 91049a80b6..21d2365987 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1656,6 +1656,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, static const LoongArchInsn umax_vec_insn[4] = { OPC_VMAX_BU, OPC_VMAX_HU, OPC_VMAX_WU, OPC_VMAX_DU }; +static const LoongArchInsn ssadd_vec_insn[4] = { +OPC_VSADD_B, OPC_VSADD_H, OPC_VSADD_W, OPC_VSADD_D +}; +static const LoongArchInsn usadd_vec_insn[4] = { +OPC_VSADD_BU, OPC_VSADD_HU, OPC_VSADD_WU, OPC_VSADD_DU +}; +static const LoongArchInsn sssub_vec_insn[4] = { +OPC_VSSUB_B, OPC_VSSUB_H, OPC_VSSUB_W, OPC_VSSUB_D +}; +static const LoongArchInsn ussub_vec_insn[4] = { +OPC_VSSUB_BU, OPC_VSSUB_HU, OPC_VSSUB_WU, OPC_VSSUB_DU +}; a0 = args[0]; a1 = args[1]; @@ -1748,6 +1760,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, case INDEX_op_umax_vec: tcg_out32(s, encode_vdvjvk_insn(umax_vec_insn[vece], a0, a1, a2)); break; +case INDEX_op_ssadd_vec: +tcg_out32(s, encode_vdvjvk_insn(ssadd_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_usadd_vec: +tcg_out32(s, encode_vdvjvk_insn(usadd_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_sssub_vec: +tcg_out32(s, encode_vdvjvk_insn(sssub_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_ussub_vec: +tcg_out32(s, encode_vdvjvk_insn(ussub_vec_insn[vece], a0, a1, a2)); +break; case INDEX_op_dupm_vec: tcg_out_dupm_vec(s, type, vece, a0, a1, a2); break; @@ -1778,6 +1802,10 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_smax_vec: case INDEX_op_umin_vec: case INDEX_op_umax_vec: +case INDEX_op_ssadd_vec: +case INDEX_op_usadd_vec: +case INDEX_op_sssub_vec: +case INDEX_op_ussub_vec: 
return 1; default: return 0; @@ -1953,6 +1981,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_smax_vec: case INDEX_op_umin_vec: case INDEX_op_umax_vec: +case INDEX_op_ssadd_vec: +case INDEX_op_usadd_vec: +case INDEX_op_sssub_vec: +case INDEX_op_ussub_vec: return C_O1_I2(w, w, w); case INDEX_op_neg_vec: diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index 97af7f8631..4c90a1cf51 100644 --- a/tcg/loongarch64/tcg-target.h +++ b/tcg/loongarch64/tcg-target.h @@ -189,7 +189,7 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_roti_vec 0 #define TCG_TARGET_HAS_rots_vec 0 #define TCG_TARGET_HAS_rotv_vec 0 -#define TCG_TARGET_HAS_sat_vec 0 +#define TCG_TARGET_HAS_sat_vec 1 #define TCG_TARGET_HAS_minmax_vec 1 #define TCG_TARGET_HAS_bitsel_vec 0 #define TCG_TARGET_HAS_cmpsel_vec 0 -- 2.42.0
[PATCH 06/11] tcg/loongarch64: Lower neg_vec to vneg
Signed-off-by: Jiajie Chen --- tcg/loongarch64/tcg-target-con-set.h | 1 + tcg/loongarch64/tcg-target.c.inc | 10 ++ tcg/loongarch64/tcg-target.h | 2 +- 3 files changed, 12 insertions(+), 1 deletion(-) diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h index e80fc7f3f7..9fce856012 100644 --- a/tcg/loongarch64/tcg-target-con-set.h +++ b/tcg/loongarch64/tcg-target-con-set.h @@ -20,6 +20,7 @@ C_O0_I2(rZ, rZ) C_O0_I2(w, r) C_O1_I1(r, r) C_O1_I1(w, r) +C_O1_I1(w, w) C_O1_I2(r, r, rC) C_O1_I2(r, r, ri) C_O1_I2(r, r, rI) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index fe741ef045..819dcdba77 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1638,6 +1638,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, static const LoongArchInsn sub_vec_insn[4] = { OPC_VSUB_B, OPC_VSUB_H, OPC_VSUB_W, OPC_VSUB_D }; +static const LoongArchInsn neg_vec_insn[4] = { +OPC_VNEG_B, OPC_VNEG_H, OPC_VNEG_W, OPC_VNEG_D +}; a0 = args[0]; a1 = args[1]; @@ -1712,6 +1715,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, case INDEX_op_sub_vec: tcg_out32(s, encode_vdvjvk_insn(sub_vec_insn[vece], a0, a1, a2)); break; +case INDEX_op_neg_vec: +tcg_out32(s, encode_vdvj_insn(neg_vec_insn[vece], a0, a1)); +break; case INDEX_op_dupm_vec: tcg_out_dupm_vec(s, type, vece, a0, a1, a2); break; @@ -1736,6 +1742,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_orc_vec: case INDEX_op_xor_vec: case INDEX_op_nor_vec: +case INDEX_op_neg_vec: return 1; default: return 0; @@ -1908,6 +1915,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_nor_vec: return C_O1_I2(w, w, w); +case INDEX_op_neg_vec: +return C_O1_I1(w, w); + default: g_assert_not_reached(); } diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index 4ca685e752..6a8147875a 100644 --- a/tcg/loongarch64/tcg-target.h +++ 
b/tcg/loongarch64/tcg-target.h @@ -175,7 +175,7 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_v256 0 #define TCG_TARGET_HAS_not_vec 0 -#define TCG_TARGET_HAS_neg_vec 0 +#define TCG_TARGET_HAS_neg_vec 1 #define TCG_TARGET_HAS_abs_vec 0 #define TCG_TARGET_HAS_andc_vec 1 #define TCG_TARGET_HAS_orc_vec 1 -- 2.42.0
[PATCH 10/11] tcg/loongarch64: Lower vector shift vector ops
Lower the following ops: - shlv_vec - shrv_vec - sarv_vec Signed-off-by: Jiajie Chen --- tcg/loongarch64/tcg-target.c.inc | 24 tcg/loongarch64/tcg-target.h | 2 +- 2 files changed, 25 insertions(+), 1 deletion(-) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 21d2365987..caf2a7a563 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1668,6 +1668,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, static const LoongArchInsn ussub_vec_insn[4] = { OPC_VSSUB_BU, OPC_VSSUB_HU, OPC_VSSUB_WU, OPC_VSSUB_DU }; +static const LoongArchInsn shlv_vec_insn[4] = { +OPC_VSLL_B, OPC_VSLL_H, OPC_VSLL_W, OPC_VSLL_D +}; +static const LoongArchInsn shrv_vec_insn[4] = { +OPC_VSRL_B, OPC_VSRL_H, OPC_VSRL_W, OPC_VSRL_D +}; +static const LoongArchInsn sarv_vec_insn[4] = { +OPC_VSRA_B, OPC_VSRA_H, OPC_VSRA_W, OPC_VSRA_D +}; a0 = args[0]; a1 = args[1]; @@ -1772,6 +1781,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, case INDEX_op_ussub_vec: tcg_out32(s, encode_vdvjvk_insn(ussub_vec_insn[vece], a0, a1, a2)); break; +case INDEX_op_shlv_vec: +tcg_out32(s, encode_vdvjvk_insn(shlv_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_shrv_vec: +tcg_out32(s, encode_vdvjvk_insn(shrv_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_sarv_vec: +tcg_out32(s, encode_vdvjvk_insn(sarv_vec_insn[vece], a0, a1, a2)); +break; case INDEX_op_dupm_vec: tcg_out_dupm_vec(s, type, vece, a0, a1, a2); break; @@ -1806,6 +1824,9 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_usadd_vec: case INDEX_op_sssub_vec: case INDEX_op_ussub_vec: +case INDEX_op_shlv_vec: +case INDEX_op_shrv_vec: +case INDEX_op_sarv_vec: return 1; default: return 0; @@ -1985,6 +2006,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_usadd_vec: case INDEX_op_sssub_vec: case INDEX_op_ussub_vec: +case INDEX_op_shlv_vec: +case INDEX_op_shrv_vec: +case INDEX_op_sarv_vec: return 
C_O1_I2(w, w, w); case INDEX_op_neg_vec: diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index 4c90a1cf51..771545b021 100644 --- a/tcg/loongarch64/tcg-target.h +++ b/tcg/loongarch64/tcg-target.h @@ -185,7 +185,7 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_mul_vec 1 #define TCG_TARGET_HAS_shi_vec 0 #define TCG_TARGET_HAS_shs_vec 0 -#define TCG_TARGET_HAS_shv_vec 0 +#define TCG_TARGET_HAS_shv_vec 1 #define TCG_TARGET_HAS_roti_vec 0 #define TCG_TARGET_HAS_rots_vec 0 #define TCG_TARGET_HAS_rotv_vec 0 -- 2.42.0
[PATCH 02/11] tcg/loongarch64: Lower basic tcg vec ops to LSX
LSX support on host cpu is detected via hwcap. Lower the following ops to LSX: - dup_vec - dupi_vec - dupm_vec - ld_vec - st_vec Signed-off-by: Jiajie Chen --- tcg/loongarch64/tcg-target-con-set.h | 2 + tcg/loongarch64/tcg-target-con-str.h | 1 + tcg/loongarch64/tcg-target.c.inc | 223 ++- tcg/loongarch64/tcg-target.h | 37 - tcg/loongarch64/tcg-target.opc.h | 12 ++ 5 files changed, 273 insertions(+), 2 deletions(-) create mode 100644 tcg/loongarch64/tcg-target.opc.h diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h index c2bde44613..37b3f80bf9 100644 --- a/tcg/loongarch64/tcg-target-con-set.h +++ b/tcg/loongarch64/tcg-target-con-set.h @@ -17,7 +17,9 @@ C_O0_I1(r) C_O0_I2(rZ, r) C_O0_I2(rZ, rZ) +C_O0_I2(w, r) C_O1_I1(r, r) +C_O1_I1(w, r) C_O1_I2(r, r, rC) C_O1_I2(r, r, ri) C_O1_I2(r, r, rI) diff --git a/tcg/loongarch64/tcg-target-con-str.h b/tcg/loongarch64/tcg-target-con-str.h index 6e9ccca3ad..81b8d40278 100644 --- a/tcg/loongarch64/tcg-target-con-str.h +++ b/tcg/loongarch64/tcg-target-con-str.h @@ -14,6 +14,7 @@ * REGS(letter, register_mask) */ REGS('r', ALL_GENERAL_REGS) +REGS('w', ALL_VECTOR_REGS) /* * Define constraint letters for constants: diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index baf5fc3819..0f9427572c 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -32,6 +32,8 @@ #include "../tcg-ldst.c.inc" #include +bool use_lsx_instructions; + #ifdef CONFIG_DEBUG_TCG static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = { "zero", @@ -65,7 +67,39 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = { "s5", "s6", "s7", -"s8" +"s8", +"vr0", +"vr1", +"vr2", +"vr3", +"vr4", +"vr5", +"vr6", +"vr7", +"vr8", +"vr9", +"vr10", +"vr11", +"vr12", +"vr13", +"vr14", +"vr15", +"vr16", +"vr17", +"vr18", +"vr19", +"vr20", +"vr21", +"vr22", +"vr23", +"vr24", +"vr25", +"vr26", +"vr27", +"vr28", +"vr29", +"vr30", +"vr31", }; #endif @@ 
-102,6 +136,15 @@ static const int tcg_target_reg_alloc_order[] = { TCG_REG_A2, TCG_REG_A1, TCG_REG_A0, + +/* Vector registers */ +TCG_REG_V0, TCG_REG_V1, TCG_REG_V2, TCG_REG_V3, +TCG_REG_V4, TCG_REG_V5, TCG_REG_V6, TCG_REG_V7, +TCG_REG_V8, TCG_REG_V9, TCG_REG_V10, TCG_REG_V11, +TCG_REG_V12, TCG_REG_V13, TCG_REG_V14, TCG_REG_V15, +TCG_REG_V16, TCG_REG_V17, TCG_REG_V18, TCG_REG_V19, +TCG_REG_V20, TCG_REG_V21, TCG_REG_V22, TCG_REG_V23, +/* V24 - V31 are caller-saved, and skipped. */ }; static const int tcg_target_call_iarg_regs[] = { @@ -135,6 +178,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot) #define TCG_CT_CONST_WSZ 0x2000 #define ALL_GENERAL_REGS MAKE_64BIT_MASK(0, 32) +#define ALL_VECTOR_REGS MAKE_64BIT_MASK(32, 32) static inline tcg_target_long sextreg(tcg_target_long val, int pos, int len) { @@ -1486,6 +1530,159 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, } } +static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece, +TCGReg rd, TCGReg rs) +{ +switch (vece) { +case MO_8: +tcg_out_opc_vreplgr2vr_b(s, rd, rs); +break; +case MO_16: +tcg_out_opc_vreplgr2vr_h(s, rd, rs); +break; +case MO_32: +tcg_out_opc_vreplgr2vr_w(s, rd, rs); +break; +case MO_64: +tcg_out_opc_vreplgr2vr_d(s, rd, rs); +break; +default: +g_assert_not_reached(); +} +return true; +} + +static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece, + TCGReg r, TCGReg base, intptr_t offset) +{ +/* Handle imm overflow and division (vldrepl.d imm is divided by 8) */ +if (offset < -0x800 || offset > 0x7ff || \ +(offset & ((1 << vece) - 1)) != 0) { +tcg_out_addi(s, TCG_TYPE_I64, TCG_REG_TMP0, base, offset); +base = TCG_REG_TMP0; +offset = 0; +} +offset >>= vece; + +switch (vece) { +case MO_8: +tcg_out_opc_vldrepl_b(s, r, base, offset); +break; +case MO_16: +tcg_out_opc_vldrepl_h(s, r, base, offset); +break; +case MO_32: +tcg_out_opc_vldrepl_w(s, r, base, offset); +break; +case MO_64: +tcg_out_opc_
[PATCH 11/11] tcg/loongarch64: Lower bitsel_vec to vbitsel
Signed-off-by: Jiajie Chen --- tcg/loongarch64/tcg-target-con-set.h | 1 + tcg/loongarch64/tcg-target.c.inc | 11 ++- tcg/loongarch64/tcg-target.h | 2 +- 3 files changed, 12 insertions(+), 2 deletions(-) diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h index 9fce856012..0f709113f0 100644 --- a/tcg/loongarch64/tcg-target-con-set.h +++ b/tcg/loongarch64/tcg-target-con-set.h @@ -33,4 +33,5 @@ C_O1_I2(r, rZ, ri) C_O1_I2(r, rZ, rJ) C_O1_I2(r, rZ, rZ) C_O1_I2(w, w, w) +C_O1_I3(w, w, w, w) C_O1_I4(r, rZ, rJ, rZ, rZ) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index caf2a7a563..14826fad5a 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1619,7 +1619,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, const int const_args[TCG_MAX_OP_ARGS]) { TCGType type = vecl + TCG_TYPE_V64; -TCGArg a0, a1, a2; +TCGArg a0, a1, a2, a3; TCGReg base; TCGReg temp = TCG_REG_TMP0; int32_t offset; @@ -1681,6 +1681,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, a0 = args[0]; a1 = args[1]; a2 = args[2]; +a3 = args[3]; /* Currently only supports V128 */ tcg_debug_assert(type == TCG_TYPE_V128); @@ -1790,6 +1791,10 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, case INDEX_op_sarv_vec: tcg_out32(s, encode_vdvjvk_insn(sarv_vec_insn[vece], a0, a1, a2)); break; +case INDEX_op_bitsel_vec: +/* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */ +tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1); +break; case INDEX_op_dupm_vec: tcg_out_dupm_vec(s, type, vece, a0, a1, a2); break; @@ -1827,6 +1832,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_shlv_vec: case INDEX_op_shrv_vec: case INDEX_op_sarv_vec: +case INDEX_op_bitsel_vec: return 1; default: return 0; @@ -2014,6 +2020,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_neg_vec: return C_O1_I1(w, w); +case INDEX_op_bitsel_vec: +return C_O1_I3(w, 
w, w, w); + default: g_assert_not_reached(); } diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index 771545b021..aafd770356 100644 --- a/tcg/loongarch64/tcg-target.h +++ b/tcg/loongarch64/tcg-target.h @@ -191,7 +191,7 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_rotv_vec 0 #define TCG_TARGET_HAS_sat_vec 1 #define TCG_TARGET_HAS_minmax_vec 1 -#define TCG_TARGET_HAS_bitsel_vec 0 +#define TCG_TARGET_HAS_bitsel_vec 1 #define TCG_TARGET_HAS_cmpsel_vec 0 #define TCG_TARGET_DEFAULT_MO (0) -- 2.42.0
[PATCH 08/11] tcg/loongarch64: Lower vector min max ops
Lower the following ops: - smin_vec - smax_vec - umin_vec - umax_vec Signed-off-by: Jiajie Chen --- tcg/loongarch64/tcg-target.c.inc | 32 tcg/loongarch64/tcg-target.h | 2 +- 2 files changed, 33 insertions(+), 1 deletion(-) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index bca24b6a20..91049a80b6 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1644,6 +1644,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, static const LoongArchInsn mul_vec_insn[4] = { OPC_VMUL_B, OPC_VMUL_H, OPC_VMUL_W, OPC_VMUL_D }; +static const LoongArchInsn smin_vec_insn[4] = { +OPC_VMIN_B, OPC_VMIN_H, OPC_VMIN_W, OPC_VMIN_D +}; +static const LoongArchInsn umin_vec_insn[4] = { +OPC_VMIN_BU, OPC_VMIN_HU, OPC_VMIN_WU, OPC_VMIN_DU +}; +static const LoongArchInsn smax_vec_insn[4] = { +OPC_VMAX_B, OPC_VMAX_H, OPC_VMAX_W, OPC_VMAX_D +}; +static const LoongArchInsn umax_vec_insn[4] = { +OPC_VMAX_BU, OPC_VMAX_HU, OPC_VMAX_WU, OPC_VMAX_DU +}; a0 = args[0]; a1 = args[1]; @@ -1724,6 +1736,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, case INDEX_op_mul_vec: tcg_out32(s, encode_vdvjvk_insn(mul_vec_insn[vece], a0, a1, a2)); break; +case INDEX_op_smin_vec: +tcg_out32(s, encode_vdvjvk_insn(smin_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_smax_vec: +tcg_out32(s, encode_vdvjvk_insn(smax_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_umin_vec: +tcg_out32(s, encode_vdvjvk_insn(umin_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_umax_vec: +tcg_out32(s, encode_vdvjvk_insn(umax_vec_insn[vece], a0, a1, a2)); +break; case INDEX_op_dupm_vec: tcg_out_dupm_vec(s, type, vece, a0, a1, a2); break; @@ -1750,6 +1774,10 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_nor_vec: case INDEX_op_neg_vec: case INDEX_op_mul_vec: +case INDEX_op_smin_vec: +case INDEX_op_smax_vec: +case INDEX_op_umin_vec: +case INDEX_op_umax_vec: return 1; default: return 0; @@ -1921,6 +1949,10 
@@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_xor_vec: case INDEX_op_nor_vec: case INDEX_op_mul_vec: +case INDEX_op_smin_vec: +case INDEX_op_smax_vec: +case INDEX_op_umin_vec: +case INDEX_op_umax_vec: return C_O1_I2(w, w, w); case INDEX_op_neg_vec: diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index 6b97abcb5b..97af7f8631 100644 --- a/tcg/loongarch64/tcg-target.h +++ b/tcg/loongarch64/tcg-target.h @@ -190,7 +190,7 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_rots_vec 0 #define TCG_TARGET_HAS_rotv_vec 0 #define TCG_TARGET_HAS_sat_vec 0 -#define TCG_TARGET_HAS_minmax_vec 0 +#define TCG_TARGET_HAS_minmax_vec 1 #define TCG_TARGET_HAS_bitsel_vec 0 #define TCG_TARGET_HAS_cmpsel_vec 0 -- 2.42.0
[PATCH 03/11] tcg/loongarch64: Lower cmp_vec to vseq/vsle/vslt
Signed-off-by: Jiajie Chen --- tcg/loongarch64/tcg-target-con-set.h | 1 + tcg/loongarch64/tcg-target.c.inc | 25 + 2 files changed, 26 insertions(+) diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h index 37b3f80bf9..e80fc7f3f7 100644 --- a/tcg/loongarch64/tcg-target-con-set.h +++ b/tcg/loongarch64/tcg-target-con-set.h @@ -31,4 +31,5 @@ C_O1_I2(r, 0, rZ) C_O1_I2(r, rZ, ri) C_O1_I2(r, rZ, rJ) C_O1_I2(r, rZ, rZ) +C_O1_I2(w, w, w) C_O1_I4(r, rZ, rJ, rZ, rZ) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 0f9427572c..cc80e5fa20 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1624,6 +1624,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, TCGReg temp = TCG_REG_TMP0; int32_t offset; +static const LoongArchInsn cmp_vec_insn[16][4] = { +[TCG_COND_EQ] = {OPC_VSEQ_B, OPC_VSEQ_H, OPC_VSEQ_W, OPC_VSEQ_D}, +[TCG_COND_LE] = {OPC_VSLE_B, OPC_VSLE_H, OPC_VSLE_W, OPC_VSLE_D}, +[TCG_COND_LEU] = {OPC_VSLE_BU, OPC_VSLE_HU, OPC_VSLE_WU, OPC_VSLE_DU}, +[TCG_COND_LT] = {OPC_VSLT_B, OPC_VSLT_H, OPC_VSLT_W, OPC_VSLT_D}, +[TCG_COND_LTU] = {OPC_VSLT_BU, OPC_VSLT_HU, OPC_VSLT_WU, OPC_VSLT_DU}, +}; +LoongArchInsn insn; + a0 = args[0]; a1 = args[1]; a2 = args[2]; @@ -1656,6 +1665,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, } tcg_out_opc_vld(s, a0, base, offset); break; +case INDEX_op_cmp_vec: +TCGCond cond = args[3]; +insn = cmp_vec_insn[cond][vece]; +if (insn == 0) { +TCGArg t; +t = a1, a1 = a2, a2 = t; +cond = tcg_swap_cond(cond); +insn = cmp_vec_insn[cond][vece]; +tcg_debug_assert(insn != 0); +} +tcg_out32(s, encode_vdvjvk_insn(insn, a0, a1, a2)); +break; case INDEX_op_dupm_vec: tcg_out_dupm_vec(s, type, vece, a0, a1, a2); break; @@ -1671,6 +1692,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_st_vec: case INDEX_op_dup_vec: case INDEX_op_dupm_vec: +case INDEX_op_cmp_vec: return 1; default: return 0; @@ 
-1832,6 +1854,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_st_vec: return C_O0_I2(w, r); +case INDEX_op_cmp_vec: +return C_O1_I2(w, w, w); + default: g_assert_not_reached(); } -- 2.42.0
[PATCH 07/11] tcg/loongarch64: Lower mul_vec to vmul
Signed-off-by: Jiajie Chen --- tcg/loongarch64/tcg-target.c.inc | 8 tcg/loongarch64/tcg-target.h | 2 +- 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index 819dcdba77..bca24b6a20 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1641,6 +1641,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, static const LoongArchInsn neg_vec_insn[4] = { OPC_VNEG_B, OPC_VNEG_H, OPC_VNEG_W, OPC_VNEG_D }; +static const LoongArchInsn mul_vec_insn[4] = { +OPC_VMUL_B, OPC_VMUL_H, OPC_VMUL_W, OPC_VMUL_D +}; a0 = args[0]; a1 = args[1]; @@ -1718,6 +1721,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, case INDEX_op_neg_vec: tcg_out32(s, encode_vdvj_insn(neg_vec_insn[vece], a0, a1)); break; +case INDEX_op_mul_vec: +tcg_out32(s, encode_vdvjvk_insn(mul_vec_insn[vece], a0, a1, a2)); +break; case INDEX_op_dupm_vec: tcg_out_dupm_vec(s, type, vece, a0, a1, a2); break; @@ -1743,6 +1749,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_xor_vec: case INDEX_op_nor_vec: case INDEX_op_neg_vec: +case INDEX_op_mul_vec: return 1; default: return 0; @@ -1913,6 +1920,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_orc_vec: case INDEX_op_xor_vec: case INDEX_op_nor_vec: +case INDEX_op_mul_vec: return C_O1_I2(w, w, w); case INDEX_op_neg_vec: diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h index 6a8147875a..6b97abcb5b 100644 --- a/tcg/loongarch64/tcg-target.h +++ b/tcg/loongarch64/tcg-target.h @@ -182,7 +182,7 @@ extern bool use_lsx_instructions; #define TCG_TARGET_HAS_nand_vec 0 #define TCG_TARGET_HAS_nor_vec 1 #define TCG_TARGET_HAS_eqv_vec 0 -#define TCG_TARGET_HAS_mul_vec 0 +#define TCG_TARGET_HAS_mul_vec 1 #define TCG_TARGET_HAS_shi_vec 0 #define TCG_TARGET_HAS_shs_vec 0 #define TCG_TARGET_HAS_shv_vec 0 -- 2.42.0
[PATCH 00/11] Lower TCG vector ops to LSX
This patch series allows qemu to utilize LSX instructions on LoongArch machines to execute TCG vector ops. Jiajie Chen (11): tcg/loongarch64: Import LSX instructions tcg/loongarch64: Lower basic tcg vec ops to LSX tcg/loongarch64: Lower cmp_vec to vseq/vsle/vslt tcg/loongarch64: Lower add/sub_vec to vadd/vsub tcg/loongarch64: Lower vector bitwise operations tcg/loongarch64: Lower neg_vec to vneg tcg/loongarch64: Lower mul_vec to vmul tcg/loongarch64: Lower vector min max ops tcg/loongarch64: Lower vector saturated ops tcg/loongarch64: Lower vector shift vector ops tcg/loongarch64: Lower bitsel_vec to vbitsel tcg/loongarch64/tcg-insn-defs.c.inc | 6251 +- tcg/loongarch64/tcg-target-con-set.h |5 + tcg/loongarch64/tcg-target-con-str.h |1 + tcg/loongarch64/tcg-target.c.inc | 414 +- tcg/loongarch64/tcg-target.h | 37 +- tcg/loongarch64/tcg-target.opc.h | 12 + 6 files changed, 6601 insertions(+), 119 deletions(-) create mode 100644 tcg/loongarch64/tcg-target.opc.h -- 2.42.0
[PATCH 04/11] tcg/loongarch64: Lower add/sub_vec to vadd/vsub
Lower the following ops: - add_vec - sub_vec Signed-off-by: Jiajie Chen --- tcg/loongarch64/tcg-target.c.inc | 16 1 file changed, 16 insertions(+) diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc index cc80e5fa20..eb340a6493 100644 --- a/tcg/loongarch64/tcg-target.c.inc +++ b/tcg/loongarch64/tcg-target.c.inc @@ -1632,6 +1632,12 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, [TCG_COND_LTU] = {OPC_VSLT_BU, OPC_VSLT_HU, OPC_VSLT_WU, OPC_VSLT_DU}, }; LoongArchInsn insn; +static const LoongArchInsn add_vec_insn[4] = { +OPC_VADD_B, OPC_VADD_H, OPC_VADD_W, OPC_VADD_D +}; +static const LoongArchInsn sub_vec_insn[4] = { +OPC_VSUB_B, OPC_VSUB_H, OPC_VSUB_W, OPC_VSUB_D +}; a0 = args[0]; a1 = args[1]; @@ -1677,6 +1683,12 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, } tcg_out32(s, encode_vdvjvk_insn(insn, a0, a1, a2)); break; +case INDEX_op_add_vec: +tcg_out32(s, encode_vdvjvk_insn(add_vec_insn[vece], a0, a1, a2)); +break; +case INDEX_op_sub_vec: +tcg_out32(s, encode_vdvjvk_insn(sub_vec_insn[vece], a0, a1, a2)); +break; case INDEX_op_dupm_vec: tcg_out_dupm_vec(s, type, vece, a0, a1, a2); break; @@ -1693,6 +1705,8 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_dup_vec: case INDEX_op_dupm_vec: case INDEX_op_cmp_vec: +case INDEX_op_add_vec: +case INDEX_op_sub_vec: return 1; default: return 0; @@ -1855,6 +1869,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) return C_O0_I2(w, r); case INDEX_op_cmp_vec: +case INDEX_op_add_vec: +case INDEX_op_sub_vec: return C_O1_I2(w, w, w); default: -- 2.42.0
Re: [PATCH] hw/loongarch: Fix ACPI processor id off-by-one error
On 2023/8/21 09:24, bibo mao wrote: + Add xianglai Good catch. In theory, it is a logical id, and it may not be equal to the physical id. However it must be equal to the _UID in the cpu dsdt table, which is missing now. Yes, the logical id can be different from the index. The spec says: If the processor structure represents an actual processor, this field must match the value of ACPI processor ID field in the processor’s entry in the MADT. If the processor structure represents a group of associated processors, the structure might match a processor container in the name space. In that case this entry will match the value of the _UID method of the associated processor container. Where there is a match it must be represented. The flags field, described in /Processor Structure Flags/, includes a bit to describe whether the ACPI processor ID is valid. I believe PPTT, MADT and DSDT should all adhere to the same logical id mapping. Can the PPTT table parse error be fixed if the cpu dsdt table is added? Regards Bibo Mao On 2023/8/20 18:56, Jiajie Chen wrote: In hw/acpi/aml-build.c:build_pptt() function, the code assumes that the ACPI processor id equals the cpu index, for example if we have 8 cpus, then the ACPI processor id should be in range 0-7. However, in hw/loongarch/acpi-build.c:build_madt() function we broke the assumption. If we have 8 cpus again, the ACPI processor id in MADT table would be in range 1-8. It violates the following description taken from ACPI spec 6.4 table 5.138: If the processor structure represents an actual processor, this field must match the value of ACPI processor ID field in the processor’s entry in the MADT. It will break the latest Linux 6.5-rc6 with the following error message: ACPI PPTT: PPTT table found, but unable to locate core 7 (8) Invalid BIOS PPTT Here 7 is the last cpu index, 8 is the ACPI processor id learned from MADT.
With this patch, Linux can properly detect SMT threads when "-smp 8,sockets=1,cores=4,threads=2" is passed: Thread(s) per core: 2 Core(s) per socket: 2 Socket(s): 2 The detection of number of sockets is still wrong, but that is out of scope of the commit. Signed-off-by: Jiajie Chen --- hw/loongarch/acpi-build.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/loongarch/acpi-build.c b/hw/loongarch/acpi-build.c index 0b62c3a2f7..ae292fc543 100644 --- a/hw/loongarch/acpi-build.c +++ b/hw/loongarch/acpi-build.c @@ -127,7 +127,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, LoongArchMachineState *lams) build_append_int_noprefix(table_data, 17, 1);/* Type */ build_append_int_noprefix(table_data, 15, 1);/* Length */ build_append_int_noprefix(table_data, 1, 1); /* Version */ -build_append_int_noprefix(table_data, i + 1, 4); /* ACPI Processor ID */ +build_append_int_noprefix(table_data, i, 4); /* ACPI Processor ID */ build_append_int_noprefix(table_data, arch_id, 4); /* Core ID */ build_append_int_noprefix(table_data, 1, 4); /* Flags */ }
[PATCH] hw/loongarch: Fix ACPI processor id off-by-one error
In the hw/acpi/aml-build.c:build_pptt() function, the code assumes that the ACPI processor id equals the cpu index: for example, if we have 8 cpus, the ACPI processor id should be in range 0-7. However, in the hw/loongarch/acpi-build.c:build_madt() function we broke that assumption. If we have 8 cpus again, the ACPI processor id in the MADT table would be in range 1-8. It violates the following description taken from ACPI spec 6.4 table 5.138: If the processor structure represents an actual processor, this field must match the value of ACPI processor ID field in the processor’s entry in the MADT. It will break the latest Linux 6.5-rc6 with the following error message: ACPI PPTT: PPTT table found, but unable to locate core 7 (8) Invalid BIOS PPTT Here 7 is the last cpu index, 8 is the ACPI processor id learned from MADT. With this patch, Linux can properly detect SMT threads when "-smp 8,sockets=1,cores=4,threads=2" is passed: Thread(s) per core: 2 Core(s) per socket: 2 Socket(s): 2 The detected number of sockets is still wrong, but that is out of the scope of this commit. Signed-off-by: Jiajie Chen --- hw/loongarch/acpi-build.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/loongarch/acpi-build.c b/hw/loongarch/acpi-build.c index 0b62c3a2f7..ae292fc543 100644 --- a/hw/loongarch/acpi-build.c +++ b/hw/loongarch/acpi-build.c @@ -127,7 +127,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, LoongArchMachineState *lams) build_append_int_noprefix(table_data, 17, 1);/* Type */ build_append_int_noprefix(table_data, 15, 1);/* Length */ build_append_int_noprefix(table_data, 1, 1); /* Version */ -build_append_int_noprefix(table_data, i + 1, 4); /* ACPI Processor ID */ +build_append_int_noprefix(table_data, i, 4); /* ACPI Processor ID */ build_append_int_noprefix(table_data, arch_id, 4); /* Core ID */ build_append_int_noprefix(table_data, 1, 4); /* Flags */ } -- 2.41.0
Re: [PATCH] roms: Support compile the efi bios for loongarch
On 2023/8/10 15:42, xianglai li wrote: 1.Add edk2-platform submodule 2.Added loongarch UEFI BIOS support to compiled scripts. 3.The cross-compilation toolchain on x86 can be obtained from the link below: https://github.com/loongson/build-tools/tree/2022.09.06 Cc: Paolo Bonzini Cc: "Marc-André Lureau" Cc: "Daniel P. Berrangé" Cc: Thomas Huth Cc: "Philippe Mathieu-Daudé" Cc: Gerd Hoffmann Cc: Xiaojuan Yang Cc: Song Gao Cc: Bibo Mao Signed-off-by: xianglai li --- .gitmodules| 3 +++ meson.build| 2 +- pc-bios/meson.build| 2 ++ roms/edk2-build.config | 14 ++ roms/edk2-build.py | 4 ++-- roms/edk2-platforms| 1 + 6 files changed, 23 insertions(+), 3 deletions(-) create mode 16 roms/edk2-platforms diff --git a/.gitmodules b/.gitmodules index 73cae4cd4d..0cb57123fa 100644 --- a/.gitmodules +++ b/.gitmodules @@ -43,3 +43,6 @@ [submodule "tests/lcitool/libvirt-ci"] path = tests/lcitool/libvirt-ci url = https://gitlab.com/libvirt/libvirt-ci.git +[submodule "roms/edk2-platforms"] + path = roms/edk2-platforms + url = https://github.com/tianocore/edk2-platforms.git diff --git a/meson.build b/meson.build index 98e68ef0b1..b398caf2ce 100644 --- a/meson.build +++ b/meson.build @@ -153,7 +153,7 @@ if targetos != 'darwin' modular_tcg = ['i386-softmmu', 'x86_64-softmmu'] endif -edk2_targets = [ 'arm-softmmu', 'aarch64-softmmu', 'i386-softmmu', 'x86_64-softmmu' ] +edk2_targets = [ 'arm-softmmu', 'aarch64-softmmu', 'i386-softmmu', 'x86_64-softmmu', 'loongarch64-softmmu' ] unpack_edk2_blobs = false foreach target : edk2_targets if target in target_dirs diff --git a/pc-bios/meson.build b/pc-bios/meson.build index a7224ef469..fc73222b6c 100644 --- a/pc-bios/meson.build +++ b/pc-bios/meson.build @@ -9,6 +9,8 @@ if unpack_edk2_blobs 'edk2-i386-vars.fd', 'edk2-x86_64-code.fd', 'edk2-x86_64-secure-code.fd', +'edk2-loongarch64-code.fd', +'edk2-loongarch64-vars.fd', ] foreach f : fds diff --git a/roms/edk2-build.config b/roms/edk2-build.config index 66ef9ffcb9..7960c4c2c5 100644 --- 
a/roms/edk2-build.config +++ b/roms/edk2-build.config @@ -1,5 +1,6 @@ [global] core = edk2 +pkgs = edk2-platforms # options @@ -122,3 +123,16 @@ plat = RiscVVirtQemu dest = ../pc-bios cpy1 = FV/RISCV_VIRT.fd edk2-riscv.fd pad1 = edk2-riscv.fd 32m + + +# LoongArch64 + +[build.loongach64.qemu] typo: s/loongach64/loongarch64/ +conf = Platform/Loongson/LoongArchQemuPkg/Loongson.dsc +arch = LOONGARCH64 +plat = LoongArchQemu +dest = ../pc-bios +cpy1 = FV/QEMU_EFI.fd edk2-loongarch64-code.fd +pad1 = edk2-loongarch64-code.fd 4m +cpy2 = FV/QEMU_VARS.fd edk2-loongarch64-vars.fd +pad2 = edk2-loongarch64-vars.fd 16m diff --git a/roms/edk2-build.py b/roms/edk2-build.py index 870893f7c8..dbd641e51e 100755 --- a/roms/edk2-build.py +++ b/roms/edk2-build.py @@ -269,8 +269,8 @@ def prepare_env(cfg): # for cross builds if binary_exists('arm-linux-gnu-gcc'): os.environ['GCC5_ARM_PREFIX'] = 'arm-linux-gnu-' -if binary_exists('loongarch64-linux-gnu-gcc'): -os.environ['GCC5_LOONGARCH64_PREFIX'] = 'loongarch64-linux-gnu-' +if binary_exists('loongarch64-unknown-linux-gnu-gcc'): +os.environ['GCC5_LOONGARCH64_PREFIX'] = 'loongarch64-unknown-linux-gnu-' hostarch = os.uname().machine if binary_exists('aarch64-linux-gnu-gcc') and hostarch != 'aarch64': diff --git a/roms/edk2-platforms b/roms/edk2-platforms new file mode 16 index 00..84ccada592 --- /dev/null +++ b/roms/edk2-platforms @@ -0,0 +1 @@ +Subproject commit 84ccada59257a8151a592a416017fbb03b8ed3cf
[PATCH v5 09/11] target/loongarch: Truncate high 32 bits of address in VA32 mode
When running in VA32 mode (!LA64 or VA32L[1-3] matching PLV), the virtual address is truncated to 32 bits before address mapping. Signed-off-by: Jiajie Chen Co-authored-by: Richard Henderson --- target/loongarch/cpu.c| 16 target/loongarch/cpu.h| 9 + target/loongarch/gdbstub.c| 2 +- .../loongarch/insn_trans/trans_atomic.c.inc | 5 ++- .../loongarch/insn_trans/trans_branch.c.inc | 3 +- .../loongarch/insn_trans/trans_fmemory.c.inc | 30 --- target/loongarch/insn_trans/trans_lsx.c.inc | 38 +-- .../loongarch/insn_trans/trans_memory.c.inc | 34 + target/loongarch/op_helper.c | 4 +- target/loongarch/translate.c | 32 10 files changed, 85 insertions(+), 88 deletions(-) diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c index 30dd70571a..bd980790f2 100644 --- a/target/loongarch/cpu.c +++ b/target/loongarch/cpu.c @@ -81,7 +81,7 @@ static void loongarch_cpu_set_pc(CPUState *cs, vaddr value) LoongArchCPU *cpu = LOONGARCH_CPU(cs); CPULoongArchState *env = &cpu->env; -env->pc = value; +set_pc(env, value); } static vaddr loongarch_cpu_get_pc(CPUState *cs) @@ -168,7 +168,7 @@ static void loongarch_cpu_do_interrupt(CPUState *cs) set_DERA: env->CSR_DERA = env->pc; env->CSR_DBG = FIELD_DP64(env->CSR_DBG, CSR_DBG, DST, 1); -env->pc = env->CSR_EENTRY + 0x480; +set_pc(env, env->CSR_EENTRY + 0x480); break; case EXCCODE_INT: if (FIELD_EX64(env->CSR_DBG, CSR_DBG, DST)) { @@ -249,7 +249,8 @@ static void loongarch_cpu_do_interrupt(CPUState *cs) /* Find the highest-priority interrupt.
*/ vector = 31 - clz32(pending); -env->pc = env->CSR_EENTRY + (EXCCODE_EXTERNAL_INT + vector) * vec_size; +set_pc(env, env->CSR_EENTRY + \ + (EXCCODE_EXTERNAL_INT + vector) * vec_size); qemu_log_mask(CPU_LOG_INT, "%s: PC " TARGET_FMT_lx " ERA " TARGET_FMT_lx " cause %d\n" "A " TARGET_FMT_lx " D " @@ -260,10 +261,9 @@ static void loongarch_cpu_do_interrupt(CPUState *cs) env->CSR_ECFG, env->CSR_ESTAT); } else { if (tlbfill) { -env->pc = env->CSR_TLBRENTRY; +set_pc(env, env->CSR_TLBRENTRY); } else { -env->pc = env->CSR_EENTRY; -env->pc += EXCODE_MCODE(cause) * vec_size; +set_pc(env, env->CSR_EENTRY + EXCODE_MCODE(cause) * vec_size); } qemu_log_mask(CPU_LOG_INT, "%s: PC " TARGET_FMT_lx " ERA " TARGET_FMT_lx @@ -324,7 +324,7 @@ static void loongarch_cpu_synchronize_from_tb(CPUState *cs, CPULoongArchState *env = &cpu->env; tcg_debug_assert(!(cs->tcg_cflags & CF_PCREL)); -env->pc = tb->pc; +set_pc(env, tb->pc); } static void loongarch_restore_state_to_opc(CPUState *cs, @@ -334,7 +334,7 @@ static void loongarch_restore_state_to_opc(CPUState *cs, LoongArchCPU *cpu = LOONGARCH_CPU(cs); CPULoongArchState *env = &cpu->env; -env->pc = data[0]; +set_pc(env, data[0]); } #endif /* CONFIG_TCG */ diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h index 0e02257f91..9f550793ca 100644 --- a/target/loongarch/cpu.h +++ b/target/loongarch/cpu.h @@ -442,6 +442,15 @@ static inline bool is_va32(CPULoongArchState *env) return va32; } +static inline void set_pc(CPULoongArchState *env, uint64_t value) +{ +if (is_va32(env)) { +env->pc = (uint32_t)value; +} else { +env->pc = value; +} +} + /* * LoongArch CPUs hardware flags.
*/ diff --git a/target/loongarch/gdbstub.c b/target/loongarch/gdbstub.c index a462e25737..e20b20f99b 100644 --- a/target/loongarch/gdbstub.c +++ b/target/loongarch/gdbstub.c @@ -77,7 +77,7 @@ int loongarch_cpu_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n) env->gpr[n] = tmp; length = read_length; } else if (n == 33) { -env->pc = tmp; +set_pc(env, tmp); length = read_length; } return length; diff --git a/target/loongarch/insn_trans/trans_atomic.c.inc b/target/loongarch/insn_trans/trans_atomic.c.inc index c69f31bc78..d90312729b 100644 --- a/target/loongarch/insn_trans/trans_atomic.c.inc +++ b/target/loongarch/insn_trans/trans_atomic.c.inc @@ -7,9 +7,8 @@ static bool gen_ll(DisasContext *ctx, arg_rr_i *a, MemOp mop) { TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE); TCGv src1 = gpr_src(ctx, a->rj, EXT_NONE); -TCGv t0 = tcg_temp_new(); +TCGv t0 = make_address_i(ctx, src1, a->imm); -tcg_gen_addi_tl(t0, src1, a->imm); tcg_gen_qemu_ld_i64(dest, t0, ctx->mem_idx, mop);
[PATCH v5 10/11] target/loongarch: Sign extend results in VA32 mode
In VA32 mode, BL, JIRL and PC* instructions should sign-extend the low 32 bit result to 64 bits. Signed-off-by: Jiajie Chen --- target/loongarch/insn_trans/trans_arith.c.inc | 2 +- target/loongarch/insn_trans/trans_branch.c.inc | 4 ++-- target/loongarch/translate.c | 8 3 files changed, 11 insertions(+), 3 deletions(-) diff --git a/target/loongarch/insn_trans/trans_arith.c.inc b/target/loongarch/insn_trans/trans_arith.c.inc index 4c21d8b037..e3b7979e15 100644 --- a/target/loongarch/insn_trans/trans_arith.c.inc +++ b/target/loongarch/insn_trans/trans_arith.c.inc @@ -72,7 +72,7 @@ static bool gen_pc(DisasContext *ctx, arg_r_i *a, target_ulong (*func)(target_ulong, int)) { TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE); -target_ulong addr = func(ctx->base.pc_next, a->imm); +target_ulong addr = make_address_pc(ctx, func(ctx->base.pc_next, a->imm)); tcg_gen_movi_tl(dest, addr); gen_set_gpr(a->rd, dest, EXT_NONE); diff --git a/target/loongarch/insn_trans/trans_branch.c.inc b/target/loongarch/insn_trans/trans_branch.c.inc index b63058235d..cf035e44ff 100644 --- a/target/loongarch/insn_trans/trans_branch.c.inc +++ b/target/loongarch/insn_trans/trans_branch.c.inc @@ -12,7 +12,7 @@ static bool trans_b(DisasContext *ctx, arg_b *a) static bool trans_bl(DisasContext *ctx, arg_bl *a) { -tcg_gen_movi_tl(cpu_gpr[1], ctx->base.pc_next + 4); +tcg_gen_movi_tl(cpu_gpr[1], make_address_pc(ctx, ctx->base.pc_next + 4)); gen_goto_tb(ctx, 0, ctx->base.pc_next + a->offs); ctx->base.is_jmp = DISAS_NORETURN; return true; @@ -25,7 +25,7 @@ static bool trans_jirl(DisasContext *ctx, arg_jirl *a) TCGv addr = make_address_i(ctx, src1, a->imm); tcg_gen_mov_tl(cpu_pc, addr); -tcg_gen_movi_tl(dest, ctx->base.pc_next + 4); +tcg_gen_movi_tl(dest, make_address_pc(ctx, ctx->base.pc_next + 4)); gen_set_gpr(a->rd, dest, EXT_NONE); tcg_gen_lookup_and_goto_ptr(); ctx->base.is_jmp = DISAS_NORETURN; diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c index 689da19ed0..de7c1c5d1f 100644 --- 
a/target/loongarch/translate.c +++ b/target/loongarch/translate.c @@ -236,6 +236,14 @@ static TCGv make_address_i(DisasContext *ctx, TCGv base, target_long ofs) return make_address_x(ctx, base, addend); } +static uint64_t make_address_pc(DisasContext *ctx, uint64_t addr) +{ +if (ctx->va32) { +addr = (int32_t)addr; +} +return addr; +} + #include "decode-insns.c.inc" #include "insn_trans/trans_arith.c.inc" #include "insn_trans/trans_shift.c.inc" -- 2.41.0
[PATCH v5 04/11] target/loongarch: Support LoongArch32 TLB entry
The TLB entry of LA32 lacks NR, NX and RPLV and they are hardwired to zero in LoongArch32. Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- target/loongarch/cpu-csr.h| 9 + target/loongarch/tlb_helper.c | 17 - 2 files changed, 17 insertions(+), 9 deletions(-) diff --git a/target/loongarch/cpu-csr.h b/target/loongarch/cpu-csr.h index f8f24032cb..48ed2e0632 100644 --- a/target/loongarch/cpu-csr.h +++ b/target/loongarch/cpu-csr.h @@ -66,10 +66,11 @@ FIELD(TLBENTRY, D, 1, 1) FIELD(TLBENTRY, PLV, 2, 2) FIELD(TLBENTRY, MAT, 4, 2) FIELD(TLBENTRY, G, 6, 1) -FIELD(TLBENTRY, PPN, 12, 36) -FIELD(TLBENTRY, NR, 61, 1) -FIELD(TLBENTRY, NX, 62, 1) -FIELD(TLBENTRY, RPLV, 63, 1) +FIELD(TLBENTRY_32, PPN, 8, 24) +FIELD(TLBENTRY_64, PPN, 12, 36) +FIELD(TLBENTRY_64, NR, 61, 1) +FIELD(TLBENTRY_64, NX, 62, 1) +FIELD(TLBENTRY_64, RPLV, 63, 1) #define LOONGARCH_CSR_ASID 0x18 /* Address space identifier */ FIELD(CSR_ASID, ASID, 0, 10) diff --git a/target/loongarch/tlb_helper.c b/target/loongarch/tlb_helper.c index 6e00190547..cef10e2257 100644 --- a/target/loongarch/tlb_helper.c +++ b/target/loongarch/tlb_helper.c @@ -48,10 +48,17 @@ static int loongarch_map_tlb_entry(CPULoongArchState *env, hwaddr *physical, tlb_v = FIELD_EX64(tlb_entry, TLBENTRY, V); tlb_d = FIELD_EX64(tlb_entry, TLBENTRY, D); tlb_plv = FIELD_EX64(tlb_entry, TLBENTRY, PLV); -tlb_ppn = FIELD_EX64(tlb_entry, TLBENTRY, PPN); -tlb_nx = FIELD_EX64(tlb_entry, TLBENTRY, NX); -tlb_nr = FIELD_EX64(tlb_entry, TLBENTRY, NR); -tlb_rplv = FIELD_EX64(tlb_entry, TLBENTRY, RPLV); +if (is_la64(env)) { +tlb_ppn = FIELD_EX64(tlb_entry, TLBENTRY_64, PPN); +tlb_nx = FIELD_EX64(tlb_entry, TLBENTRY_64, NX); +tlb_nr = FIELD_EX64(tlb_entry, TLBENTRY_64, NR); +tlb_rplv = FIELD_EX64(tlb_entry, TLBENTRY_64, RPLV); +} else { +tlb_ppn = FIELD_EX64(tlb_entry, TLBENTRY_32, PPN); +tlb_nx = 0; +tlb_nr = 0; +tlb_rplv = 0; +} /* Check access rights */ if (!tlb_v) { @@ -79,7 +86,7 @@ static int loongarch_map_tlb_entry(CPULoongArchState *env, 
hwaddr *physical, * tlb_entry contains ppn[47:12] while 16KiB ppn is [47:15] * need adjust. */ -*physical = (tlb_ppn << R_TLBENTRY_PPN_SHIFT) | +*physical = (tlb_ppn << R_TLBENTRY_64_PPN_SHIFT) | (address & MAKE_64BIT_MASK(0, tlb_ps)); *prot = PAGE_READ; if (tlb_d) { -- 2.41.0
[PATCH v5 07/11] target/loongarch: Add LA64 & VA32 to DisasContext
Add LA64 and VA32(32-bit Virtual Address) to DisasContext to allow the translator to reject doubleword instructions in LA32 mode for example. Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- target/loongarch/cpu.h | 13 + target/loongarch/translate.c | 3 +++ target/loongarch/translate.h | 2 ++ 3 files changed, 18 insertions(+) diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h index 2af4c414b0..0e02257f91 100644 --- a/target/loongarch/cpu.h +++ b/target/loongarch/cpu.h @@ -431,6 +431,17 @@ static inline bool is_la64(CPULoongArchState *env) return FIELD_EX32(env->cpucfg[1], CPUCFG1, ARCH) == CPUCFG1_ARCH_LA64; } +static inline bool is_va32(CPULoongArchState *env) +{ +/* VA32 if !LA64 or VA32L[1-3] */ +bool va32 = !is_la64(env); +uint64_t plv = FIELD_EX64(env->CSR_CRMD, CSR_CRMD, PLV); +if (plv >= 1 && (FIELD_EX64(env->CSR_MISC, CSR_MISC, VA32) & (1 << plv))) { +va32 = true; +} +return va32; +} + /* * LoongArch CPUs hardware flags. */ @@ -438,6 +449,7 @@ static inline bool is_la64(CPULoongArchState *env) #define HW_FLAGS_CRMD_PGR_CSR_CRMD_PG_MASK /* 0x10 */ #define HW_FLAGS_EUEN_FPE 0x04 #define HW_FLAGS_EUEN_SXE 0x08 +#define HW_FLAGS_VA32 0x20 static inline void cpu_get_tb_cpu_state(CPULoongArchState *env, vaddr *pc, uint64_t *cs_base, uint32_t *flags) @@ -447,6 +459,7 @@ static inline void cpu_get_tb_cpu_state(CPULoongArchState *env, vaddr *pc, *flags = env->CSR_CRMD & (R_CSR_CRMD_PLV_MASK | R_CSR_CRMD_PG_MASK); *flags |= FIELD_EX64(env->CSR_EUEN, CSR_EUEN, FPE) * HW_FLAGS_EUEN_FPE; *flags |= FIELD_EX64(env->CSR_EUEN, CSR_EUEN, SXE) * HW_FLAGS_EUEN_SXE; +*flags |= is_va32(env) * HW_FLAGS_VA32; } void loongarch_cpu_list(void); diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c index 3146a2d4ac..ac847745df 100644 --- a/target/loongarch/translate.c +++ b/target/loongarch/translate.c @@ -119,6 +119,9 @@ static void loongarch_tr_init_disas_context(DisasContextBase *dcbase, ctx->vl = LSX_LEN; } +ctx->la64 = is_la64(env); 
+ctx->va32 = (ctx->base.tb->flags & HW_FLAGS_VA32) != 0; + ctx->zero = tcg_constant_tl(0); } diff --git a/target/loongarch/translate.h b/target/loongarch/translate.h index 7f60090580..b6fa5df82d 100644 --- a/target/loongarch/translate.h +++ b/target/loongarch/translate.h @@ -33,6 +33,8 @@ typedef struct DisasContext { uint16_t plv; int vl; /* Vector length */ TCGv zero; +bool la64; /* LoongArch64 mode */ +bool va32; /* 32-bit virtual address */ } DisasContext; void generate_exception(DisasContext *ctx, int excp); -- 2.41.0
[PATCH v5 11/11] target/loongarch: Add loongarch32 cpu la132
Add la132 as a loongarch32 cpu type and allow virt machine to be used with la132 instead of la464. Due to lack of public documentation of la132, it is currently a synthetic loongarch32 cpu model. Details need to be added in the future. Signed-off-by: Jiajie Chen --- hw/loongarch/virt.c| 5 - target/loongarch/cpu.c | 29 + 2 files changed, 29 insertions(+), 5 deletions(-) diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c index e19b042ce8..af15bf5aaa 100644 --- a/hw/loongarch/virt.c +++ b/hw/loongarch/virt.c @@ -798,11 +798,6 @@ static void loongarch_init(MachineState *machine) cpu_model = LOONGARCH_CPU_TYPE_NAME("la464"); } -if (!strstr(cpu_model, "la464")) { -error_report("LoongArch/TCG needs cpu type la464"); -exit(1); -} - if (ram_size < 1 * GiB) { error_report("ram_size must be greater than 1G."); exit(1); } diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c index bd980790f2..dd1cd7d7d2 100644 --- a/target/loongarch/cpu.c +++ b/target/loongarch/cpu.c @@ -439,6 +439,34 @@ static void loongarch_la464_initfn(Object *obj) env->CSR_ASID = FIELD_DP64(0, CSR_ASID, ASIDBITS, 0xa); } +static void loongarch_la132_initfn(Object *obj) +{ +LoongArchCPU *cpu = LOONGARCH_CPU(obj); +CPULoongArchState *env = &cpu->env; + +int i; + +for (i = 0; i < 21; i++) { +env->cpucfg[i] = 0x0; +} + +cpu->dtb_compatible = "loongarch,Loongson-1C103"; + +uint32_t data = 0; +data = FIELD_DP32(data, CPUCFG1, ARCH, 1); /* LA32 */ +data = FIELD_DP32(data, CPUCFG1, PGMMU, 1); +data = FIELD_DP32(data, CPUCFG1, IOCSR, 1); +data = FIELD_DP32(data, CPUCFG1, PALEN, 0x1f); /* 32 bits */ +data = FIELD_DP32(data, CPUCFG1, VALEN, 0x1f); /* 32 bits */ +data = FIELD_DP32(data, CPUCFG1, UAL, 1); +data = FIELD_DP32(data, CPUCFG1, RI, 0); +data = FIELD_DP32(data, CPUCFG1, EP, 0); +data = FIELD_DP32(data, CPUCFG1, RPLV, 0); +data = FIELD_DP32(data, CPUCFG1, HP, 1); +data = FIELD_DP32(data, CPUCFG1, IOCSR_BRD, 1); +env->cpucfg[1] = data; +} + static void loongarch_cpu_list_entry(gpointer data, gpointer
user_data) { const char *typename = object_class_get_name(OBJECT_CLASS(data)); @@ -778,6 +806,7 @@ static const TypeInfo loongarch_cpu_type_infos[] = { .class_init = loongarch32_cpu_class_init, }, DEFINE_LOONGARCH_CPU_TYPE("la464", loongarch_la464_initfn), +DEFINE_LOONGARCH32_CPU_TYPE("la132", loongarch_la132_initfn), }; DEFINE_TYPES(loongarch_cpu_type_infos) -- 2.41.0
[PATCH v5 06/11] target/loongarch: Support LoongArch32 VPPN
VPPN of TLBEHI/TLBREHI is limited to 19 bits in LA32. Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- target/loongarch/cpu-csr.h| 6 -- target/loongarch/tlb_helper.c | 23 ++- 2 files changed, 22 insertions(+), 7 deletions(-) diff --git a/target/loongarch/cpu-csr.h b/target/loongarch/cpu-csr.h index b93f99a9ef..c59d7a9fcb 100644 --- a/target/loongarch/cpu-csr.h +++ b/target/loongarch/cpu-csr.h @@ -57,7 +57,8 @@ FIELD(CSR_TLBIDX, PS, 24, 6) FIELD(CSR_TLBIDX, NE, 31, 1) #define LOONGARCH_CSR_TLBEHI 0x11 /* TLB EntryHi */ -FIELD(CSR_TLBEHI, VPPN, 13, 35) +FIELD(CSR_TLBEHI_32, VPPN, 13, 19) +FIELD(CSR_TLBEHI_64, VPPN, 13, 35) #define LOONGARCH_CSR_TLBELO00x12 /* TLB EntryLo0 */ #define LOONGARCH_CSR_TLBELO10x13 /* TLB EntryLo1 */ @@ -164,7 +165,8 @@ FIELD(CSR_TLBRERA, PC, 2, 62) #define LOONGARCH_CSR_TLBRELO1 0x8d /* TLB refill entrylo1 */ #define LOONGARCH_CSR_TLBREHI0x8e /* TLB refill entryhi */ FIELD(CSR_TLBREHI, PS, 0, 6) -FIELD(CSR_TLBREHI, VPPN, 13, 35) +FIELD(CSR_TLBREHI_32, VPPN, 13, 19) +FIELD(CSR_TLBREHI_64, VPPN, 13, 35) #define LOONGARCH_CSR_TLBRPRMD 0x8f /* TLB refill mode info */ FIELD(CSR_TLBRPRMD, PPLV, 0, 2) FIELD(CSR_TLBRPRMD, PIE, 2, 1) diff --git a/target/loongarch/tlb_helper.c b/target/loongarch/tlb_helper.c index 1f8e7911c7..c8b8b0497f 100644 --- a/target/loongarch/tlb_helper.c +++ b/target/loongarch/tlb_helper.c @@ -300,8 +300,13 @@ static void raise_mmu_exception(CPULoongArchState *env, target_ulong address, if (tlb_error == TLBRET_NOMATCH) { env->CSR_TLBRBADV = address; -env->CSR_TLBREHI = FIELD_DP64(env->CSR_TLBREHI, CSR_TLBREHI, VPPN, - extract64(address, 13, 35)); +if (is_la64(env)) { +env->CSR_TLBREHI = FIELD_DP64(env->CSR_TLBREHI, CSR_TLBREHI_64, +VPPN, extract64(address, 13, 35)); +} else { +env->CSR_TLBREHI = FIELD_DP64(env->CSR_TLBREHI, CSR_TLBREHI_32, +VPPN, extract64(address, 13, 19)); +} } else { if (!FIELD_EX64(env->CSR_DBG, CSR_DBG, DST)) { env->CSR_BADV = address; @@ -366,12 +371,20 @@ static void 
fill_tlb_entry(CPULoongArchState *env, int index) if (FIELD_EX64(env->CSR_TLBRERA, CSR_TLBRERA, ISTLBR)) { csr_ps = FIELD_EX64(env->CSR_TLBREHI, CSR_TLBREHI, PS); -csr_vppn = FIELD_EX64(env->CSR_TLBREHI, CSR_TLBREHI, VPPN); +if (is_la64(env)) { +csr_vppn = FIELD_EX64(env->CSR_TLBREHI, CSR_TLBREHI_64, VPPN); +} else { +csr_vppn = FIELD_EX64(env->CSR_TLBREHI, CSR_TLBREHI_32, VPPN); +} lo0 = env->CSR_TLBRELO0; lo1 = env->CSR_TLBRELO1; } else { csr_ps = FIELD_EX64(env->CSR_TLBIDX, CSR_TLBIDX, PS); -csr_vppn = FIELD_EX64(env->CSR_TLBEHI, CSR_TLBEHI, VPPN); +if (is_la64(env)) { +csr_vppn = FIELD_EX64(env->CSR_TLBEHI, CSR_TLBEHI_64, VPPN); +} else { +csr_vppn = FIELD_EX64(env->CSR_TLBEHI, CSR_TLBEHI_32, VPPN); +} lo0 = env->CSR_TLBELO0; lo1 = env->CSR_TLBELO1; } @@ -491,7 +504,7 @@ void helper_tlbfill(CPULoongArchState *env) if (pagesize == stlb_ps) { /* Only write into STLB bits [47:13] */ -address = entryhi & ~MAKE_64BIT_MASK(0, R_CSR_TLBEHI_VPPN_SHIFT); +address = entryhi & ~MAKE_64BIT_MASK(0, R_CSR_TLBEHI_64_VPPN_SHIFT); /* Choose one set ramdomly */ set = get_random_tlb(0, 7); -- 2.41.0
[PATCH v5 05/11] target/loongarch: Support LoongArch32 DMW
LA32 uses a different encoding for CSR.DMW and a new direct mapping mechanism. Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- target/loongarch/cpu-csr.h| 7 +++ target/loongarch/tlb_helper.c | 26 +++--- 2 files changed, 26 insertions(+), 7 deletions(-) diff --git a/target/loongarch/cpu-csr.h b/target/loongarch/cpu-csr.h index 48ed2e0632..b93f99a9ef 100644 --- a/target/loongarch/cpu-csr.h +++ b/target/loongarch/cpu-csr.h @@ -188,10 +188,9 @@ FIELD(CSR_DMW, PLV1, 1, 1) FIELD(CSR_DMW, PLV2, 2, 1) FIELD(CSR_DMW, PLV3, 3, 1) FIELD(CSR_DMW, MAT, 4, 2) -FIELD(CSR_DMW, VSEG, 60, 4) - -#define dmw_va2pa(va) \ -(va & MAKE_64BIT_MASK(0, TARGET_VIRT_ADDR_SPACE_BITS)) +FIELD(CSR_DMW_32, PSEG, 25, 3) +FIELD(CSR_DMW_32, VSEG, 29, 3) +FIELD(CSR_DMW_64, VSEG, 60, 4) /* Debug CSRs */ #define LOONGARCH_CSR_DBG0x500 /* debug config */ diff --git a/target/loongarch/tlb_helper.c b/target/loongarch/tlb_helper.c index cef10e2257..1f8e7911c7 100644 --- a/target/loongarch/tlb_helper.c +++ b/target/loongarch/tlb_helper.c @@ -173,6 +173,18 @@ static int loongarch_map_address(CPULoongArchState *env, hwaddr *physical, return TLBRET_NOMATCH; } +static hwaddr dmw_va2pa(CPULoongArchState *env, target_ulong va, +target_ulong dmw) +{ +if (is_la64(env)) { +return va & TARGET_VIRT_MASK; +} else { +uint32_t pseg = FIELD_EX32(dmw, CSR_DMW_32, PSEG); +return (va & MAKE_64BIT_MASK(0, R_CSR_DMW_32_VSEG_SHIFT)) | \ +(pseg << R_CSR_DMW_32_VSEG_SHIFT); +} +} + static int get_physical_address(CPULoongArchState *env, hwaddr *physical, int *prot, target_ulong address, MMUAccessType access_type, int mmu_idx) @@ -192,12 +204,20 @@ static int get_physical_address(CPULoongArchState *env, hwaddr *physical, } plv = kernel_mode | (user_mode << R_CSR_DMW_PLV3_SHIFT); -base_v = address >> R_CSR_DMW_VSEG_SHIFT; +if (is_la64(env)) { +base_v = address >> R_CSR_DMW_64_VSEG_SHIFT; +} else { +base_v = address >> R_CSR_DMW_32_VSEG_SHIFT; +} /* Check direct map window */ for (int i = 0; i < 4; i++) { -base_c = 
FIELD_EX64(env->CSR_DMW[i], CSR_DMW, VSEG); +if (is_la64(env)) { +base_c = FIELD_EX64(env->CSR_DMW[i], CSR_DMW_64, VSEG); +} else { +base_c = FIELD_EX64(env->CSR_DMW[i], CSR_DMW_32, VSEG); +} if ((plv & env->CSR_DMW[i]) && (base_c == base_v)) { -*physical = dmw_va2pa(address); +*physical = dmw_va2pa(env, address, env->CSR_DMW[i]); *prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC; return TLBRET_MATCH; } -- 2.41.0
[PATCH v5 03/11] target/loongarch: Add GDB support for loongarch32 mode
GPRs and PC are 32-bit wide in loongarch32 mode. Signed-off-by: Jiajie Chen Reviewed-by: Richard Henderson --- configs/targets/loongarch64-softmmu.mak | 2 +- gdb-xml/loongarch-base32.xml| 45 + target/loongarch/cpu.c | 10 +- target/loongarch/gdbstub.c | 32 ++ 4 files changed, 80 insertions(+), 9 deletions(-) create mode 100644 gdb-xml/loongarch-base32.xml diff --git a/configs/targets/loongarch64-softmmu.mak b/configs/targets/loongarch64-softmmu.mak index 9abc99056f..f23780fdd8 100644 --- a/configs/targets/loongarch64-softmmu.mak +++ b/configs/targets/loongarch64-softmmu.mak @@ -1,5 +1,5 @@ TARGET_ARCH=loongarch64 TARGET_BASE_ARCH=loongarch TARGET_SUPPORTS_MTTCG=y -TARGET_XML_FILES= gdb-xml/loongarch-base64.xml gdb-xml/loongarch-fpu.xml +TARGET_XML_FILES= gdb-xml/loongarch-base32.xml gdb-xml/loongarch-base64.xml gdb-xml/loongarch-fpu.xml TARGET_NEED_FDT=y diff --git a/gdb-xml/loongarch-base32.xml b/gdb-xml/loongarch-base32.xml new file mode 100644 index 00..af47bbd3da --- /dev/null +++ b/gdb-xml/loongarch-base32.xml @@ -0,0 +1,45 @@ [45 added lines elided: the XML markup of gdb-xml/loongarch-base32.xml was stripped when this message was archived; per the gdbstub changes in this patch, it describes the 32 GPRs, orig_a0, pc and badv as 32-bit registers] diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c index c6b73444b4..30dd70571a 100644 --- a/target/loongarch/cpu.c +++ b/target/loongarch/cpu.c @@ -694,7 +694,13 @@ static const struct SysemuCPUOps loongarch_sysemu_ops = { static gchar *loongarch_gdb_arch_name(CPUState *cs) { -return g_strdup("loongarch64"); +LoongArchCPU *cpu = LOONGARCH_CPU(cs); +CPULoongArchState *env = &cpu->env; +if (is_la64(env)) { +return g_strdup("loongarch64"); +} else { +return g_strdup("loongarch32"); +} } static void loongarch_cpu_class_init(ObjectClass *c, void *data) @@ -734,6 +740,8 @@ static void loongarch_cpu_class_init(ObjectClass *c, void *data) static void loongarch32_cpu_class_init(ObjectClass *c, void *data) { +CPUClass *cc = CPU_CLASS(c); +cc->gdb_core_xml_file = "loongarch-base32.xml"; } #define DEFINE_LOONGARCH_CPU_TYPE(model, initfn) \ { \ .parent = TYPE_LOONGARCH_CPU, \ .instance_init = initfn, \ .name = LOONGARCH_CPU_TYPE_NAME(model), \ } diff --git
a/target/loongarch/gdbstub.c b/target/loongarch/gdbstub.c index 0752fff924..a462e25737 100644 --- a/target/loongarch/gdbstub.c +++ b/target/loongarch/gdbstub.c @@ -34,16 +34,25 @@ int loongarch_cpu_gdb_read_register(CPUState *cs, GByteArray *mem_buf, int n) { LoongArchCPU *cpu = LOONGARCH_CPU(cs); CPULoongArchState *env = &cpu->env; +uint64_t val; if (0 <= n && n < 32) { -return gdb_get_regl(mem_buf, env->gpr[n]); +val = env->gpr[n]; } else if (n == 32) { /* orig_a0 */ -return gdb_get_regl(mem_buf, 0); +val = 0; } else if (n == 33) { -return gdb_get_regl(mem_buf, env->pc); +val = env->pc; } else if (n == 34) { -return gdb_get_regl(mem_buf, env->CSR_BADV); +val = env->CSR_BADV; +} + +if (0 <= n && n <= 34) { +if (is_la64(env)) { +return gdb_get_reg64(mem_buf, val); +} else { +return gdb_get_reg32(mem_buf, val); +} } return 0; } @@ -52,15 +61,24 @@ int loongarch_cpu_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n) { LoongArchCPU *cpu = LOONGARCH_CPU(cs); CPULoongArchState *env = &cpu->env; -target_ulong tmp = ldtul_p(mem_buf); +target_ulong tmp; +int read_length; int length = 0; +if (is_la64(env)) { +tmp = ldq_p(mem_buf); +read_length = 8; +} else { +tmp = ldl_p(mem_buf); +read_length = 4; +} + if (0 <= n && n < 32) { env->gpr[n] = tmp; -length = sizeof(target_ulong); +length = read_length; } else if (n == 33) { env->pc = tmp; -length = sizeof(target_ulong); +length = read_length; } return length; } -- 2.41.0
[PATCH v5 02/11] target/loongarch: Add new object class for loongarch32 cpus
Add object class for future loongarch32 cpus. It is derived from the loongarch64 object class. Signed-off-by: Jiajie Chen --- target/loongarch/cpu.c | 19 +++ target/loongarch/cpu.h | 1 + 2 files changed, 20 insertions(+) diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c index ad93ecac92..c6b73444b4 100644 --- a/target/loongarch/cpu.c +++ b/target/loongarch/cpu.c @@ -732,12 +732,22 @@ static void loongarch_cpu_class_init(ObjectClass *c, void *data) #endif } +static void loongarch32_cpu_class_init(ObjectClass *c, void *data) +{ +} + #define DEFINE_LOONGARCH_CPU_TYPE(model, initfn) \ { \ .parent = TYPE_LOONGARCH_CPU, \ .instance_init = initfn, \ .name = LOONGARCH_CPU_TYPE_NAME(model), \ } +#define DEFINE_LOONGARCH32_CPU_TYPE(model, initfn) \ +{ \ +.parent = TYPE_LOONGARCH32_CPU, \ +.instance_init = initfn, \ +.name = LOONGARCH_CPU_TYPE_NAME(model), \ +} static const TypeInfo loongarch_cpu_type_infos[] = { { @@ -750,6 +760,15 @@ static const TypeInfo loongarch_cpu_type_infos[] = { .class_size = sizeof(LoongArchCPUClass), .class_init = loongarch_cpu_class_init, }, +{ +.name = TYPE_LOONGARCH32_CPU, +.parent = TYPE_LOONGARCH_CPU, +.instance_size = sizeof(LoongArchCPU), + +.abstract = true, +.class_size = sizeof(LoongArchCPUClass), +.class_init = loongarch32_cpu_class_init, +}, DEFINE_LOONGARCH_CPU_TYPE("la464", loongarch_la464_initfn), }; diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h index 5a71d64a04..2af4c414b0 100644 --- a/target/loongarch/cpu.h +++ b/target/loongarch/cpu.h @@ -382,6 +382,7 @@ struct ArchCPU { }; #define TYPE_LOONGARCH_CPU "loongarch-cpu" +#define TYPE_LOONGARCH32_CPU "loongarch32-cpu" OBJECT_DECLARE_CPU_TYPE(LoongArchCPU, LoongArchCPUClass, LOONGARCH_CPU) -- 2.41.0
[PATCH v5 08/11] target/loongarch: Reject la64-only instructions in la32 mode
LoongArch64-only instructions are marked with regard to the instruction manual Table 2. LSX instructions are not marked for now for lack of public manual. Signed-off-by: Jiajie Chen --- target/loongarch/insn_trans/trans_arith.c.inc | 30 .../loongarch/insn_trans/trans_atomic.c.inc | 76 +-- target/loongarch/insn_trans/trans_bit.c.inc | 28 +++ .../loongarch/insn_trans/trans_branch.c.inc | 4 +- target/loongarch/insn_trans/trans_extra.c.inc | 16 ++-- target/loongarch/insn_trans/trans_fmov.c.inc | 4 +- .../loongarch/insn_trans/trans_memory.c.inc | 68 - target/loongarch/insn_trans/trans_shift.c.inc | 14 ++-- target/loongarch/translate.h | 7 ++ 9 files changed, 127 insertions(+), 120 deletions(-) diff --git a/target/loongarch/insn_trans/trans_arith.c.inc b/target/loongarch/insn_trans/trans_arith.c.inc index 43d6cf261d..4c21d8b037 100644 --- a/target/loongarch/insn_trans/trans_arith.c.inc +++ b/target/loongarch/insn_trans/trans_arith.c.inc @@ -249,9 +249,9 @@ static bool trans_addu16i_d(DisasContext *ctx, arg_addu16i_d *a) } TRANS(add_w, gen_rrr, EXT_NONE, EXT_NONE, EXT_SIGN, tcg_gen_add_tl) -TRANS(add_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_add_tl) +TRANS_64(add_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_add_tl) TRANS(sub_w, gen_rrr, EXT_NONE, EXT_NONE, EXT_SIGN, tcg_gen_sub_tl) -TRANS(sub_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_sub_tl) +TRANS_64(sub_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_sub_tl) TRANS(and, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_and_tl) TRANS(or, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_or_tl) TRANS(xor, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_xor_tl) @@ -261,32 +261,32 @@ TRANS(orn, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_orc_tl) TRANS(slt, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_slt) TRANS(sltu, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_sltu) TRANS(mul_w, gen_rrr, EXT_SIGN, EXT_SIGN, EXT_SIGN, tcg_gen_mul_tl) -TRANS(mul_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_mul_tl) 
+TRANS_64(mul_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_mul_tl) TRANS(mulh_w, gen_rrr, EXT_SIGN, EXT_SIGN, EXT_NONE, gen_mulh_w) TRANS(mulh_wu, gen_rrr, EXT_ZERO, EXT_ZERO, EXT_NONE, gen_mulh_w) -TRANS(mulh_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_mulh_d) -TRANS(mulh_du, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_mulh_du) -TRANS(mulw_d_w, gen_rrr, EXT_SIGN, EXT_SIGN, EXT_NONE, tcg_gen_mul_tl) -TRANS(mulw_d_wu, gen_rrr, EXT_ZERO, EXT_ZERO, EXT_NONE, tcg_gen_mul_tl) +TRANS_64(mulh_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_mulh_d) +TRANS_64(mulh_du, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_mulh_du) +TRANS_64(mulw_d_w, gen_rrr, EXT_SIGN, EXT_SIGN, EXT_NONE, tcg_gen_mul_tl) +TRANS_64(mulw_d_wu, gen_rrr, EXT_ZERO, EXT_ZERO, EXT_NONE, tcg_gen_mul_tl) TRANS(div_w, gen_rrr, EXT_SIGN, EXT_SIGN, EXT_SIGN, gen_div_w) TRANS(mod_w, gen_rrr, EXT_SIGN, EXT_SIGN, EXT_SIGN, gen_rem_w) TRANS(div_wu, gen_rrr, EXT_ZERO, EXT_ZERO, EXT_SIGN, gen_div_du) TRANS(mod_wu, gen_rrr, EXT_ZERO, EXT_ZERO, EXT_SIGN, gen_rem_du) -TRANS(div_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_div_d) -TRANS(mod_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_rem_d) -TRANS(div_du, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_div_du) -TRANS(mod_du, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_rem_du) +TRANS_64(div_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_div_d) +TRANS_64(mod_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_rem_d) +TRANS_64(div_du, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_div_du) +TRANS_64(mod_du, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_rem_du) TRANS(slti, gen_rri_v, EXT_NONE, EXT_NONE, gen_slt) TRANS(sltui, gen_rri_v, EXT_NONE, EXT_NONE, gen_sltu) TRANS(addi_w, gen_rri_c, EXT_NONE, EXT_SIGN, tcg_gen_addi_tl) -TRANS(addi_d, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_addi_tl) +TRANS_64(addi_d, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_addi_tl) TRANS(alsl_w, gen_rrr_sa, EXT_NONE, EXT_SIGN, gen_alsl) -TRANS(alsl_wu, gen_rrr_sa, EXT_NONE, EXT_ZERO, gen_alsl) -TRANS(alsl_d, gen_rrr_sa, 
EXT_NONE, EXT_NONE, gen_alsl) +TRANS_64(alsl_wu, gen_rrr_sa, EXT_NONE, EXT_ZERO, gen_alsl) +TRANS_64(alsl_d, gen_rrr_sa, EXT_NONE, EXT_NONE, gen_alsl) TRANS(pcaddi, gen_pc, gen_pcaddi) TRANS(pcalau12i, gen_pc, gen_pcalau12i) TRANS(pcaddu12i, gen_pc, gen_pcaddu12i) -TRANS(pcaddu18i, gen_pc, gen_pcaddu18i) +TRANS_64(pcaddu18i, gen_pc, gen_pcaddu18i) TRANS(andi, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_andi_tl) TRANS(ori, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_ori_tl) TRANS(xori, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_xori_tl) diff --git a/target/loongarch/insn_trans/trans_atomic.c.inc b/target/loongarch/insn_trans/trans_atomic.c.inc index 612709f2a7..c69f31bc78 100644 --- a/target/loongarch/insn_trans/trans_atomic.c.inc +++ b/target/loongarch/insn_trans/trans_atomic.c.inc @@ -70,41 +70,41 @@ static bool gen_am(DisasContext *ctx, arg_rrr *a, TRANS(ll_w, gen_ll, MO_TESL) TRANS
[PATCH v5 00/11] Add la32 & va32 support for loongarch64-softmmu
This patch series allow qemu-system-loongarch64 to emulate a LoongArch32 machine. A new CPU model (la132) is added for loongarch32, however due to lack of public documentation, details will need to be added in the future. Initial GDB support is added. At the same time, VA32(32-bit virtual address) support is introduced for LoongArch64. LA32 support is tested using a small supervisor program at https://github.com/jiegec/supervisor-la32. VA32 mode under LA64 is not tested yet. Changes since v4: - Code refactor, thanks Richard Henderson for great advice - Truncate higher 32 bits of PC in VA32 mode - Revert la132 initfn refactor Changes since v3: - Support VA32 mode for LoongArch64 - Check the current arch from CPUCFG.ARCH - Reject la64-only instructions in la32 mode Changes since v2: - Fix typo in previous commit - Fix VPPN width in TLBEHI/TLBREHI Changes since v1: - No longer create a separate qemu-system-loongarch32 executable, but allow user to run loongarch32 emulation using qemu-system-loongarch64 - Add loongarch32 cpu support for virt machine Full changes: Jiajie Chen (11): target/loongarch: Add function to check current arch target/loongarch: Add new object class for loongarch32 cpus target/loongarch: Add GDB support for loongarch32 mode target/loongarch: Support LoongArch32 TLB entry target/loongarch: Support LoongArch32 DMW target/loongarch: Support LoongArch32 VPPN target/loongarch: Add LA64 & VA32 to DisasContext target/loongarch: Reject la64-only instructions in la32 mode target/loongarch: Truncate high 32 bits of address in VA32 mode target/loongarch: Sign extend results in VA32 mode target/loongarch: Add loongarch32 cpu la132 configs/targets/loongarch64-softmmu.mak | 2 +- gdb-xml/loongarch-base32.xml | 45 hw/loongarch/virt.c | 5 - target/loongarch/cpu-csr.h| 22 ++-- target/loongarch/cpu.c| 74 +++-- target/loongarch/cpu.h| 33 ++ target/loongarch/gdbstub.c| 34 -- target/loongarch/insn_trans/trans_arith.c.inc | 32 +++--- 
.../loongarch/insn_trans/trans_atomic.c.inc | 81 +++--- target/loongarch/insn_trans/trans_bit.c.inc | 28 ++--- .../loongarch/insn_trans/trans_branch.c.inc | 11 +- target/loongarch/insn_trans/trans_extra.c.inc | 16 +-- .../loongarch/insn_trans/trans_fmemory.c.inc | 30 ++ target/loongarch/insn_trans/trans_fmov.c.inc | 4 +- target/loongarch/insn_trans/trans_lsx.c.inc | 38 ++- .../loongarch/insn_trans/trans_memory.c.inc | 102 -- target/loongarch/insn_trans/trans_shift.c.inc | 14 +-- target/loongarch/op_helper.c | 4 +- target/loongarch/tlb_helper.c | 66 +--- target/loongarch/translate.c | 43 target/loongarch/translate.h | 9 ++ 21 files changed, 445 insertions(+), 248 deletions(-) create mode 100644 gdb-xml/loongarch-base32.xml -- 2.41.0
[PATCH v5 01/11] target/loongarch: Add function to check current arch
Add an is_la64 function to check whether the current cpucfg[1].arch equals 2 (LA64).

Signed-off-by: Jiajie Chen
Co-authored-by: Richard Henderson
Reviewed-by: Richard Henderson
---
 target/loongarch/cpu.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index fa371ca8ba..5a71d64a04 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -132,6 +132,11 @@ FIELD(CPUCFG1, HP, 24, 1)
 FIELD(CPUCFG1, IOCSR_BRD, 25, 1)
 FIELD(CPUCFG1, MSG_INT, 26, 1)
 
+/* cpucfg[1].arch */
+#define CPUCFG1_ARCH_LA32R 0
+#define CPUCFG1_ARCH_LA32  1
+#define CPUCFG1_ARCH_LA64  2
+
 /* cpucfg[2] bits */
 FIELD(CPUCFG2, FP, 0, 1)
 FIELD(CPUCFG2, FP_SP, 1, 1)
@@ -420,6 +425,11 @@ static inline int cpu_mmu_index(CPULoongArchState *env, bool ifetch)
 #endif
 }
 
+static inline bool is_la64(CPULoongArchState *env)
+{
+    return FIELD_EX32(env->cpucfg[1], CPUCFG1, ARCH) == CPUCFG1_ARCH_LA64;
+}
+
 /*
  * LoongArch CPUs hardware flags.
  */
-- 
2.41.0
Re: [PATCH v4 11/11] target/loongarch: Add loongarch32 cpu la132
On 2023/8/9 03:26, Richard Henderson wrote:
On 8/7/23 18:54, Jiajie Chen wrote:

+static void loongarch_la464_initfn(Object *obj)
+{
+    LoongArchCPU *cpu = LOONGARCH_CPU(obj);
+    CPULoongArchState *env = &cpu->env;
+
+    loongarch_cpu_initfn_common(env);
+
+    cpu->dtb_compatible = "loongarch,Loongson-3A5000";
+    env->cpucfg[0] = 0x14c010;  /* PRID */
+
+    uint32_t data = env->cpucfg[1];
+    data = FIELD_DP32(data, CPUCFG1, ARCH, 2);      /* LA64 */
+    data = FIELD_DP32(data, CPUCFG1, PALEN, 0x2f);  /* 48 bits */
+    data = FIELD_DP32(data, CPUCFG1, VALEN, 0x2f);  /* 48 bits */
+    data = FIELD_DP32(data, CPUCFG1, RI, 1);
+    data = FIELD_DP32(data, CPUCFG1, EP, 1);
+    data = FIELD_DP32(data, CPUCFG1, RPLV, 1);
+    env->cpucfg[1] = data;
+}
+
+static void loongarch_la132_initfn(Object *obj)
+{
+    LoongArchCPU *cpu = LOONGARCH_CPU(obj);
+    CPULoongArchState *env = &cpu->env;
+
+    loongarch_cpu_initfn_common(env);
+
+    cpu->dtb_compatible = "loongarch,Loongson-1C103";
+
+    uint32_t data = env->cpucfg[1];
+    data = FIELD_DP32(data, CPUCFG1, ARCH, 1);      /* LA32 */
+    data = FIELD_DP32(data, CPUCFG1, PALEN, 0x1f);  /* 32 bits */
+    data = FIELD_DP32(data, CPUCFG1, VALEN, 0x1f);  /* 32 bits */
+    data = FIELD_DP32(data, CPUCFG1, RI, 0);
+    data = FIELD_DP32(data, CPUCFG1, EP, 0);
+    data = FIELD_DP32(data, CPUCFG1, RPLV, 0);
+    env->cpucfg[1] = data;
+}

The use of the loongarch_cpu_initfn_common function is not going to scale. Compare the set of *_initfn in target/arm/tcg/cpu32.c.

In general, you want to copy data in bulk from the processor manual, so that the reviewer can simply read through the table and see that the code is correct, without having to check between multiple functions to see that the combination is correct. For our existing la464, that table is Table 54 in the 3A5000 manual.

Is there a public specification for the la132? I could not find one at https://www.loongson.cn/EN/product/, but perhaps that's just the English view.

There seems to be none, even from the Chinese view.

r~
Re: [PATCH v4 01/11] target/loongarch: Add macro to check current arch
On 2023/8/9 01:01, Richard Henderson wrote:
On 8/7/23 18:54, Jiajie Chen wrote:

Add a macro to check whether the current cpucfg[1].arch equals 1 (LA32) or 2 (LA64).

Signed-off-by: Jiajie Chen
---
 target/loongarch/cpu.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index fa371ca8ba..bf0da8d5b4 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -132,6 +132,13 @@ FIELD(CPUCFG1, HP, 24, 1)
 FIELD(CPUCFG1, IOCSR_BRD, 25, 1)
 FIELD(CPUCFG1, MSG_INT, 26, 1)
 
+/* cpucfg[1].arch */
+#define CPUCFG1_ARCH_LA32 1
+#define CPUCFG1_ARCH_LA64 2
+
+#define LOONGARCH_CPUCFG_ARCH(env, mode) \
+    (FIELD_EX32(env->cpucfg[1], CPUCFG1, ARCH) == CPUCFG1_ARCH_##mode)

Reviewed-by: Richard Henderson

But in using this, recall that 0 is a defined value for "simplified la32", so !LOONGARCH_CPUCFG_ARCH(env, LA64) may not in future equal LOONGARCH_CPUCFG_ARCH(env, LA32) if someone ever decides to implement this simplified version. (We emulate very small embedded Arm cpus, so it's not out of the question that you may want to emulate the very smallest LoongArch cpus.)

Yes, actually the LoongArch 32 Reduced (or "simplified la32") version is my final aim, because we are building embedded LoongArch32 Reduced CPUs on an FPGA for a competition, and supporting LoongArch 32 is the first step.

It might be easier to just define

static inline bool is_la64(CPULoongArchState *env)
{
    return FIELD_EX32(env->cpucfg[1], CPUCFG1, ARCH) == CPUCFG1_ARCH_LA64;
}

Sure, I will do it this way.

r~
Re: [PATCH] target/loongarch: Split fcc register to fcc0-7 in gdbstub
On 2023/8/8 17:55, Jiajie Chen wrote:
On 2023/8/8 14:10, bibo mao wrote:

I am not familiar with gdb; is there an ABI breakage? I do not know how a gdb client works with a gdb server of a different version.

There seems to be no versioning in the process, but rather in-code xml validation. In gdb, the code only allows the new xml (fcc0-7) and rejects the old one (fcc), so gdb broke compatibility with qemu first and does not consider backward compatibility with qemu. Not an ABI breakage, but gdb will complain:

warning: while parsing target description (at line 1): Target description specified unknown architecture "loongarch64"
warning: Could not load XML target description; ignoring
warning: No executable has been specified and target does not support determining executable automatically. Try using the "file" command.
Truncated register 38 in remote 'g' packet

Sorry, to be clear, the actual error message is:

(gdb) target extended-remote localhost:1234
Remote debugging using localhost:1234
warning: Architecture rejected target-supplied description
warning: No executable has been specified and target does not support

It rejects the target description xml given by qemu, thus using the builtin one. However, there is a mismatch in the fcc registers, so it will not work if we list floating point registers. At the same time, if we are using the loongarch32 target (I recently posted patches to support this), it will reject the target description and fall back to loongarch64, making gdb not usable. And gdb can no longer debug a kernel running in qemu. You can reproduce this error using the latest qemu (without this patch) and gdb (13.1 or later).

Regards
Bibo Mao

On 2023/8/8 13:42, Jiajie Chen wrote:

Since GDB 13.1 (GDB commit ea3352172), GDB LoongArch changed to use fcc0-7 instead of the fcc register. This commit partially reverts commit 2f149c759 (`target/loongarch: Update gdb_set_fpu() and gdb_get_fpu()`) to match the behavior of GDB.
Note that it is a breaking change for GDB 13.0 or earlier, but it is also required for GDB 13.1 or later to work.

Signed-off-by: Jiajie Chen
---
 gdb-xml/loongarch-fpu.xml  |  9 -
 target/loongarch/gdbstub.c | 16 +++-
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/gdb-xml/loongarch-fpu.xml b/gdb-xml/loongarch-fpu.xml
index 78e42cf5dd..e81e3382e7 100644
--- a/gdb-xml/loongarch-fpu.xml
+++ b/gdb-xml/loongarch-fpu.xml
@@ -45,6 +45,13 @@
-  <reg name="fcc" bitsize="64" type="uint64"/>
+  <reg name="fcc0" bitsize="8" type="uint8"/>
+  <reg name="fcc1" bitsize="8" type="uint8"/>
+  <reg name="fcc2" bitsize="8" type="uint8"/>
+  <reg name="fcc3" bitsize="8" type="uint8"/>
+  <reg name="fcc4" bitsize="8" type="uint8"/>
+  <reg name="fcc5" bitsize="8" type="uint8"/>
+  <reg name="fcc6" bitsize="8" type="uint8"/>
+  <reg name="fcc7" bitsize="8" type="uint8"/>
diff --git a/target/loongarch/gdbstub.c b/target/loongarch/gdbstub.c
index 0752fff924..15ad6778f1 100644
--- a/target/loongarch/gdbstub.c
+++ b/target/loongarch/gdbstub.c
@@ -70,10 +70,9 @@ static int loongarch_gdb_get_fpu(CPULoongArchState *env,
 {
     if (0 <= n && n < 32) {
         return gdb_get_reg64(mem_buf, env->fpr[n].vreg.D(0));
-    } else if (n == 32) {
-        uint64_t val = read_fcc(env);
-        return gdb_get_reg64(mem_buf, val);
-    } else if (n == 33) {
+    } else if (32 <= n && n < 40) {
+        return gdb_get_reg8(mem_buf, env->cf[n - 32]);
+    } else if (n == 40) {
         return gdb_get_reg32(mem_buf, env->fcsr0);
     }
     return 0;
@@ -87,11 +86,10 @@ static int loongarch_gdb_set_fpu(CPULoongArchState *env,
     if (0 <= n && n < 32) {
         env->fpr[n].vreg.D(0) = ldq_p(mem_buf);
         length = 8;
-    } else if (n == 32) {
-        uint64_t val = ldq_p(mem_buf);
-        write_fcc(env, val);
-        length = 8;
-    } else if (n == 33) {
+    } else if (32 <= n && n < 40) {
+        env->cf[n - 32] = ldub_p(mem_buf);
+        length = 1;
+    } else if (n == 40) {
         env->fcsr0 = ldl_p(mem_buf);
         length = 4;
     }