Re: [PATCH 0/5] Add LoongArch v1.1 instructions

2023-10-31 Thread Jiajie Chen



On 2023/10/31 19:06, gaosong wrote:

On 2023/10/31 5:13 PM, Jiajie Chen wrote:


On 2023/10/31 17:11, gaosong wrote:

On 2023/10/30 7:54 PM, Jiajie Chen wrote:


On 2023/10/30 16:23, gaosong wrote:

On 2023/10/28 9:09 PM, Jiajie Chen wrote:


On 2023/10/26 14:54, gaosong wrote:

On 2023/10/26 9:38 AM, Jiajie Chen wrote:


On 2023/10/26 03:04, Richard Henderson wrote:

On 10/25/23 10:13, Jiajie Chen wrote:

On 2023/10/24 07:26, Richard Henderson wrote:
See target/arm/tcg/translate-a64.c, gen_store_exclusive, 
TCGv_i128 block.

See target/ppc/translate.c, gen_stqcx_.


The situation here is slightly different: aarch64 and ppc64 
have both 128-bit ll and sc, however LoongArch v1.1 only has 
64-bit ll and 128-bit sc.


Ah, that does complicate things.


Possibly use the combination of ll.d and ld.d:


ll.d lo, base, 0
ld.d hi, base, 4

# do some computation

sc.q lo, hi, base

# try again if sc failed

Then a possible implementation of gen_ll() would be: align 
base to 128-bit boundary, read 128-bit from memory, save 
64-bit part to rd and record whole 128-bit data in llval. 
Then, in gen_sc_q(), it uses a 128-bit cmpxchg.



But what about the reversed instruction pattern: ll.d hi, 
base, 4; ld.d lo, base, 0?


It would be worth asking your hardware engineers about the 
bounds of legal behaviour. Ideally there would be some very 
explicit language, similar to



I'm a community developer not affiliated with Loongson. Song 
Gao, could you provide some detail from Loongson Inc.?





ll.d   r1, base, 0
dbar 0x700  ==> see 2.2.8.1
ld.d  r2, base,  8
...
sc.q r1, r2, base



Thanks! I think we may need to detect the ll.d-dbar-ld.d sequence 
and translate the sequence into one tcg_gen_qemu_ld_i128 and 
split the result into two 64-bit parts. Can we do this in QEMU?




Oh, I'm not sure.

I think we just need to implement sc.q. We don't need to care 
about 'll.d-dbar-ld.d'. It's just like 'll.q'.

The user needs to ensure that sequence.

'll.q' is:
1) ll.d r1, base, 0 ==> set LLbit, load the low 64 bits into r1
2) dbar 0x700
3) ld.d r2, base, 8 ==> load the high 64 bits into r2

sc.q needs to
1) Use 64-bit cmpxchg.
2) Write 128 bits to memory.


Consider the following code:


ll.d r1, base, 0

dbar 0x700

ld.d r2, base, 8

addi.d r2, r2, 1

sc.q r1, r2, base


We translate them into native code:


ld.d r1, base, 0

mv LLbit, 1

mv LLaddr, base

mv LLval, r1

dbar 0x700

ld.d r2, base, 8

addi.d r2, r2, 1

if (LLbit == 1 && LLaddr == base) {

    cmpxchg addr=base compare=LLval new=r1

    128-bit write {r2, r1} to base if cmpxchg succeeded

}

set r1 if sc.q succeeded



If the memory content at base+8 has changed between the ld.d r2 and the 
sc.q, the atomicity is not guaranteed, i.e. only the high part has 
changed while the low part hasn't, yet the 64-bit cmpxchg on the low 
part still succeeds.
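A minimal single-threaded C model of this flaw (invented names; the
direct write to the high half stands in for another CPU's store in the
race window):

```c
#include <stdint.h>

/* Model of the flawed translation quoted above: sc.q emulated with a
 * 64-bit cmpxchg on the low half only, followed by a plain 128-bit
 * write.  A change to the high half between ld.d and sc.q is NOT
 * detected. */

typedef struct { uint64_t lo, hi; } u128;

/* Returns 1 if the (flawed) sc.q emulation stores, 0 otherwise. */
int flawed_sc_q(u128 *mem, uint64_t llval_lo,
                uint64_t new_lo, uint64_t new_hi)
{
    if (mem->lo == llval_lo) {  /* 64-bit compare: low half only */
        mem->lo = new_lo;       /* 128-bit write */
        mem->hi = new_hi;
        return 1;
    }
    return 0;
}
```

The store "succeeds" even though the high half was modified in between,
silently discarding the interfering write.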



Sorry, my mistake. We need to use cmpxchg_i128. See 
target/arm/tcg/translate-a64.c, gen_store_exclusive().


gen_scq(rd, rk, rj)
{
 ...
    TCGv_i128 t16 = tcg_temp_new_i128();
    TCGv_i128 c16 = tcg_temp_new_i128();
    TCGv_i64 low = tcg_temp_new_i64();
    TCGv_i64 high= tcg_temp_new_i64();
    TCGv_i64 temp = tcg_temp_new_i64();

    tcg_gen_concat_i64_i128(t16, cpu_gpr[rd], cpu_gpr[rk]);

    tcg_gen_qemu_ld(low, cpu_lladdr, ctx->mem_idx, MO_TEUQ);
    tcg_gen_addi_tl(temp, cpu_lladdr, 8);
    tcg_gen_mb(TCG_BAR_SC | TCG_MO_LD_LD);
    tcg_gen_qemu_ld(high, temp, ctx->mem_idx, MO_TEUQ);



The problem is that the high value read here might not equal the one 
previously read by the 'ld.d r2, base, 8' instruction.
I think dbar 0x700 ensures that the two loads in 'll.q' form a 128-bit 
atomic operation.



The code does work on a real LoongArch machine. However, since we are 
emulating LoongArch in QEMU, we have to make it atomic, and it isn't yet.





Thanks.
Song Gao

tcg_gen_concat_i64_i128(c16, low, high);

    tcg_gen_atomic_cmpxchg_i128(t16, cpu_lladdr, c16, t16, 
ctx->mem_idx, MO_128);


    ...
}

I am not sure this is right.

I think Richard can give you more suggestions. @Richard

Thanks.
Song Gao



Thanks.
Song Gao



For this series,
I think we need to set the new config bits for the 'max' cpu, and 
change 'any' to 'max' in linux-user/target_elf.h, so that we can 
use these new instructions in linux-user mode.


I will work on it.




Thanks
Song Gao


https://developer.arm.com/documentation/ddi0487/latest/
B2.9.5 Load-Exclusive and Store-Exclusive instruction usage 
restrictions


But you could do the same thing, aligning and recording the 
entire 128-bit quantity, then extract the ll.d result based on 
address bit 6. This would complicate the implementation of 
sc.d as well, but would perhaps bring us "close enough" to the 
actual architecture.


Note that our Arm store-exclusive implementation isn't quite 
in spec either.  There is quite a large comment within 
translate-a64.c store_exclusive() about the ways things are 
not quite right.  But it seems to be close enough for actual 
usage to succeed.



r~












Re: [PATCH 0/5] Add LoongArch v1.1 instructions

2023-10-31 Thread Jiajie Chen



On 2023/10/31 17:11, gaosong wrote:

On 2023/10/30 7:54 PM, Jiajie Chen wrote:


On 2023/10/30 16:23, gaosong wrote:

On 2023/10/28 9:09 PM, Jiajie Chen wrote:


On 2023/10/26 14:54, gaosong wrote:

On 2023/10/26 9:38 AM, Jiajie Chen wrote:


On 2023/10/26 03:04, Richard Henderson wrote:

On 10/25/23 10:13, Jiajie Chen wrote:

On 2023/10/24 07:26, Richard Henderson wrote:
See target/arm/tcg/translate-a64.c, gen_store_exclusive, 
TCGv_i128 block.

See target/ppc/translate.c, gen_stqcx_.


The situation here is slightly different: aarch64 and ppc64 
have both 128-bit ll and sc, however LoongArch v1.1 only has 
64-bit ll and 128-bit sc.


Ah, that does complicate things.


Possibly use the combination of ll.d and ld.d:


ll.d lo, base, 0
ld.d hi, base, 4

# do some computation

sc.q lo, hi, base

# try again if sc failed

Then a possible implementation of gen_ll() would be: align base 
to 128-bit boundary, read 128-bit from memory, save 64-bit part 
to rd and record whole 128-bit data in llval. Then, in 
gen_sc_q(), it uses a 128-bit cmpxchg.



But what about the reversed instruction pattern: ll.d hi, base, 
4; ld.d lo, base, 0?


It would be worth asking your hardware engineers about the 
bounds of legal behaviour. Ideally there would be some very 
explicit language, similar to



I'm a community developer not affiliated with Loongson. Song Gao, 
could you provide some detail from Loongson Inc.?





ll.d   r1, base, 0
dbar 0x700  ==> see 2.2.8.1
ld.d  r2, base,  8
...
sc.q r1, r2, base



Thanks! I think we may need to detect the ll.d-dbar-ld.d sequence 
and translate the sequence into one tcg_gen_qemu_ld_i128 and split 
the result into two 64-bit parts. Can we do this in QEMU?




Oh, I'm not sure.

I think we just need to implement sc.q. We don't need to care about 
'll.d-dbar-ld.d'. It's just like 'll.q'.

The user needs to ensure that sequence.

'll.q' is:
1) ll.d r1, base, 0 ==> set LLbit, load the low 64 bits into r1
2) dbar 0x700
3) ld.d r2, base, 8 ==> load the high 64 bits into r2

sc.q needs to
1) Use 64-bit cmpxchg.
2) Write 128 bits to memory.


Consider the following code:


ll.d r1, base, 0

dbar 0x700

ld.d r2, base, 8

addi.d r2, r2, 1

sc.q r1, r2, base


We translate them into native code:


ld.d r1, base, 0

mv LLbit, 1

mv LLaddr, base

mv LLval, r1

dbar 0x700

ld.d r2, base, 8

addi.d r2, r2, 1

if (LLbit == 1 && LLaddr == base) {

    cmpxchg addr=base compare=LLval new=r1

    128-bit write {r2, r1} to base if cmpxchg succeeded

}

set r1 if sc.q succeeded



If the memory content at base+8 has changed between the ld.d r2 and the 
sc.q, the atomicity is not guaranteed, i.e. only the high part has 
changed while the low part hasn't, yet the 64-bit cmpxchg on the low 
part still succeeds.



Sorry, my mistake. We need to use cmpxchg_i128. See 
target/arm/tcg/translate-a64.c, gen_store_exclusive().


gen_scq(rd, rk, rj)
{
 ...
    TCGv_i128 t16 = tcg_temp_new_i128();
    TCGv_i128 c16 = tcg_temp_new_i128();
    TCGv_i64 low = tcg_temp_new_i64();
    TCGv_i64 high= tcg_temp_new_i64();
    TCGv_i64 temp = tcg_temp_new_i64();

    tcg_gen_concat_i64_i128(t16, cpu_gpr[rd], cpu_gpr[rk]);

    tcg_gen_qemu_ld(low, cpu_lladdr, ctx->mem_idx,  MO_TEUQ);
    tcg_gen_addi_tl(temp, cpu_lladdr, 8);
    tcg_gen_mb(TCG_BAR_SC | TCG_MO_LD_LD);
    tcg_gen_qemu_ld(high, temp, ctx->mem_idx, MO_TEUQ);



The problem is that the high value read here might not equal the one 
previously read by the 'ld.d r2, base, 8' instruction.




tcg_gen_concat_i64_i128(c16, low,  high);

    tcg_gen_atomic_cmpxchg_i128(t16, cpu_lladdr, c16, t16, 
ctx->mem_idx, MO_128);


    ...
}

I am not sure this is right.

I think Richard can give you more suggestions. @Richard

Thanks.
Song Gao



Thanks.
Song Gao



For this series,
I think we need to set the new config bits for the 'max' cpu, and 
change 'any' to 'max' in linux-user/target_elf.h, so that we can 
use these new instructions in linux-user mode.


I will work on it.




Thanks
Song Gao


https://developer.arm.com/documentation/ddi0487/latest/
B2.9.5 Load-Exclusive and Store-Exclusive instruction usage 
restrictions


But you could do the same thing, aligning and recording the 
entire 128-bit quantity, then extract the ll.d result based on 
address bit 6.  This would complicate the implementation of sc.d 
as well, but would perhaps bring us "close enough" to the actual 
architecture.


Note that our Arm store-exclusive implementation isn't quite in 
spec either.  There is quite a large comment within 
translate-a64.c store_exclusive() about the ways things are not 
quite right.  But it seems to be close enough for actual usage 
to succeed.



r~










Re: [PATCH 0/5] Add LoongArch v1.1 instructions

2023-10-30 Thread Jiajie Chen



On 2023/10/30 16:23, gaosong wrote:

On 2023/10/28 9:09 PM, Jiajie Chen wrote:


On 2023/10/26 14:54, gaosong wrote:

On 2023/10/26 9:38 AM, Jiajie Chen wrote:


On 2023/10/26 03:04, Richard Henderson wrote:

On 10/25/23 10:13, Jiajie Chen wrote:

On 2023/10/24 07:26, Richard Henderson wrote:
See target/arm/tcg/translate-a64.c, gen_store_exclusive, 
TCGv_i128 block.

See target/ppc/translate.c, gen_stqcx_.


The situation here is slightly different: aarch64 and ppc64 have 
both 128-bit ll and sc, however LoongArch v1.1 only has 64-bit 
ll and 128-bit sc.


Ah, that does complicate things.


Possibly use the combination of ll.d and ld.d:


ll.d lo, base, 0
ld.d hi, base, 4

# do some computation

sc.q lo, hi, base

# try again if sc failed

Then a possible implementation of gen_ll() would be: align base 
to 128-bit boundary, read 128-bit from memory, save 64-bit part 
to rd and record whole 128-bit data in llval. Then, in 
gen_sc_q(), it uses a 128-bit cmpxchg.



But what about the reversed instruction pattern: ll.d hi, base, 
4; ld.d lo, base, 0?


It would be worth asking your hardware engineers about the bounds 
of legal behaviour. Ideally there would be some very explicit 
language, similar to



I'm a community developer not affiliated with Loongson. Song Gao, 
could you provide some detail from Loongson Inc.?





ll.d   r1, base, 0
dbar 0x700  ==> see 2.2.8.1
ld.d  r2, base,  8
...
sc.q r1, r2, base



Thanks! I think we may need to detect the ll.d-dbar-ld.d sequence and 
translate the sequence into one tcg_gen_qemu_ld_i128 and split the 
result into two 64-bit parts. Can we do this in QEMU?




Oh, I'm not sure.

I think we just need to implement sc.q. We don't need to care about 
'll.d-dbar-ld.d'. It's just like 'll.q'.

The user needs to ensure that sequence.

'll.q' is:
1) ll.d r1, base, 0 ==> set LLbit, load the low 64 bits into r1
2) dbar 0x700
3) ld.d r2, base, 8 ==> load the high 64 bits into r2

sc.q needs to
1) Use 64-bit cmpxchg.
2) Write 128 bits to memory.


Consider the following code:


ll.d r1, base, 0

dbar 0x700

ld.d r2, base, 8

addi.d r2, r2, 1

sc.q r1, r2, base


We translate them into native code:


ld.d r1, base, 0

mv LLbit, 1

mv LLaddr, base

mv LLval, r1

dbar 0x700

ld.d r2, base, 8

addi.d r2, r2, 1

if (LLbit == 1 && LLaddr == base) {

    cmpxchg addr=base compare=LLval new=r1

    128-bit write {r2, r1} to base if cmpxchg succeeded

}

set r1 if sc.q succeeded



If the memory content at base+8 has changed between the ld.d r2 and the 
sc.q, the atomicity is not guaranteed, i.e. only the high part has 
changed while the low part hasn't, yet the 64-bit cmpxchg on the low 
part still succeeds.






Thanks.
Song Gao



For this series,
I think we need to set the new config bits for the 'max' cpu, and 
change 'any' to 'max' in linux-user/target_elf.h, so that we can 
use these new instructions in linux-user mode.


I will work on it.




Thanks
Song Gao


https://developer.arm.com/documentation/ddi0487/latest/
B2.9.5 Load-Exclusive and Store-Exclusive instruction usage 
restrictions


But you could do the same thing, aligning and recording the entire 
128-bit quantity, then extract the ll.d result based on address 
bit 6.  This would complicate the implementation of sc.d as well, 
but would perhaps bring us "close enough" to the actual architecture.


Note that our Arm store-exclusive implementation isn't quite in 
spec either.  There is quite a large comment within 
translate-a64.c store_exclusive() about the ways things are not 
quite right.  But it seems to be close enough for actual usage to 
succeed.



r~








Re: [PATCH 0/5] Add LoongArch v1.1 instructions

2023-10-28 Thread Jiajie Chen



On 2023/10/26 14:54, gaosong wrote:

On 2023/10/26 9:38 AM, Jiajie Chen wrote:


On 2023/10/26 03:04, Richard Henderson wrote:

On 10/25/23 10:13, Jiajie Chen wrote:

On 2023/10/24 07:26, Richard Henderson wrote:
See target/arm/tcg/translate-a64.c, gen_store_exclusive, 
TCGv_i128 block.

See target/ppc/translate.c, gen_stqcx_.


The situation here is slightly different: aarch64 and ppc64 have 
both 128-bit ll and sc, however LoongArch v1.1 only has 64-bit ll 
and 128-bit sc.


Ah, that does complicate things.


Possibly use the combination of ll.d and ld.d:


ll.d lo, base, 0
ld.d hi, base, 4

# do some computation

sc.q lo, hi, base

# try again if sc failed

Then a possible implementation of gen_ll() would be: align base to 
128-bit boundary, read 128-bit from memory, save 64-bit part to rd 
and record whole 128-bit data in llval. Then, in gen_sc_q(), it 
uses a 128-bit cmpxchg.



But what about the reversed instruction pattern: ll.d hi, base, 4; 
ld.d lo, base, 0?


It would be worth asking your hardware engineers about the bounds of 
legal behaviour. Ideally there would be some very explicit language, 
similar to



I'm a community developer not affiliated with Loongson. Song Gao, 
could you provide some detail from Loongson Inc.?





ll.d   r1, base, 0
dbar 0x700  ==> see 2.2.8.1
ld.d  r2, base,  8
...
sc.q r1, r2, base



Thanks! I think we may need to detect the ll.d-dbar-ld.d sequence and 
translate the sequence into one tcg_gen_qemu_ld_i128 and split the 
result into two 64-bit parts. Can we do this in QEMU?






For this series,
I think we need to set the new config bits for the 'max' cpu, and 
change 'any' to 'max' in linux-user/target_elf.h, so that we can use 
these new instructions in linux-user mode.


I will work on it.




Thanks
Song Gao


https://developer.arm.com/documentation/ddi0487/latest/
B2.9.5 Load-Exclusive and Store-Exclusive instruction usage 
restrictions


But you could do the same thing, aligning and recording the entire 
128-bit quantity, then extract the ll.d result based on address bit 
6.  This would complicate the implementation of sc.d as well, but 
would perhaps bring us "close enough" to the actual architecture.


Note that our Arm store-exclusive implementation isn't quite in spec 
either.  There is quite a large comment within translate-a64.c 
store_exclusive() about the ways things are not quite right.  But it 
seems to be close enough for actual usage to succeed.



r~






Re: [PATCH 0/5] Add LoongArch v1.1 instructions

2023-10-25 Thread Jiajie Chen



On 2023/10/26 03:04, Richard Henderson wrote:

On 10/25/23 10:13, Jiajie Chen wrote:

On 2023/10/24 07:26, Richard Henderson wrote:
See target/arm/tcg/translate-a64.c, gen_store_exclusive, TCGv_i128 
block.

See target/ppc/translate.c, gen_stqcx_.


The situation here is slightly different: aarch64 and ppc64 have 
both 128-bit ll and sc, however LoongArch v1.1 only has 64-bit ll 
and 128-bit sc.


Ah, that does complicate things.


Possibly use the combination of ll.d and ld.d:


ll.d lo, base, 0
ld.d hi, base, 4

# do some computation

sc.q lo, hi, base

# try again if sc failed

Then a possible implementation of gen_ll() would be: align base to 
128-bit boundary, read 128-bit from memory, save 64-bit part to rd 
and record whole 128-bit data in llval. Then, in gen_sc_q(), it uses 
a 128-bit cmpxchg.



But what about the reversed instruction pattern: ll.d hi, base, 4; 
ld.d lo, base, 0?


It would be worth asking your hardware engineers about the bounds of 
legal behaviour. Ideally there would be some very explicit language, 
similar to



I'm a community developer not affiliated with Loongson. Song Gao, could 
you provide some detail from Loongson Inc.?





https://developer.arm.com/documentation/ddi0487/latest/
B2.9.5 Load-Exclusive and Store-Exclusive instruction usage restrictions

But you could do the same thing, aligning and recording the entire 
128-bit quantity, then extract the ll.d result based on address bit 
6.  This would complicate the implementation of sc.d as well, but 
would perhaps bring us "close enough" to the actual architecture.


Note that our Arm store-exclusive implementation isn't quite in spec 
either.  There is quite a large comment within translate-a64.c 
store_exclusive() about the ways things are not quite right.  But it 
seems to be close enough for actual usage to succeed.



r~




Re: [PATCH 0/5] Add LoongArch v1.1 instructions

2023-10-25 Thread Jiajie Chen



On 2023/10/24 14:10, Jiajie Chen wrote:


On 2023/10/24 07:26, Richard Henderson wrote:

On 10/23/23 08:29, Jiajie Chen wrote:
This patch series implements the new instructions except sc.q, 
because I do not know how to match a pair of ll.d to sc.q.


There are a couple of examples within the tree.

See target/arm/tcg/translate-a64.c, gen_store_exclusive, TCGv_i128 
block.

See target/ppc/translate.c, gen_stqcx_.



The situation here is slightly different: aarch64 and ppc64 have both 
128-bit ll and sc, however LoongArch v1.1 only has 64-bit ll and 
128-bit sc. I guess the intended usage of sc.q is:



ll.d lo, base, 0

ll.d hi, base, 4

# do some computation

sc.q lo, hi, base

# try again if sc failed



Possibly use the combination of ll.d and ld.d:


ll.d lo, base, 0

ld.d hi, base, 4

# do some computation

sc.q lo, hi, base

# try again if sc failed


Then a possible implementation of gen_ll() would be: align base to 
128-bit boundary, read 128-bit from memory, save 64-bit part to rd and 
record whole 128-bit data in llval. Then, in gen_sc_q(), it uses a 
128-bit cmpxchg.



But what about the reversed instruction pattern: ll.d hi, base, 4; ld.d 
lo, base, 0?



Since there is no existing code utilizing the new sc.q instruction, I 
don't know what we should consider here.










r~




Re: [PATCH 0/5] Add LoongArch v1.1 instructions

2023-10-24 Thread Jiajie Chen



On 2023/10/24 07:26, Richard Henderson wrote:

On 10/23/23 08:29, Jiajie Chen wrote:
This patch series implements the new instructions except sc.q, 
because I do not know how to match a pair of ll.d to sc.q.


There are a couple of examples within the tree.

See target/arm/tcg/translate-a64.c, gen_store_exclusive, TCGv_i128 block.
See target/ppc/translate.c, gen_stqcx_.



The situation here is slightly different: aarch64 and ppc64 have both 
128-bit ll and sc, however LoongArch v1.1 only has 64-bit ll and 128-bit 
sc. I guess the intended usage of sc.q is:



ll.d lo, base, 0

ll.d hi, base, 4

# do some computation

sc.q lo, hi, base

# try again if sc failed






r~




Re: [PATCH 1/5] include/exec/memop.h: Add MO_TESB

2023-10-23 Thread Jiajie Chen



On 2023/10/23 23:49, David Hildenbrand wrote:


Why?

On 23.10.23 17:29, Jiajie Chen wrote:

Signed-off-by: Jiajie Chen 
---
  include/exec/memop.h | 1 +
  1 file changed, 1 insertion(+)

diff --git a/include/exec/memop.h b/include/exec/memop.h
index a86dc6743a..834327c62d 100644
--- a/include/exec/memop.h
+++ b/include/exec/memop.h
@@ -140,6 +140,7 @@ typedef enum MemOp {
  MO_TEUL  = MO_TE | MO_UL,
  MO_TEUQ  = MO_TE | MO_UQ,
  MO_TEUO  = MO_TE | MO_UO,
+    MO_TESB  = MO_TE | MO_SB,
  MO_TESW  = MO_TE | MO_SW,
  MO_TESL  = MO_TE | MO_SL,
  MO_TESQ  = MO_TE | MO_SQ,




I recall that the reason for not having this is that the target 
endianness doesn't matter for single bytes.


Thanks, you are right. I was copying some code using MO_TESW, only to 
find that MO_TESB is missing... I should simply use MO_SB then.
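The reasoning can be illustrated in plain C: a one-byte load touches a
single byte, so little- and big-endian loads agree, whereas a 16-bit
sign-extended load differs. The helper names below are invented for the
illustration:

```c
#include <stdint.h>
#include <string.h>

/* Why there is no MO_TESB: a target-endian memop only matters for
 * multi-byte accesses.  A sign-extended byte load reads the same single
 * byte under either endianness, so plain MO_SB suffices. */

/* Sign-extending 8-bit load, endian-independent. */
int64_t load_sb(const void *p)
{
    int8_t b;
    memcpy(&b, p, 1);
    return (int64_t)b;
}

/* Sign-extending 16-bit loads for both endiannesses, for contrast. */
int64_t load_sw_le(const uint8_t *p) { return (int16_t)(p[0] | p[1] << 8); }
int64_t load_sw_be(const uint8_t *p) { return (int16_t)(p[0] << 8 | p[1]); }
```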






Re: [PATCH 3/5] target/loongarch: Add amcas[_db].{b/h/w/d}

2023-10-23 Thread Jiajie Chen



On 2023/10/23 23:29, Jiajie Chen wrote:

The new instructions are introduced in LoongArch v1.1:

- amcas.b
- amcas.h
- amcas.w
- amcas.d
- amcas_db.b
- amcas_db.h
- amcas_db.w
- amcas_db.d

The new instructions are gated by CPUCFG2.LAMCAS.

Signed-off-by: Jiajie Chen 
---
  target/loongarch/cpu.h|  1 +
  target/loongarch/disas.c  |  8 +++
  .../loongarch/insn_trans/trans_atomic.c.inc   | 24 +++
  target/loongarch/insns.decode |  8 +++
  target/loongarch/translate.h  |  1 +
  5 files changed, 42 insertions(+)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 7166c07756..80a476c3f8 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -156,6 +156,7 @@ FIELD(CPUCFG2, LBT_MIPS, 20, 1)
  FIELD(CPUCFG2, LSPW, 21, 1)
  FIELD(CPUCFG2, LAM, 22, 1)
  FIELD(CPUCFG2, LAM_BH, 27, 1)
+FIELD(CPUCFG2, LAMCAS, 28, 1)
  
  /* cpucfg[3] bits */

  FIELD(CPUCFG3, CCDMA, 0, 1)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index d33aa8173a..4aa67749cf 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -575,6 +575,14 @@ INSN(fldx_s,   frr)
  INSN(fldx_d,   frr)
  INSN(fstx_s,   frr)
  INSN(fstx_d,   frr)
+INSN(amcas_b,  rrr)
+INSN(amcas_h,  rrr)
+INSN(amcas_w,  rrr)
+INSN(amcas_d,  rrr)
+INSN(amcas_db_b,   rrr)
+INSN(amcas_db_h,   rrr)
+INSN(amcas_db_w,   rrr)
+INSN(amcas_db_d,   rrr)
  INSN(amswap_b, rrr)
  INSN(amswap_h, rrr)
  INSN(amadd_b,  rrr)
diff --git a/target/loongarch/insn_trans/trans_atomic.c.inc 
b/target/loongarch/insn_trans/trans_atomic.c.inc
index cd28e217ad..bea567fdaf 100644
--- a/target/loongarch/insn_trans/trans_atomic.c.inc
+++ b/target/loongarch/insn_trans/trans_atomic.c.inc
@@ -45,6 +45,22 @@ static bool gen_sc(DisasContext *ctx, arg_rr_i *a, MemOp mop)
  return true;
  }
  
+static bool gen_cas(DisasContext *ctx, arg_rrr *a,

+void (*func)(TCGv, TCGv, TCGv, TCGv, TCGArg, MemOp),
+MemOp mop)
+{
+TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
+TCGv addr = gpr_src(ctx, a->rj, EXT_NONE);
+TCGv val = gpr_src(ctx, a->rk, EXT_NONE);
+
+addr = make_address_i(ctx, addr, 0);
+


I'm unsure if I can use the same TCGv for the first and the third 
argument here. If it violates the assumption, a temporary register 
can be used.
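For reference, the intended amcas semantics (rd supplies the compare
value and receives the old memory value; rk supplies the new value) can
be modeled in plain C. This is a hypothetical single-threaded sketch of
the semantics only, not a statement about whether TCG permits the
aliasing questioned above:

```c
#include <stdint.h>

/* Model of amcas.d: compare-and-swap where `dest` may alias `cmp`.
 * The compare value is read before `dest` is written, so the function
 * stays correct even when both pointers refer to the same variable. */
void amcas_d(uint64_t *mem, uint64_t *dest, const uint64_t *cmp,
             uint64_t newval)
{
    uint64_t old = *mem;
    if (old == *cmp) {      /* compare value read before dest is written */
        *mem = newval;
    }
    *dest = old;            /* rd receives the old memory value */
}
```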



+func(dest, addr, dest, val, ctx->mem_idx, mop);
+gen_set_gpr(a->rd, dest, EXT_NONE);
+
+return true;
+}
+
  static bool gen_am(DisasContext *ctx, arg_rrr *a,
 void (*func)(TCGv, TCGv, TCGv, TCGArg, MemOp),
 MemOp mop)
@@ -73,6 +89,14 @@ TRANS(ll_w, ALL, gen_ll, MO_TESL)
  TRANS(sc_w, ALL, gen_sc, MO_TESL)
  TRANS(ll_d, 64, gen_ll, MO_TEUQ)
  TRANS(sc_d, 64, gen_sc, MO_TEUQ)
+TRANS(amcas_b, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESB)
+TRANS(amcas_h, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESW)
+TRANS(amcas_w, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESL)
+TRANS(amcas_d, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TEUQ)
+TRANS(amcas_db_b, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESB)
+TRANS(amcas_db_h, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESW)
+TRANS(amcas_db_w, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESL)
+TRANS(amcas_db_d, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TEUQ)
  TRANS(amswap_b, LAM_BH, gen_am, tcg_gen_atomic_xchg_tl, MO_TESB)
  TRANS(amswap_h, LAM_BH, gen_am, tcg_gen_atomic_xchg_tl, MO_TESW)
  TRANS(amadd_b, LAM_BH, gen_am, tcg_gen_atomic_fetch_add_tl, MO_TESB)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 678ce42038..cf4123cd46 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -261,6 +261,14 @@ ll_w0010  .. . . 
@rr_i14s2
  sc_w0010 0001 .. . . @rr_i14s2
  ll_d0010 0010 .. . . @rr_i14s2
  sc_d0010 0011 .. . . @rr_i14s2
+amcas_b 0011 1101 1 . . .@rrr
+amcas_h 0011 1101 10001 . . .@rrr
+amcas_w 0011 1101 10010 . . .@rrr
+amcas_d 0011 1101 10011 . . .@rrr
+amcas_db_b  0011 1101 10100 . . .@rrr
+amcas_db_h  0011 1101 10101 . . .@rrr
+amcas_db_w  0011 1101 10110 . . .@rrr
+amcas_db_d  0011 1101 10111 . . .@rrr
  amswap_b0011 1101 11000 . . .@rrr
  amswap_h0011 1101 11001 . . .@rrr
  amadd_b 0011 1101 11010 . . .@rrr
diff --git a/target/loongarch/translate.h b/target/loongarch/translate.h
index 0b230530e7..3affefdafc 100644
--- a/target/loongarch/translate.h
+++ b/target/loongarch/trans

[PATCH 4/5] target/loongarch: Add estimated reciprocal instructions

2023-10-23 Thread Jiajie Chen
Add the following new instructions in LoongArch v1.1:

- frecipe.s
- frecipe.d
- frsqrte.s
- frsqrte.d
- vfrecipe.s
- vfrecipe.d
- vfrsqrte.s
- vfrsqrte.d
- xvfrecipe.s
- xvfrecipe.d
- xvfrsqrte.s
- xvfrsqrte.d

They are gated by CPUCFG2.FRECIPE. Although the instructions allow an
implementation to improve performance by reducing precision, we use the
existing softfloat implementation.
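The point of "estimate" instructions is that a low-precision result can
be refined cheaply in software. A hedged C sketch (the ~12-bit estimate
precision is an assumption chosen for illustration, not LoongArch's
specified behavior), using Newton-Raphson steps x' = x * (2 - a*x) to
refine a reciprocal:

```c
#include <math.h>

/* Fake low-precision reciprocal estimate (~12 mantissa bits), mimicking
 * what estimate hardware might return.  QEMU's patch above simply
 * reuses the full-precision softfloat helpers instead. */
double frecipe(double a)
{
    float rough = 1.0f / (float)a;
    union { float f; unsigned u; } v = { rough };
    v.u &= ~((1u << 11) - 1);   /* truncate the low 11 mantissa bits */
    return v.f;
}

/* One Newton-Raphson refinement step for 1/a; roughly doubles the
 * number of correct bits per step. */
double refine(double a, double x)
{
    return x * (2.0 - a * x);
}
```

Two refinement steps take the crude estimate to near-double precision,
which is why a fast low-precision seed instruction is useful at all.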

Signed-off-by: Jiajie Chen 
---
 target/loongarch/cpu.h |  1 +
 target/loongarch/disas.c   | 12 
 target/loongarch/insn_trans/trans_farith.c.inc |  4 
 target/loongarch/insn_trans/trans_vec.c.inc|  8 
 target/loongarch/insns.decode  | 12 
 target/loongarch/translate.h   |  6 ++
 6 files changed, 43 insertions(+)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 80a476c3f8..8f938effa8 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -155,6 +155,7 @@ FIELD(CPUCFG2, LBT_ARM, 19, 1)
 FIELD(CPUCFG2, LBT_MIPS, 20, 1)
 FIELD(CPUCFG2, LSPW, 21, 1)
 FIELD(CPUCFG2, LAM, 22, 1)
+FIELD(CPUCFG2, FRECIPE, 25, 1)
 FIELD(CPUCFG2, LAM_BH, 27, 1)
 FIELD(CPUCFG2, LAMCAS, 28, 1)
 
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 4aa67749cf..9eb49fb5e3 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -473,6 +473,10 @@ INSN(frecip_s, ff)
 INSN(frecip_d, ff)
 INSN(frsqrt_s, ff)
 INSN(frsqrt_d, ff)
+INSN(frecipe_s,ff)
+INSN(frecipe_d,ff)
+INSN(frsqrte_s,ff)
+INSN(frsqrte_d,ff)
 INSN(fmov_s,   ff)
 INSN(fmov_d,   ff)
 INSN(movgr2fr_w,   fr)
@@ -1424,6 +1428,10 @@ INSN_LSX(vfrecip_s,vv)
 INSN_LSX(vfrecip_d,vv)
 INSN_LSX(vfrsqrt_s,vv)
 INSN_LSX(vfrsqrt_d,vv)
+INSN_LSX(vfrecipe_s,   vv)
+INSN_LSX(vfrecipe_d,   vv)
+INSN_LSX(vfrsqrte_s,   vv)
+INSN_LSX(vfrsqrte_d,   vv)
 
 INSN_LSX(vfcvtl_s_h,   vv)
 INSN_LSX(vfcvth_s_h,   vv)
@@ -2338,6 +2346,10 @@ INSN_LASX(xvfrecip_s,vv)
 INSN_LASX(xvfrecip_d,vv)
 INSN_LASX(xvfrsqrt_s,vv)
 INSN_LASX(xvfrsqrt_d,vv)
+INSN_LASX(xvfrecipe_s,   vv)
+INSN_LASX(xvfrecipe_d,   vv)
+INSN_LASX(xvfrsqrte_s,   vv)
+INSN_LASX(xvfrsqrte_d,   vv)
 
 INSN_LASX(xvfcvtl_s_h,   vv)
 INSN_LASX(xvfcvth_s_h,   vv)
diff --git a/target/loongarch/insn_trans/trans_farith.c.inc 
b/target/loongarch/insn_trans/trans_farith.c.inc
index f4a0dea727..356cdf99b7 100644
--- a/target/loongarch/insn_trans/trans_farith.c.inc
+++ b/target/loongarch/insn_trans/trans_farith.c.inc
@@ -191,6 +191,10 @@ TRANS(frecip_s, FP_SP, gen_ff, gen_helper_frecip_s)
 TRANS(frecip_d, FP_DP, gen_ff, gen_helper_frecip_d)
 TRANS(frsqrt_s, FP_SP, gen_ff, gen_helper_frsqrt_s)
 TRANS(frsqrt_d, FP_DP, gen_ff, gen_helper_frsqrt_d)
+TRANS(frecipe_s, FRECIPE_FP_SP, gen_ff, gen_helper_frecip_s)
+TRANS(frecipe_d, FRECIPE_FP_DP, gen_ff, gen_helper_frecip_d)
+TRANS(frsqrte_s, FRECIPE_FP_SP, gen_ff, gen_helper_frsqrt_s)
+TRANS(frsqrte_d, FRECIPE_FP_DP, gen_ff, gen_helper_frsqrt_d)
 TRANS(flogb_s, FP_SP, gen_ff, gen_helper_flogb_s)
 TRANS(flogb_d, FP_DP, gen_ff, gen_helper_flogb_d)
 TRANS(fclass_s, FP_SP, gen_ff, gen_helper_fclass_s)
diff --git a/target/loongarch/insn_trans/trans_vec.c.inc 
b/target/loongarch/insn_trans/trans_vec.c.inc
index 98f856bb29..1c93e19ac4 100644
--- a/target/loongarch/insn_trans/trans_vec.c.inc
+++ b/target/loongarch/insn_trans/trans_vec.c.inc
@@ -4409,12 +4409,20 @@ TRANS(vfrecip_s, LSX, gen_vv_ptr, gen_helper_vfrecip_s)
 TRANS(vfrecip_d, LSX, gen_vv_ptr, gen_helper_vfrecip_d)
 TRANS(vfrsqrt_s, LSX, gen_vv_ptr, gen_helper_vfrsqrt_s)
 TRANS(vfrsqrt_d, LSX, gen_vv_ptr, gen_helper_vfrsqrt_d)
+TRANS(vfrecipe_s, FRECIPE_LSX, gen_vv_ptr, gen_helper_vfrecip_s)
+TRANS(vfrecipe_d, FRECIPE_LSX, gen_vv_ptr, gen_helper_vfrecip_d)
+TRANS(vfrsqrte_s, FRECIPE_LSX, gen_vv_ptr, gen_helper_vfrsqrt_s)
+TRANS(vfrsqrte_d, FRECIPE_LSX, gen_vv_ptr, gen_helper_vfrsqrt_d)
 TRANS(xvfsqrt_s, LASX, gen_xx_ptr, gen_helper_vfsqrt_s)
 TRANS(xvfsqrt_d, LASX, gen_xx_ptr, gen_helper_vfsqrt_d)
 TRANS(xvfrecip_s, LASX, gen_xx_ptr, gen_helper_vfrecip_s)
 TRANS(xvfrecip_d, LASX, gen_xx_ptr, gen_helper_vfrecip_d)
 TRANS(xvfrsqrt_s, LASX, gen_xx_ptr, gen_helper_vfrsqrt_s)
 TRANS(xvfrsqrt_d, LASX, gen_xx_ptr, gen_helper_vfrsqrt_d)
+TRANS(xvfrecipe_s, FRECIPE_LASX, gen_xx_ptr, gen_helper_vfrecip_s)
+TRANS(xvfrecipe_d, FRECIPE_LASX, gen_xx_ptr, gen_helper_vfrecip_d)
+TRANS(xvfrsqrte_s, FRECIPE_LASX, gen_xx_ptr, gen_helper_vfrsqrt_s)
+TRANS(xvfrsqrte_d, FRECIPE_LASX, gen_xx_ptr, gen_helper_vfrsqrt_d)
 
 TRANS(vfcvtl_s_h, LSX, gen_vv_ptr, gen_helper_vfcvtl_s_h)
 TRANS(vfcvth_s_h, LSX, gen_vv_ptr, gen_helper_vfcvth_s_h)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index cf4123cd46..92078f0f9f 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -371,6 +371,10 @@ frecip_s 00010001 01000

[PATCH 3/5] target/loongarch: Add amcas[_db].{b/h/w/d}

2023-10-23 Thread Jiajie Chen
The new instructions are introduced in LoongArch v1.1:

- amcas.b
- amcas.h
- amcas.w
- amcas.d
- amcas_db.b
- amcas_db.h
- amcas_db.w
- amcas_db.d

The new instructions are gated by CPUCFG2.LAMCAS.

Signed-off-by: Jiajie Chen 
---
 target/loongarch/cpu.h|  1 +
 target/loongarch/disas.c  |  8 +++
 .../loongarch/insn_trans/trans_atomic.c.inc   | 24 +++
 target/loongarch/insns.decode |  8 +++
 target/loongarch/translate.h  |  1 +
 5 files changed, 42 insertions(+)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 7166c07756..80a476c3f8 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -156,6 +156,7 @@ FIELD(CPUCFG2, LBT_MIPS, 20, 1)
 FIELD(CPUCFG2, LSPW, 21, 1)
 FIELD(CPUCFG2, LAM, 22, 1)
 FIELD(CPUCFG2, LAM_BH, 27, 1)
+FIELD(CPUCFG2, LAMCAS, 28, 1)
 
 /* cpucfg[3] bits */
 FIELD(CPUCFG3, CCDMA, 0, 1)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index d33aa8173a..4aa67749cf 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -575,6 +575,14 @@ INSN(fldx_s,   frr)
 INSN(fldx_d,   frr)
 INSN(fstx_s,   frr)
 INSN(fstx_d,   frr)
+INSN(amcas_b,  rrr)
+INSN(amcas_h,  rrr)
+INSN(amcas_w,  rrr)
+INSN(amcas_d,  rrr)
+INSN(amcas_db_b,   rrr)
+INSN(amcas_db_h,   rrr)
+INSN(amcas_db_w,   rrr)
+INSN(amcas_db_d,   rrr)
 INSN(amswap_b, rrr)
 INSN(amswap_h, rrr)
 INSN(amadd_b,  rrr)
diff --git a/target/loongarch/insn_trans/trans_atomic.c.inc 
b/target/loongarch/insn_trans/trans_atomic.c.inc
index cd28e217ad..bea567fdaf 100644
--- a/target/loongarch/insn_trans/trans_atomic.c.inc
+++ b/target/loongarch/insn_trans/trans_atomic.c.inc
@@ -45,6 +45,22 @@ static bool gen_sc(DisasContext *ctx, arg_rr_i *a, MemOp mop)
 return true;
 }
 
+static bool gen_cas(DisasContext *ctx, arg_rrr *a,
+void (*func)(TCGv, TCGv, TCGv, TCGv, TCGArg, MemOp),
+MemOp mop)
+{
+TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
+TCGv addr = gpr_src(ctx, a->rj, EXT_NONE);
+TCGv val = gpr_src(ctx, a->rk, EXT_NONE);
+
+addr = make_address_i(ctx, addr, 0);
+
+func(dest, addr, dest, val, ctx->mem_idx, mop);
+gen_set_gpr(a->rd, dest, EXT_NONE);
+
+return true;
+}
+
 static bool gen_am(DisasContext *ctx, arg_rrr *a,
void (*func)(TCGv, TCGv, TCGv, TCGArg, MemOp),
MemOp mop)
@@ -73,6 +89,14 @@ TRANS(ll_w, ALL, gen_ll, MO_TESL)
 TRANS(sc_w, ALL, gen_sc, MO_TESL)
 TRANS(ll_d, 64, gen_ll, MO_TEUQ)
 TRANS(sc_d, 64, gen_sc, MO_TEUQ)
+TRANS(amcas_b, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESB)
+TRANS(amcas_h, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESW)
+TRANS(amcas_w, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESL)
+TRANS(amcas_d, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TEUQ)
+TRANS(amcas_db_b, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESB)
+TRANS(amcas_db_h, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESW)
+TRANS(amcas_db_w, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESL)
+TRANS(amcas_db_d, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TEUQ)
 TRANS(amswap_b, LAM_BH, gen_am, tcg_gen_atomic_xchg_tl, MO_TESB)
 TRANS(amswap_h, LAM_BH, gen_am, tcg_gen_atomic_xchg_tl, MO_TESW)
 TRANS(amadd_b, LAM_BH, gen_am, tcg_gen_atomic_fetch_add_tl, MO_TESB)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 678ce42038..cf4123cd46 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -261,6 +261,14 @@ ll_w0010  .. . . @rr_i14s2
 sc_w0010 0001 .. . . @rr_i14s2
 ll_d0010 0010 .. . . @rr_i14s2
 sc_d0010 0011 .. . . @rr_i14s2
+amcas_b 0011 1101 1 . . .@rrr
+amcas_h 0011 1101 10001 . . .@rrr
+amcas_w 0011 1101 10010 . . .@rrr
+amcas_d 0011 1101 10011 . . .@rrr
+amcas_db_b  0011 1101 10100 . . .@rrr
+amcas_db_h  0011 1101 10101 . . .@rrr
+amcas_db_w  0011 1101 10110 . . .@rrr
+amcas_db_d  0011 1101 10111 . . .@rrr
 amswap_b0011 1101 11000 . . .@rrr
 amswap_h0011 1101 11001 . . .@rrr
 amadd_b 0011 1101 11010 . . .@rrr
diff --git a/target/loongarch/translate.h b/target/loongarch/translate.h
index 0b230530e7..3affefdafc 100644
--- a/target/loongarch/translate.h
+++ b/target/loongarch/translate.h
@@ -23,6 +23,7 @@
 #define avail_LSPW(C)   (FIELD_EX32((C)->cpucfg2, CPUCFG2, LSPW))
 #define avail_LAM(C)(FIELD_EX32((C)->cpucfg2, CPUCFG2, LAM))
 #define avail_LAM_BH(C) (FIELD_EX32((C)->cpucfg2, CPUCFG2, LAM_BH))

[PATCH 0/5] Add LoongArch v1.1 instructions

2023-10-23 Thread Jiajie Chen
Latest revision of LoongArch ISA is out at
https://www.loongson.cn/uploads/images/2023102309132647981.%E9%BE%99%E8%8A%AF%E6%9E%B6%E6%9E%84%E5%8F%82%E8%80%83%E6%89%8B%E5%86%8C%E5%8D%B7%E4%B8%80_r1p10.pdf
(Chinese only). The revision includes the following updates:

- estimated fp reciprocal instructions: frecip -> frecipe, frsqrt ->
  frsqrte
- 128-bit width store-conditional instruction: sc.q
- ll.w/d with acquire semantic: llacq.w/d, sc.w/d with release semantic:
  screl.w/d
- compare and swap instructions: amcas[_db].b/w/h/d
- byte and word-wide amswap/add instructions: am{swap/add}[_db].{b/h}
- new definition for dbar hints
- clarify 32-bit division instruction behavior
- clarify load ordering when accessing the same address
- introduce message signaled interrupt
- introduce hardware page table walker

The new revision is implemented in the soon-to-be-released Loongson 3A6000
processor.

This patch series implements the new instructions except sc.q, because I
do not know how to match a pair of ll.d instructions to a single sc.q.


Jiajie Chen (5):
  include/exec/memop.h: Add MO_TESB
  target/loongarch: Add am{swap/add}[_db].{b/h}
  target/loongarch: Add amcas[_db].{b/h/w/d}
  target/loongarch: Add estimated reciprocal instructions
  target/loongarch: Add llacq/screl instructions

 include/exec/memop.h  |  1 +
 target/loongarch/cpu.h|  4 ++
 target/loongarch/disas.c  | 32 
 .../loongarch/insn_trans/trans_atomic.c.inc   | 52 +++
 .../loongarch/insn_trans/trans_farith.c.inc   |  4 ++
 target/loongarch/insn_trans/trans_vec.c.inc   |  8 +++
 target/loongarch/insns.decode | 32 
 target/loongarch/translate.h  | 27 +++---
 8 files changed, 152 insertions(+), 8 deletions(-)

-- 
2.42.0




[PATCH 1/5] include/exec/memop.h: Add MO_TESB

2023-10-23 Thread Jiajie Chen
Signed-off-by: Jiajie Chen 
---
 include/exec/memop.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/exec/memop.h b/include/exec/memop.h
index a86dc6743a..834327c62d 100644
--- a/include/exec/memop.h
+++ b/include/exec/memop.h
@@ -140,6 +140,7 @@ typedef enum MemOp {
 MO_TEUL  = MO_TE | MO_UL,
 MO_TEUQ  = MO_TE | MO_UQ,
 MO_TEUO  = MO_TE | MO_UO,
+MO_TESB  = MO_TE | MO_SB,
 MO_TESW  = MO_TE | MO_SW,
 MO_TESL  = MO_TE | MO_SL,
 MO_TESQ  = MO_TE | MO_SQ,
-- 
2.42.0




[PATCH 2/5] target/loongarch: Add am{swap/add}[_db].{b/h}

2023-10-23 Thread Jiajie Chen
The new instructions are introduced in LoongArch v1.1:

- amswap.b
- amswap.h
- amadd.b
- amadd.h
- amswap_db.b
- amswap_db.h
- amadd_db.b
- amadd_db.h

The instructions are gated by CPUCFG2.LAM_BH.

Signed-off-by: Jiajie Chen 
---
 target/loongarch/cpu.h |  1 +
 target/loongarch/disas.c   |  8 
 target/loongarch/insn_trans/trans_atomic.c.inc |  8 
 target/loongarch/insns.decode  |  8 
 target/loongarch/translate.h   | 17 +
 5 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 8b54cf109c..7166c07756 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -155,6 +155,7 @@ FIELD(CPUCFG2, LBT_ARM, 19, 1)
 FIELD(CPUCFG2, LBT_MIPS, 20, 1)
 FIELD(CPUCFG2, LSPW, 21, 1)
 FIELD(CPUCFG2, LAM, 22, 1)
+FIELD(CPUCFG2, LAM_BH, 27, 1)
 
 /* cpucfg[3] bits */
 FIELD(CPUCFG3, CCDMA, 0, 1)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 2040f3e44d..d33aa8173a 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -575,6 +575,14 @@ INSN(fldx_s,   frr)
 INSN(fldx_d,   frr)
 INSN(fstx_s,   frr)
 INSN(fstx_d,   frr)
+INSN(amswap_b, rrr)
+INSN(amswap_h, rrr)
+INSN(amadd_b,  rrr)
+INSN(amadd_h,  rrr)
+INSN(amswap_db_b,  rrr)
+INSN(amswap_db_h,  rrr)
+INSN(amadd_db_b,   rrr)
+INSN(amadd_db_h,   rrr)
 INSN(amswap_w, rrr)
 INSN(amswap_d, rrr)
 INSN(amadd_w,  rrr)
diff --git a/target/loongarch/insn_trans/trans_atomic.c.inc b/target/loongarch/insn_trans/trans_atomic.c.inc
index 80c2e286fd..cd28e217ad 100644
--- a/target/loongarch/insn_trans/trans_atomic.c.inc
+++ b/target/loongarch/insn_trans/trans_atomic.c.inc
@@ -73,6 +73,14 @@ TRANS(ll_w, ALL, gen_ll, MO_TESL)
 TRANS(sc_w, ALL, gen_sc, MO_TESL)
 TRANS(ll_d, 64, gen_ll, MO_TEUQ)
 TRANS(sc_d, 64, gen_sc, MO_TEUQ)
+TRANS(amswap_b, LAM_BH, gen_am, tcg_gen_atomic_xchg_tl, MO_TESB)
+TRANS(amswap_h, LAM_BH, gen_am, tcg_gen_atomic_xchg_tl, MO_TESW)
+TRANS(amadd_b, LAM_BH, gen_am, tcg_gen_atomic_fetch_add_tl, MO_TESB)
+TRANS(amadd_h, LAM_BH, gen_am, tcg_gen_atomic_fetch_add_tl, MO_TESW)
+TRANS(amswap_db_b, LAM_BH, gen_am, tcg_gen_atomic_xchg_tl, MO_TESB)
+TRANS(amswap_db_h, LAM_BH, gen_am, tcg_gen_atomic_xchg_tl, MO_TESW)
+TRANS(amadd_db_b, LAM_BH, gen_am, tcg_gen_atomic_fetch_add_tl, MO_TESB)
+TRANS(amadd_db_h, LAM_BH, gen_am, tcg_gen_atomic_fetch_add_tl, MO_TESW)
 TRANS(amswap_w, LAM, gen_am, tcg_gen_atomic_xchg_tl, MO_TESL)
 TRANS(amswap_d, LAM, gen_am, tcg_gen_atomic_xchg_tl, MO_TEUQ)
 TRANS(amadd_w, LAM, gen_am, tcg_gen_atomic_fetch_add_tl, MO_TESL)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 62f58cc541..678ce42038 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -261,6 +261,14 @@ ll_w0010  .. . . @rr_i14s2
 sc_w0010 0001 .. . . @rr_i14s2
 ll_d0010 0010 .. . . @rr_i14s2
 sc_d0010 0011 .. . . @rr_i14s2
+amswap_b0011 1101 11000 . . .@rrr
+amswap_h0011 1101 11001 . . .@rrr
+amadd_b 0011 1101 11010 . . .@rrr
+amadd_h 0011 1101 11011 . . .@rrr
+amswap_db_b 0011 1101 11100 . . .@rrr
+amswap_db_h 0011 1101 11101 . . .@rrr
+amadd_db_b  0011 1101 0 . . .@rrr
+amadd_db_h  0011 1101 1 . . .@rrr
 amswap_w0011 1110 0 . . .@rrr
 amswap_d0011 1110 1 . . .@rrr
 amadd_w 0011 1110 00010 . . .@rrr
diff --git a/target/loongarch/translate.h b/target/loongarch/translate.h
index 195f53573a..0b230530e7 100644
--- a/target/loongarch/translate.h
+++ b/target/loongarch/translate.h
@@ -17,14 +17,15 @@
 #define avail_ALL(C)   true
 #define avail_64(C)(FIELD_EX32((C)->cpucfg1, CPUCFG1, ARCH) == \
 CPUCFG1_ARCH_LA64)
-#define avail_FP(C)(FIELD_EX32((C)->cpucfg2, CPUCFG2, FP))
-#define avail_FP_SP(C) (FIELD_EX32((C)->cpucfg2, CPUCFG2, FP_SP))
-#define avail_FP_DP(C) (FIELD_EX32((C)->cpucfg2, CPUCFG2, FP_DP))
-#define avail_LSPW(C)  (FIELD_EX32((C)->cpucfg2, CPUCFG2, LSPW))
-#define avail_LAM(C)   (FIELD_EX32((C)->cpucfg2, CPUCFG2, LAM))
-#define avail_LSX(C)   (FIELD_EX32((C)->cpucfg2, CPUCFG2, LSX))
-#define avail_LASX(C)  (FIELD_EX32((C)->cpucfg2, CPUCFG2, LASX))
-#define avail_IOCSR(C) (FIELD_EX32((C)->cpucfg1, CPUCFG1, IOCSR))
+#define avail_FP(C) (FIELD_EX32((C)->cpucfg2, CPUCFG2, FP))
+#define avail_FP_SP(C)  (FIELD_EX32((C)->cpucfg2, CPUCFG2, FP_SP))
+#define avail_FP_DP(C)  (FIELD_EX32((C)->cpucfg2, CPUCFG2, FP_DP))
+#define avail_LSPW(C)   (FIELD_EX32((C)->cpucfg2, CPUCFG2, LSPW))

[PATCH 5/5] target/loongarch: Add llacq/screl instructions

2023-10-23 Thread Jiajie Chen
Add the following instructions in LoongArch v1.1:

- llacq.w
- screl.w
- llacq.d
- screl.d

They are guarded by CPUCFG2.LLACQ_SCREL.

Signed-off-by: Jiajie Chen 
---
 target/loongarch/cpu.h|  1 +
 target/loongarch/disas.c  |  4 
 .../loongarch/insn_trans/trans_atomic.c.inc   | 20 +++
 target/loongarch/insns.decode |  4 
 target/loongarch/translate.h  |  3 +++
 5 files changed, 32 insertions(+)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 8f938effa8..f0a63d5484 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -158,6 +158,7 @@ FIELD(CPUCFG2, LAM, 22, 1)
 FIELD(CPUCFG2, FRECIPE, 25, 1)
 FIELD(CPUCFG2, LAM_BH, 27, 1)
 FIELD(CPUCFG2, LAMCAS, 28, 1)
+FIELD(CPUCFG2, LLACQ_SCREL, 29, 1)
 
 /* cpucfg[3] bits */
 FIELD(CPUCFG3, CCDMA, 0, 1)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 9eb49fb5e3..8e02f51ddc 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -579,6 +579,10 @@ INSN(fldx_s,   frr)
 INSN(fldx_d,   frr)
 INSN(fstx_s,   frr)
 INSN(fstx_d,   frr)
+INSN(llacq_w,  rr)
+INSN(screl_w,  rr)
+INSN(llacq_d,  rr)
+INSN(screl_d,  rr)
 INSN(amcas_b,  rrr)
 INSN(amcas_h,  rrr)
 INSN(amcas_w,  rrr)
diff --git a/target/loongarch/insn_trans/trans_atomic.c.inc b/target/loongarch/insn_trans/trans_atomic.c.inc
index bea567fdaf..0c81fbd745 100644
--- a/target/loongarch/insn_trans/trans_atomic.c.inc
+++ b/target/loongarch/insn_trans/trans_atomic.c.inc
@@ -17,6 +17,14 @@ static bool gen_ll(DisasContext *ctx, arg_rr_i *a, MemOp mop)
 return true;
 }
 
+static bool gen_llacq(DisasContext *ctx, arg_rr *a, MemOp mop)
+{
+arg_rr_i tmp_a = {
+.rd = a->rd, .rj = a->rj, .imm = 0
+};
+return gen_ll(ctx, &tmp_a, mop);
+}
+
 static bool gen_sc(DisasContext *ctx, arg_rr_i *a, MemOp mop)
 {
 TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
@@ -45,6 +53,14 @@ static bool gen_sc(DisasContext *ctx, arg_rr_i *a, MemOp mop)
 return true;
 }
 
+static bool gen_screl(DisasContext *ctx, arg_rr *a, MemOp mop)
+{
+arg_rr_i tmp_a = {
+.rd = a->rd, .rj = a->rj, .imm = 0
+};
+return gen_sc(ctx, &tmp_a, mop);
+}
+
 static bool gen_cas(DisasContext *ctx, arg_rrr *a,
 void (*func)(TCGv, TCGv, TCGv, TCGv, TCGArg, MemOp),
 MemOp mop)
@@ -89,6 +105,10 @@ TRANS(ll_w, ALL, gen_ll, MO_TESL)
 TRANS(sc_w, ALL, gen_sc, MO_TESL)
 TRANS(ll_d, 64, gen_ll, MO_TEUQ)
 TRANS(sc_d, 64, gen_sc, MO_TEUQ)
+TRANS(llacq_w, LLACQ_SCREL, gen_llacq, MO_TESL)
+TRANS(screl_w, LLACQ_SCREL, gen_screl, MO_TESL)
+TRANS(llacq_d, LLACQ_SCREL_64, gen_llacq, MO_TEUQ)
+TRANS(screl_d, LLACQ_SCREL_64, gen_screl, MO_TEUQ)
 TRANS(amcas_b, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESB)
 TRANS(amcas_h, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESW)
 TRANS(amcas_w, LAMCAS, gen_cas, tcg_gen_atomic_cmpxchg_tl, MO_TESL)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 92078f0f9f..e056d492d3 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -261,6 +261,10 @@ ll_w0010  .. . . @rr_i14s2
 sc_w0010 0001 .. . . @rr_i14s2
 ll_d0010 0010 .. . . @rr_i14s2
 sc_d0010 0011 .. . . @rr_i14s2
+llacq_w 0011 1101 0 0 . .@rr
+screl_w 0011 1101 0 1 . .@rr
+llacq_d 0011 1101 0 00010 . .@rr
+screl_d 0011 1101 0 00011 . .@rr
 amcas_b 0011 1101 1 . . .@rrr
 amcas_h 0011 1101 10001 . . .@rrr
 amcas_w 0011 1101 10010 . . .@rrr
diff --git a/target/loongarch/translate.h b/target/loongarch/translate.h
index 651c5796ca..3d13d40ca6 100644
--- a/target/loongarch/translate.h
+++ b/target/loongarch/translate.h
@@ -34,6 +34,9 @@
 #define avail_FRECIPE_LSX(C)   (avail_FRECIPE(C) && avail_LSX(C))
 #define avail_FRECIPE_LASX(C)   (avail_FRECIPE(C) && avail_LASX(C))
 
+#define avail_LLACQ_SCREL(C)(FIELD_EX32((C)->cpucfg2, CPUCFG2, LLACQ_SCREL))
+#define avail_LLACQ_SCREL_64(C) (avail_64(C) && avail_LLACQ_SCREL(C))
+
 /*
  * If an operation is being performed on less than TARGET_LONG_BITS,
  * it may require the inputs to be sign- or zero-extended; which will
-- 
2.42.0




[PATCH] linux-user/elfload: Enable LSX/LASX in HWCAP for LoongArch

2023-10-01 Thread Jiajie Chen
Since support for LSX and LASX recently landed in QEMU, we can update
HWCAP accordingly.

Signed-off-by: Jiajie Chen 
---
 linux-user/elfload.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index db75cd4b33..f11f25309e 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -1237,6 +1237,14 @@ static uint32_t get_elf_hwcap(void)
 hwcaps |= HWCAP_LOONGARCH_LAM;
 }
 
+if (FIELD_EX32(cpu->env.cpucfg[2], CPUCFG2, LSX)) {
+hwcaps |= HWCAP_LOONGARCH_LSX;
+}
+
+if (FIELD_EX32(cpu->env.cpucfg[2], CPUCFG2, LASX)) {
+hwcaps |= HWCAP_LOONGARCH_LASX;
+}
+
 return hwcaps;
 }
 
-- 
2.41.0




Re: [PATCH 4/7] tcg/loongarch64: Use cpuinfo.h

2023-09-30 Thread Jiajie Chen



On 2023/9/17 06:01, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  tcg/loongarch64/tcg-target.h | 8 
  tcg/loongarch64/tcg-target.c.inc | 8 +---
  2 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 03017672f6..1bea15b02e 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -29,6 +29,8 @@
  #ifndef LOONGARCH_TCG_TARGET_H
  #define LOONGARCH_TCG_TARGET_H
  
+#include "host/cpuinfo.h"

+
  #define TCG_TARGET_INSN_UNIT_SIZE 4
  #define TCG_TARGET_NB_REGS 64
  
@@ -85,8 +87,6 @@ typedef enum {

  TCG_VEC_TMP0 = TCG_REG_V23,
  } TCGReg;
  
-extern bool use_lsx_instructions;

-
  /* used for function call generation */
  #define TCG_REG_CALL_STACK  TCG_REG_SP
  #define TCG_TARGET_STACK_ALIGN  16
@@ -171,10 +171,10 @@ extern bool use_lsx_instructions;
  #define TCG_TARGET_HAS_muluh_i641
  #define TCG_TARGET_HAS_mulsh_i641
  
-#define TCG_TARGET_HAS_qemu_ldst_i128   use_lsx_instructions

+#define TCG_TARGET_HAS_qemu_ldst_i128   (cpuinfo & CPUINFO_LSX)
  
  #define TCG_TARGET_HAS_v64  0

-#define TCG_TARGET_HAS_v128 use_lsx_instructions
+#define TCG_TARGET_HAS_v128 (cpuinfo & CPUINFO_LSX)
  #define TCG_TARGET_HAS_v256 0
  
  #define TCG_TARGET_HAS_not_vec  1

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 40074c46b8..52f2c26ce1 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -32,8 +32,6 @@
  #include "../tcg-ldst.c.inc"
  #include 
  
-bool use_lsx_instructions;

-
  #ifdef CONFIG_DEBUG_TCG
  static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
  "zero",
@@ -2316,10 +2314,6 @@ static void tcg_target_init(TCGContext *s)
  exit(EXIT_FAILURE);
  }
  
-if (hwcap & HWCAP_LOONGARCH_LSX) {

-use_lsx_instructions = 1;
-}
-
  tcg_target_available_regs[TCG_TYPE_I32] = ALL_GENERAL_REGS;
  tcg_target_available_regs[TCG_TYPE_I64] = ALL_GENERAL_REGS;
  
@@ -2335,7 +2329,7 @@ static void tcg_target_init(TCGContext *s)

  tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S8);
  tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S9);
  
-if (use_lsx_instructions) {

+if (cpuinfo & CPUINFO_LSX) {
  tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
  tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V24);
  tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V25);



Reviewed-by: Jiajie Chen 





Re: [PATCH 2/7] tcg/loongarch64: Use C_N2_I1 for INDEX_op_qemu_ld_a*_i128

2023-09-30 Thread Jiajie Chen



On 2023/9/17 06:01, Richard Henderson wrote:

Use new registers for the output, so that we never overlap
the input address, which could happen for user-only.
This avoids a "tmp = addr + 0" in that case.

Signed-off-by: Richard Henderson 
---
  tcg/loongarch64/tcg-target-con-set.h |  2 +-
  tcg/loongarch64/tcg-target.c.inc | 17 +++--
  2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index 77d62e38e7..cae6c2aad6 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -38,4 +38,4 @@ C_O1_I2(w, w, wM)
  C_O1_I2(w, w, wA)
  C_O1_I3(w, w, w, w)
  C_O1_I4(r, rZ, rJ, rZ, rZ)
-C_O2_I1(r, r, r)
+C_N2_I1(r, r, r)
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index b701df50db..40074c46b8 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1105,13 +1105,18 @@ static void tcg_out_qemu_ldst_i128(TCGContext *s, TCGReg data_lo, TCGReg data_hi
  }
  } else {
  /* Otherwise use a pair of LD/ST. */
-tcg_out_opc_add_d(s, TCG_REG_TMP0, h.base, h.index);
+TCGReg base = h.base;
+if (h.index != TCG_REG_ZERO) {
+base = TCG_REG_TMP0;
+tcg_out_opc_add_d(s, base, h.base, h.index);
+}
  if (is_ld) {
-tcg_out_opc_ld_d(s, data_lo, TCG_REG_TMP0, 0);
-tcg_out_opc_ld_d(s, data_hi, TCG_REG_TMP0, 8);
+tcg_debug_assert(base != data_lo);
+tcg_out_opc_ld_d(s, data_lo, base, 0);
+tcg_out_opc_ld_d(s, data_hi, base, 8);
  } else {
-tcg_out_opc_st_d(s, data_lo, TCG_REG_TMP0, 0);
-tcg_out_opc_st_d(s, data_hi, TCG_REG_TMP0, 8);
+tcg_out_opc_st_d(s, data_lo, base, 0);
+tcg_out_opc_st_d(s, data_hi, base, 8);
  }
  }
  
@@ -2049,7 +2054,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
  
  case INDEX_op_qemu_ld_a32_i128:

  case INDEX_op_qemu_ld_a64_i128:
-return C_O2_I1(r, r, r);
+return C_N2_I1(r, r, r);
  
  case INDEX_op_qemu_st_a32_i128:

  case INDEX_op_qemu_st_a64_i128:



Reviewed-by: Jiajie Chen 





Re: [PATCH 3/7] util: Add cpuinfo for loongarch64

2023-09-30 Thread Jiajie Chen



On 2023/9/17 06:01, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  host/include/loongarch64/host/cpuinfo.h | 21 +++
  util/cpuinfo-loongarch.c| 35 +
  util/meson.build|  2 ++
  3 files changed, 58 insertions(+)
  create mode 100644 host/include/loongarch64/host/cpuinfo.h
  create mode 100644 util/cpuinfo-loongarch.c

diff --git a/host/include/loongarch64/host/cpuinfo.h b/host/include/loongarch64/host/cpuinfo.h
new file mode 100644
index 00..fab664a10b
--- /dev/null
+++ b/host/include/loongarch64/host/cpuinfo.h
@@ -0,0 +1,21 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Host specific cpu identification for LoongArch
+ */
+
+#ifndef HOST_CPUINFO_H
+#define HOST_CPUINFO_H
+
+#define CPUINFO_ALWAYS  (1u << 0)  /* so cpuinfo is nonzero */
+#define CPUINFO_LSX (1u << 1)
+
+/* Initialized with a constructor. */
+extern unsigned cpuinfo;
+
+/*
+ * We cannot rely on constructor ordering, so other constructors must
+ * use the function interface rather than the variable above.
+ */
+unsigned cpuinfo_init(void);
+
+#endif /* HOST_CPUINFO_H */
diff --git a/util/cpuinfo-loongarch.c b/util/cpuinfo-loongarch.c
new file mode 100644
index 00..08b6d7460c
--- /dev/null
+++ b/util/cpuinfo-loongarch.c
@@ -0,0 +1,35 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Host specific cpu identification for LoongArch.
+ */
+
+#include "qemu/osdep.h"
+#include "host/cpuinfo.h"
+
+#ifdef CONFIG_GETAUXVAL
+# include <sys/auxv.h>
+#else
+# include "elf.h"
+#endif
+#include <asm/hwcap.h>
+
+unsigned cpuinfo;
+
+/* Called both as constructor and (possibly) via other constructors. */
+unsigned __attribute__((constructor)) cpuinfo_init(void)
+{
+unsigned info = cpuinfo;
+unsigned long hwcap;
+
+if (info) {
+return info;
+}
+
+hwcap = qemu_getauxval(AT_HWCAP);
+
+info = CPUINFO_ALWAYS;
+info |= (hwcap & HWCAP_LOONGARCH_LSX ? CPUINFO_LSX : 0);
+
+cpuinfo = info;
+return info;
+}
diff --git a/util/meson.build b/util/meson.build
index c4827fd70a..b136f02aa0 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -112,6 +112,8 @@ if cpu == 'aarch64'
util_ss.add(files('cpuinfo-aarch64.c'))
  elif cpu in ['x86', 'x86_64']
util_ss.add(files('cpuinfo-i386.c'))
+elif cpu == 'loongarch64'
+  util_ss.add(files('cpuinfo-loongarch.c'))
  elif cpu in ['ppc', 'ppc64']
util_ss.add(files('cpuinfo-ppc.c'))
  endif



Reviewed-by: Jiajie Chen 





Re: [PATCH 1/7] tcg: Add C_N2_I1

2023-09-30 Thread Jiajie Chen

On 2023/9/17 06:01, Richard Henderson wrote:

Constraint with two outputs, both in new registers.

Signed-off-by: Richard Henderson 
---
  tcg/tcg.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 604fa9bf3e..fdbf79689a 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -644,6 +644,7 @@ static void tcg_out_movext3(TCGContext *s, const TCGMovExtend *i1,
  #define C_O1_I4(O1, I1, I2, I3, I4) C_PFX5(c_o1_i4_, O1, I1, I2, I3, I4),
  
  #define C_N1_I2(O1, I1, I2) C_PFX3(c_n1_i2_, O1, I1, I2),

+#define C_N2_I1(O1, O2, I1) C_PFX3(c_n2_i1_, O1, O2, I1),
  
  #define C_O2_I1(O1, O2, I1) C_PFX3(c_o2_i1_, O1, O2, I1),

  #define C_O2_I2(O1, O2, I1, I2) C_PFX4(c_o2_i2_, O1, O2, I1, I2),
@@ -666,6 +667,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode);
  #undef C_O1_I3
  #undef C_O1_I4
  #undef C_N1_I2
+#undef C_N2_I1
  #undef C_O2_I1
  #undef C_O2_I2
  #undef C_O2_I3
@@ -685,6 +687,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode);
  #define C_O1_I4(O1, I1, I2, I3, I4) { .args_ct_str = { #O1, #I1, #I2, #I3, #I4 } },
  
  #define C_N1_I2(O1, I1, I2) { .args_ct_str = { "&" #O1, #I1, #I2 } },

+#define C_N2_I1(O1, O2, I1) { .args_ct_str = { "&" #O1, "&" #O2, #I1 } },
  
  #define C_O2_I1(O1, O2, I1) { .args_ct_str = { #O1, #O2, #I1 } },

  #define C_O2_I2(O1, O2, I1, I2) { .args_ct_str = { #O1, #O2, #I1, #I2 } },
@@ -706,6 +709,7 @@ static const TCGTargetOpDef constraint_sets[] = {
  #undef C_O1_I3
  #undef C_O1_I4
  #undef C_N1_I2
+#undef C_N2_I1
  #undef C_O2_I1
  #undef C_O2_I2
  #undef C_O2_I3
@@ -725,6 +729,7 @@ static const TCGTargetOpDef constraint_sets[] = {
  #define C_O1_I4(O1, I1, I2, I3, I4) C_PFX5(c_o1_i4_, O1, I1, I2, I3, I4)
  
  #define C_N1_I2(O1, I1, I2) C_PFX3(c_n1_i2_, O1, I1, I2)

+#define C_N2_I1(O1, O2, I1) C_PFX3(c_n2_i1_, O1, O2, I1)
  
  #define C_O2_I1(O1, O2, I1) C_PFX3(c_o2_i1_, O1, O2, I1)

  #define C_O2_I2(O1, O2, I1, I2) C_PFX4(c_o2_i2_, O1, O2, I1, I2)



Reviewed-by: Jiajie Chen 




[PATCH] target/loongarch: fix ASXE flag conflict

2023-09-30 Thread Jiajie Chen
HW_FLAGS_EUEN_ASXE accidentally conflicts with HW_FLAGS_CRMD_PG,
enabling LASX instructions even when CSR_EUEN.ASXE=0.

Closes: https://gitlab.com/qemu-project/qemu/-/issues/1907
Signed-off-by: Jiajie Chen 
---
 target/loongarch/cpu.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index f125a8e49b..79ad79a289 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -462,7 +462,7 @@ static inline void set_pc(CPULoongArchState *env, uint64_t value)
#define HW_FLAGS_CRMD_PG    R_CSR_CRMD_PG_MASK   /* 0x10 */
 #define HW_FLAGS_EUEN_FPE   0x04
 #define HW_FLAGS_EUEN_SXE   0x08
-#define HW_FLAGS_EUEN_ASXE  0x10
+#define HW_FLAGS_EUEN_ASXE  0x40
 #define HW_FLAGS_VA32   0x20
 
 static inline void cpu_get_tb_cpu_state(CPULoongArchState *env, vaddr *pc,
-- 
2.41.0




[PATCH v4 02/16] tcg/loongarch64: Lower basic tcg vec ops to LSX

2023-09-07 Thread Jiajie Chen
LSX support on host cpu is detected via hwcap.

Lower the following ops to LSX:

- dup_vec
- dupi_vec
- dupm_vec
- ld_vec
- st_vec

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target-con-set.h |   2 +
 tcg/loongarch64/tcg-target-con-str.h |   1 +
 tcg/loongarch64/tcg-target.c.inc | 219 ++-
 tcg/loongarch64/tcg-target.h |  38 -
 tcg/loongarch64/tcg-target.opc.h |  12 ++
 5 files changed, 270 insertions(+), 2 deletions(-)
 create mode 100644 tcg/loongarch64/tcg-target.opc.h

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index c2bde44613..37b3f80bf9 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -17,7 +17,9 @@
 C_O0_I1(r)
 C_O0_I2(rZ, r)
 C_O0_I2(rZ, rZ)
+C_O0_I2(w, r)
 C_O1_I1(r, r)
+C_O1_I1(w, r)
 C_O1_I2(r, r, rC)
 C_O1_I2(r, r, ri)
 C_O1_I2(r, r, rI)
diff --git a/tcg/loongarch64/tcg-target-con-str.h b/tcg/loongarch64/tcg-target-con-str.h
index 6e9ccca3ad..81b8d40278 100644
--- a/tcg/loongarch64/tcg-target-con-str.h
+++ b/tcg/loongarch64/tcg-target-con-str.h
@@ -14,6 +14,7 @@
  * REGS(letter, register_mask)
  */
 REGS('r', ALL_GENERAL_REGS)
+REGS('w', ALL_VECTOR_REGS)
 
 /*
  * Define constraint letters for constants:
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index baf5fc3819..150278e112 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -32,6 +32,8 @@
 #include "../tcg-ldst.c.inc"
 #include 
 
+bool use_lsx_instructions;
+
 #ifdef CONFIG_DEBUG_TCG
 static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 "zero",
@@ -65,7 +67,39 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 "s5",
 "s6",
 "s7",
-"s8"
+"s8",
+"vr0",
+"vr1",
+"vr2",
+"vr3",
+"vr4",
+"vr5",
+"vr6",
+"vr7",
+"vr8",
+"vr9",
+"vr10",
+"vr11",
+"vr12",
+"vr13",
+"vr14",
+"vr15",
+"vr16",
+"vr17",
+"vr18",
+"vr19",
+"vr20",
+"vr21",
+"vr22",
+"vr23",
+"vr24",
+"vr25",
+"vr26",
+"vr27",
+"vr28",
+"vr29",
+"vr30",
+"vr31",
 };
 #endif
 
@@ -102,6 +136,15 @@ static const int tcg_target_reg_alloc_order[] = {
 TCG_REG_A2,
 TCG_REG_A1,
 TCG_REG_A0,
+
+/* Vector registers */
+TCG_REG_V0, TCG_REG_V1, TCG_REG_V2, TCG_REG_V3,
+TCG_REG_V4, TCG_REG_V5, TCG_REG_V6, TCG_REG_V7,
+TCG_REG_V8, TCG_REG_V9, TCG_REG_V10, TCG_REG_V11,
+TCG_REG_V12, TCG_REG_V13, TCG_REG_V14, TCG_REG_V15,
+TCG_REG_V16, TCG_REG_V17, TCG_REG_V18, TCG_REG_V19,
+TCG_REG_V20, TCG_REG_V21, TCG_REG_V22, TCG_REG_V23,
+/* V24 - V31 are caller-saved, and skipped.  */
 };
 
 static const int tcg_target_call_iarg_regs[] = {
@@ -135,6 +178,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
 #define TCG_CT_CONST_WSZ   0x2000
 
 #define ALL_GENERAL_REGS   MAKE_64BIT_MASK(0, 32)
+#define ALL_VECTOR_REGSMAKE_64BIT_MASK(32, 32)
 
 static inline tcg_target_long sextreg(tcg_target_long val, int pos, int len)
 {
@@ -1486,6 +1530,154 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 }
 }
 
+static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
+TCGReg rd, TCGReg rs)
+{
+switch (vece) {
+case MO_8:
+tcg_out_opc_vreplgr2vr_b(s, rd, rs);
+break;
+case MO_16:
+tcg_out_opc_vreplgr2vr_h(s, rd, rs);
+break;
+case MO_32:
+tcg_out_opc_vreplgr2vr_w(s, rd, rs);
+break;
+case MO_64:
+tcg_out_opc_vreplgr2vr_d(s, rd, rs);
+break;
+default:
+g_assert_not_reached();
+}
+return true;
+}
+
+static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
+ TCGReg r, TCGReg base, intptr_t offset)
+{
+/* Handle imm overflow and division (vldrepl.d imm is divided by 8) */
+if (offset < -0x800 || offset > 0x7ff || \
+(offset & ((1 << vece) - 1)) != 0) {
+tcg_out_addi(s, TCG_TYPE_I64, TCG_REG_TMP0, base, offset);
+base = TCG_REG_TMP0;
+offset = 0;
+}
+offset >>= vece;
+
+switch (vece) {
+case MO_8:
+tcg_out_opc_vldrepl_b(s, r, base, offset);
+break;
+case MO_16:
+tcg_out_opc_vldrepl_h(s, r, base, offset);
+break;
+case MO_32:
+tcg_out_opc_vldrepl_w(s, r, base, offset);
+break;
+case 

[PATCH v4 10/16] tcg/loongarch64: Lower vector saturated ops

2023-09-07 Thread Jiajie Chen
Lower the following ops:

- ssadd_vec
- usadd_vec
- sssub_vec
- ussub_vec

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target.c.inc | 32 
 tcg/loongarch64/tcg-target.h |  2 +-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index bdf22d8807..90c52c38cf 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1713,6 +1713,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static const LoongArchInsn umax_vec_insn[4] = {
 OPC_VMAX_BU, OPC_VMAX_HU, OPC_VMAX_WU, OPC_VMAX_DU
 };
+static const LoongArchInsn ssadd_vec_insn[4] = {
+OPC_VSADD_B, OPC_VSADD_H, OPC_VSADD_W, OPC_VSADD_D
+};
+static const LoongArchInsn usadd_vec_insn[4] = {
+OPC_VSADD_BU, OPC_VSADD_HU, OPC_VSADD_WU, OPC_VSADD_DU
+};
+static const LoongArchInsn sssub_vec_insn[4] = {
+OPC_VSSUB_B, OPC_VSSUB_H, OPC_VSSUB_W, OPC_VSSUB_D
+};
+static const LoongArchInsn ussub_vec_insn[4] = {
+OPC_VSSUB_BU, OPC_VSSUB_HU, OPC_VSSUB_WU, OPC_VSSUB_DU
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1829,6 +1841,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_umax_vec:
 tcg_out32(s, encode_vdvjvk_insn(umax_vec_insn[vece], a0, a1, a2));
 break;
+case INDEX_op_ssadd_vec:
+tcg_out32(s, encode_vdvjvk_insn(ssadd_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_usadd_vec:
+tcg_out32(s, encode_vdvjvk_insn(usadd_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_sssub_vec:
+tcg_out32(s, encode_vdvjvk_insn(sssub_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_ussub_vec:
+tcg_out32(s, encode_vdvjvk_insn(ussub_vec_insn[vece], a0, a1, a2));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1860,6 +1884,10 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_smax_vec:
 case INDEX_op_umin_vec:
 case INDEX_op_umax_vec:
+case INDEX_op_ssadd_vec:
+case INDEX_op_usadd_vec:
+case INDEX_op_sssub_vec:
+case INDEX_op_ussub_vec:
 return 1;
 default:
 return 0;
@@ -2039,6 +2067,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_smax_vec:
 case INDEX_op_umin_vec:
 case INDEX_op_umax_vec:
+case INDEX_op_ssadd_vec:
+case INDEX_op_usadd_vec:
+case INDEX_op_sssub_vec:
+case INDEX_op_ussub_vec:
 return C_O1_I2(w, w, w);
 
 case INDEX_op_not_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index ec725aaeaa..fa14558275 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -192,7 +192,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_roti_vec 0
 #define TCG_TARGET_HAS_rots_vec 0
 #define TCG_TARGET_HAS_rotv_vec 0
-#define TCG_TARGET_HAS_sat_vec  0
+#define TCG_TARGET_HAS_sat_vec  1
 #define TCG_TARGET_HAS_minmax_vec   1
 #define TCG_TARGET_HAS_bitsel_vec   0
 #define TCG_TARGET_HAS_cmpsel_vec   0
-- 
2.42.0




[PATCH v4 12/16] tcg/loongarch64: Lower bitsel_vec to vbitsel

2023-09-07 Thread Jiajie Chen
Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target-con-set.h |  1 +
 tcg/loongarch64/tcg-target.c.inc | 11 ++-
 tcg/loongarch64/tcg-target.h |  2 +-
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index 3f530ad4d8..914572d21b 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -35,4 +35,5 @@ C_O1_I2(r, rZ, rZ)
 C_O1_I2(w, w, w)
 C_O1_I2(w, w, wM)
 C_O1_I2(w, w, wA)
+C_O1_I3(w, w, w, w)
 C_O1_I4(r, rZ, rJ, rZ, rZ)
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 6958fd219c..a33ec594ee 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1676,7 +1676,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
const int const_args[TCG_MAX_OP_ARGS])
 {
 TCGType type = vecl + TCG_TYPE_V64;
-TCGArg a0, a1, a2;
+TCGArg a0, a1, a2, a3;
 TCGReg temp = TCG_REG_TMP0;
 TCGReg temp_vec = TCG_VEC_TMP0;
 
@@ -1738,6 +1738,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 a0 = args[0];
 a1 = args[1];
 a2 = args[2];
+a3 = args[3];
 
 /* Currently only supports V128 */
 tcg_debug_assert(type == TCG_TYPE_V128);
@@ -1871,6 +1872,10 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_sarv_vec:
 tcg_out32(s, encode_vdvjvk_insn(sarv_vec_insn[vece], a0, a1, a2));
 break;
+case INDEX_op_bitsel_vec:
+/* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */
+tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1);
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1909,6 +1914,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_shlv_vec:
 case INDEX_op_shrv_vec:
 case INDEX_op_sarv_vec:
+case INDEX_op_bitsel_vec:
 return 1;
 default:
 return 0;
@@ -2101,6 +2107,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_neg_vec:
 return C_O1_I1(w, w);
 
+case INDEX_op_bitsel_vec:
+return C_O1_I3(w, w, w, w);
+
 default:
 g_assert_not_reached();
 }
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 7e9fb61c47..bc56939a57 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -194,7 +194,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_rotv_vec 0
 #define TCG_TARGET_HAS_sat_vec  1
 #define TCG_TARGET_HAS_minmax_vec   1
-#define TCG_TARGET_HAS_bitsel_vec   0
+#define TCG_TARGET_HAS_bitsel_vec   1
 #define TCG_TARGET_HAS_cmpsel_vec   0
 
 #define TCG_TARGET_DEFAULT_MO (0)
-- 
2.42.0




[PATCH v4 03/16] tcg: pass vece to tcg_target_const_match()

2023-09-07 Thread Jiajie Chen
Pass vece to tcg_target_const_match() to allow correct interpretation of
const args of vector ops.

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.c.inc | 2 +-
 tcg/arm/tcg-target.c.inc | 2 +-
 tcg/i386/tcg-target.c.inc| 2 +-
 tcg/loongarch64/tcg-target.c.inc | 2 +-
 tcg/mips/tcg-target.c.inc| 2 +-
 tcg/ppc/tcg-target.c.inc | 2 +-
 tcg/riscv/tcg-target.c.inc   | 2 +-
 tcg/s390x/tcg-target.c.inc   | 2 +-
 tcg/sparc64/tcg-target.c.inc | 2 +-
 tcg/tcg.c| 4 ++--
 tcg/tci/tcg-target.c.inc | 2 +-
 11 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 0931a69448..a1e2b6be16 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -272,7 +272,7 @@ static bool is_shimm1632(uint32_t v32, int *cmode, int *imm8)
 }
 }
 
-static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
+static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece)
 {
 if (ct & TCG_CT_CONST) {
 return 1;
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index acb5f23b54..76f1345002 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -509,7 +509,7 @@ static bool is_shimm1632(uint32_t v32, int *cmode, int *imm8)
  * mov operand2: values represented with x << (2 * y), x < 0x100
  * add, sub, eor...: ditto
  */
-static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
+static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece)
 {
 if (ct & TCG_CT_CONST) {
 return 1;
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 0c3d1e4cef..aed91e515e 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -198,7 +198,7 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 }
 
 /* test if a constant matches the constraint */
-static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
+static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece)
 {
 if (ct & TCG_CT_CONST) {
 return 1;
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 150278e112..07a0326e5d 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -186,7 +186,7 @@ static inline tcg_target_long sextreg(tcg_target_long val, int pos, int len)
 }
 
 /* test if a constant matches the constraint */
-static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
+static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece)
 {
 if (ct & TCG_CT_CONST) {
 return true;
diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index 9faa8bdf0b..c6662889f0 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -190,7 +190,7 @@ static bool is_p2m1(tcg_target_long val)
 }
 
 /* test if a constant matches the constraint */
-static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
+static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece)
 {
 if (ct & TCG_CT_CONST) {
 return 1;
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 090f11e71c..ccf245191d 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -261,7 +261,7 @@ static bool reloc_pc14(tcg_insn_unit *src_rw, const tcg_insn_unit *target)
 }
 
 /* test if a constant matches the constraint */
-static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
+static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece)
 {
 if (ct & TCG_CT_CONST) {
 return 1;
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 9be81c1b7b..3bd7959e7e 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -145,7 +145,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
 #define sextreg  sextract64
 
 /* test if a constant matches the constraint */
-static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
+static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece)
 {
 if (ct & TCG_CT_CONST) {
 return 1;
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index ecd8aaf2a1..f4d3abcb71 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -540,7 +540,7 @@ static bool risbg_mask(uint64_t c)
 }
 
 /* Test if a constant matches the constraint. */
-static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
+static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece)
 {
 if (ct & TCG_CT_CONST) {
 return 1;
diff --git a/tcg/sparc64/tcg-target.c.inc b/tcg/sparc64/tcg-target.c.inc
index 81a08bb6c5..6b9be4c520 100644
--- a/tcg/sparc64/tcg-target.c.inc
+++ b/tcg/

[PATCH v4 15/16] tcg/loongarch64: Lower rotli_vec to vrotri

2023-09-07 Thread Jiajie Chen
Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target.c.inc | 21 +
 tcg/loongarch64/tcg-target.h |  2 +-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 8f448823b0..82901d678a 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1902,6 +1902,26 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 tcg_out32(s, encode_vdvjvk_insn(rotrv_vec_insn[vece], a0, a1,
 temp_vec));
 break;
+case INDEX_op_rotli_vec:
+/* rotli_vec a1, a2 = rotri_vec a1, -a2 */
+a2 = extract32(-a2, 0, 3 + vece);
+switch (vece) {
+case MO_8:
+tcg_out_opc_vrotri_b(s, a0, a1, a2);
+break;
+case MO_16:
+tcg_out_opc_vrotri_h(s, a0, a1, a2);
+break;
+case MO_32:
+tcg_out_opc_vrotri_w(s, a0, a1, a2);
+break;
+case MO_64:
+tcg_out_opc_vrotri_d(s, a0, a1, a2);
+break;
+default:
+g_assert_not_reached();
+}
+break;
 case INDEX_op_bitsel_vec:
 /* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */
 tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1);
@@ -2140,6 +2160,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_shli_vec:
 case INDEX_op_shri_vec:
 case INDEX_op_sari_vec:
+case INDEX_op_rotli_vec:
 return C_O1_I1(w, w);
 
 case INDEX_op_bitsel_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index d5c69bc192..67b0a95532 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -189,7 +189,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_shi_vec  1
 #define TCG_TARGET_HAS_shs_vec  0
 #define TCG_TARGET_HAS_shv_vec  1
-#define TCG_TARGET_HAS_roti_vec 0
+#define TCG_TARGET_HAS_roti_vec 1
 #define TCG_TARGET_HAS_rots_vec 0
 #define TCG_TARGET_HAS_rotv_vec 1
 #define TCG_TARGET_HAS_sat_vec  1
-- 
2.42.0




[PATCH v4 13/16] tcg/loongarch64: Lower vector shift integer ops

2023-09-07 Thread Jiajie Chen
Lower the following ops:

- shli_vec
- shri_vec
- sari_vec

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target.c.inc | 21 +
 tcg/loongarch64/tcg-target.h |  2 +-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index a33ec594ee..c21c917083 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1734,6 +1734,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static const LoongArchInsn sarv_vec_insn[4] = {
 OPC_VSRA_B, OPC_VSRA_H, OPC_VSRA_W, OPC_VSRA_D
 };
+static const LoongArchInsn shli_vec_insn[4] = {
+OPC_VSLLI_B, OPC_VSLLI_H, OPC_VSLLI_W, OPC_VSLLI_D
+};
+static const LoongArchInsn shri_vec_insn[4] = {
+OPC_VSRLI_B, OPC_VSRLI_H, OPC_VSRLI_W, OPC_VSRLI_D
+};
+static const LoongArchInsn sari_vec_insn[4] = {
+OPC_VSRAI_B, OPC_VSRAI_H, OPC_VSRAI_W, OPC_VSRAI_D
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1872,6 +1881,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_sarv_vec:
 tcg_out32(s, encode_vdvjvk_insn(sarv_vec_insn[vece], a0, a1, a2));
 break;
+case INDEX_op_shli_vec:
+tcg_out32(s, encode_vdvjuk3_insn(shli_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_shri_vec:
+tcg_out32(s, encode_vdvjuk3_insn(shri_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_sari_vec:
+tcg_out32(s, encode_vdvjuk3_insn(sari_vec_insn[vece], a0, a1, a2));
+break;
 case INDEX_op_bitsel_vec:
 /* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */
 tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1);
@@ -2105,6 +2123,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 
 case INDEX_op_not_vec:
 case INDEX_op_neg_vec:
+case INDEX_op_shli_vec:
+case INDEX_op_shri_vec:
+case INDEX_op_sari_vec:
 return C_O1_I1(w, w);
 
 case INDEX_op_bitsel_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index bc56939a57..d7b806e252 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -186,7 +186,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_nor_vec  1
 #define TCG_TARGET_HAS_eqv_vec  0
 #define TCG_TARGET_HAS_mul_vec  1
-#define TCG_TARGET_HAS_shi_vec  0
+#define TCG_TARGET_HAS_shi_vec  1
 #define TCG_TARGET_HAS_shs_vec  0
 #define TCG_TARGET_HAS_shv_vec  1
 #define TCG_TARGET_HAS_roti_vec 0
-- 
2.42.0




[PATCH v4 04/16] tcg/loongarch64: Lower cmp_vec to vseq/vsle/vslt

2023-09-07 Thread Jiajie Chen
Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target-con-set.h |  1 +
 tcg/loongarch64/tcg-target-con-str.h |  1 +
 tcg/loongarch64/tcg-target.c.inc | 65 
 3 files changed, 67 insertions(+)

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index 37b3f80bf9..8c8ea5d919 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -31,4 +31,5 @@ C_O1_I2(r, 0, rZ)
 C_O1_I2(r, rZ, ri)
 C_O1_I2(r, rZ, rJ)
 C_O1_I2(r, rZ, rZ)
+C_O1_I2(w, w, wM)
 C_O1_I4(r, rZ, rJ, rZ, rZ)
diff --git a/tcg/loongarch64/tcg-target-con-str.h b/tcg/loongarch64/tcg-target-con-str.h
index 81b8d40278..a8a1c44014 100644
--- a/tcg/loongarch64/tcg-target-con-str.h
+++ b/tcg/loongarch64/tcg-target-con-str.h
@@ -26,3 +26,4 @@ CONST('U', TCG_CT_CONST_U12)
 CONST('Z', TCG_CT_CONST_ZERO)
 CONST('C', TCG_CT_CONST_C12)
 CONST('W', TCG_CT_CONST_WSZ)
+CONST('M', TCG_CT_CONST_VCMP)
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 07a0326e5d..129dd92910 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -176,6 +176,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
 #define TCG_CT_CONST_U12   0x800
 #define TCG_CT_CONST_C12   0x1000
 #define TCG_CT_CONST_WSZ   0x2000
+#define TCG_CT_CONST_VCMP  0x4000
 
 #define ALL_GENERAL_REGS   MAKE_64BIT_MASK(0, 32)
 #define ALL_VECTOR_REGSMAKE_64BIT_MASK(32, 32)
@@ -209,6 +210,10 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece)
 if ((ct & TCG_CT_CONST_WSZ) && val == (type == TCG_TYPE_I32 ? 32 : 64)) {
 return true;
 }
+int64_t vec_val = sextract64(val, 0, 8 << vece);
+if ((ct & TCG_CT_CONST_VCMP) && -0x10 <= vec_val && vec_val <= 0x1f) {
+return true;
+}
 return false;
 }
 
@@ -1624,6 +1629,23 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 TCGType type = vecl + TCG_TYPE_V64;
 TCGArg a0, a1, a2;
 TCGReg temp = TCG_REG_TMP0;
+TCGReg temp_vec = TCG_VEC_TMP0;
+
+static const LoongArchInsn cmp_vec_insn[16][4] = {
+[TCG_COND_EQ] = {OPC_VSEQ_B, OPC_VSEQ_H, OPC_VSEQ_W, OPC_VSEQ_D},
+[TCG_COND_LE] = {OPC_VSLE_B, OPC_VSLE_H, OPC_VSLE_W, OPC_VSLE_D},
+[TCG_COND_LEU] = {OPC_VSLE_BU, OPC_VSLE_HU, OPC_VSLE_WU, OPC_VSLE_DU},
+[TCG_COND_LT] = {OPC_VSLT_B, OPC_VSLT_H, OPC_VSLT_W, OPC_VSLT_D},
+[TCG_COND_LTU] = {OPC_VSLT_BU, OPC_VSLT_HU, OPC_VSLT_WU, OPC_VSLT_DU},
+};
+static const LoongArchInsn cmp_vec_imm_insn[16][4] = {
+[TCG_COND_EQ] = {OPC_VSEQI_B, OPC_VSEQI_H, OPC_VSEQI_W, OPC_VSEQI_D},
+[TCG_COND_LE] = {OPC_VSLEI_B, OPC_VSLEI_H, OPC_VSLEI_W, OPC_VSLEI_D},
+[TCG_COND_LEU] = {OPC_VSLEI_BU, OPC_VSLEI_HU, OPC_VSLEI_WU, OPC_VSLEI_DU},
+[TCG_COND_LT] = {OPC_VSLTI_B, OPC_VSLTI_H, OPC_VSLTI_W, OPC_VSLTI_D},
+[TCG_COND_LTU] = {OPC_VSLTI_BU, OPC_VSLTI_HU, OPC_VSLTI_WU, OPC_VSLTI_DU},
+};
+LoongArchInsn insn;
 
 a0 = args[0];
 a1 = args[1];
@@ -1651,6 +1673,45 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 tcg_out_opc_vldx(s, a0, a1, temp);
 }
 break;
+case INDEX_op_cmp_vec:
+TCGCond cond = args[3];
+if (const_args[2]) {
+/*
+ * cmp_vec dest, src, value
+ * Try vseqi/vslei/vslti
+ */
+int64_t value = sextract64(a2, 0, 8 << vece);
+if ((cond == TCG_COND_EQ || cond == TCG_COND_LE || \
+ cond == TCG_COND_LT) && (-0x10 <= value && value <= 0x0f)) {
+tcg_out32(s, encode_vdvjsk5_insn(cmp_vec_imm_insn[cond][vece], \
+ a0, a1, value));
+break;
+} else if ((cond == TCG_COND_LEU || cond == TCG_COND_LTU) &&
+(0x00 <= value && value <= 0x1f)) {
+tcg_out32(s, encode_vdvjuk5_insn(cmp_vec_imm_insn[cond][vece], \
+ a0, a1, value));
+break;
+}
+
+/*
+ * Fallback to:
+ * dupi_vec temp, a2
+ * cmp_vec a0, a1, temp, cond
+ */
+tcg_out_dupi_vec(s, type, vece, temp_vec, a2);
+a2 = temp_vec;
+}
+
+insn = cmp_vec_insn[cond][vece];
+if (insn == 0) {
+TCGArg t;
+t = a1, a1 = a2, a2 = t;
+cond = tcg_swap_cond(cond);
+insn = cmp_vec_insn[cond][vece];
+tcg_debug_assert(insn != 0);
+}
+tcg_out32(s, encode_vdvjvk_insn(insn, a0, a1, a2));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1666,6 +1727,7 @@ int tcg_can

[PATCH v4 08/16] tcg/loongarch64: Lower mul_vec to vmul

2023-09-07 Thread Jiajie Chen
Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target.c.inc | 8 
 tcg/loongarch64/tcg-target.h | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index b36b706e39..0814f62905 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1698,6 +1698,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static const LoongArchInsn neg_vec_insn[4] = {
 OPC_VNEG_B, OPC_VNEG_H, OPC_VNEG_W, OPC_VNEG_D
 };
+static const LoongArchInsn mul_vec_insn[4] = {
+OPC_VMUL_B, OPC_VMUL_H, OPC_VMUL_W, OPC_VMUL_D
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1799,6 +1802,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_neg_vec:
 tcg_out32(s, encode_vdvj_insn(neg_vec_insn[vece], a0, a1));
 break;
+case INDEX_op_mul_vec:
+tcg_out32(s, encode_vdvjvk_insn(mul_vec_insn[vece], a0, a1, a2));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1825,6 +1831,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_nor_vec:
 case INDEX_op_not_vec:
 case INDEX_op_neg_vec:
+case INDEX_op_mul_vec:
 return 1;
 default:
 return 0;
@@ -1999,6 +2006,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_orc_vec:
 case INDEX_op_xor_vec:
 case INDEX_op_nor_vec:
+case INDEX_op_mul_vec:
 return C_O1_I2(w, w, w);
 
 case INDEX_op_not_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 64c72d0857..2c2266ed31 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -185,7 +185,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_nand_vec 0
 #define TCG_TARGET_HAS_nor_vec  1
 #define TCG_TARGET_HAS_eqv_vec  0
-#define TCG_TARGET_HAS_mul_vec  0
+#define TCG_TARGET_HAS_mul_vec  1
 #define TCG_TARGET_HAS_shi_vec  0
 #define TCG_TARGET_HAS_shs_vec  0
 #define TCG_TARGET_HAS_shv_vec  0
-- 
2.42.0




[PATCH v4 06/16] tcg/loongarch64: Lower vector bitwise operations

2023-09-07 Thread Jiajie Chen
Lower the following ops:

- and_vec
- andc_vec
- or_vec
- orc_vec
- xor_vec
- nor_vec
- not_vec

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target-con-set.h |  2 ++
 tcg/loongarch64/tcg-target.c.inc | 44 
 tcg/loongarch64/tcg-target.h |  8 ++---
 3 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index 2d5dce75c3..3f530ad4d8 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -20,6 +20,7 @@ C_O0_I2(rZ, rZ)
 C_O0_I2(w, r)
 C_O1_I1(r, r)
 C_O1_I1(w, r)
+C_O1_I1(w, w)
 C_O1_I2(r, r, rC)
 C_O1_I2(r, r, ri)
 C_O1_I2(r, r, rI)
@@ -31,6 +32,7 @@ C_O1_I2(r, 0, rZ)
 C_O1_I2(r, rZ, ri)
 C_O1_I2(r, rZ, rJ)
 C_O1_I2(r, rZ, rZ)
+C_O1_I2(w, w, w)
 C_O1_I2(w, w, wM)
 C_O1_I2(w, w, wA)
 C_O1_I4(r, rZ, rJ, rZ, rZ)
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 1a369b237c..d569e443dd 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1722,6 +1722,32 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 tcg_out_opc_vldx(s, a0, a1, temp);
 }
 break;
+case INDEX_op_and_vec:
+tcg_out_opc_vand_v(s, a0, a1, a2);
+break;
+case INDEX_op_andc_vec:
+/*
+ * vandn vd, vj, vk: vd = vk & ~vj
+ * andc_vec vd, vj, vk: vd = vj & ~vk
+ * vj and vk are swapped
+ */
+tcg_out_opc_vandn_v(s, a0, a2, a1);
+break;
+case INDEX_op_or_vec:
+tcg_out_opc_vor_v(s, a0, a1, a2);
+break;
+case INDEX_op_orc_vec:
+tcg_out_opc_vorn_v(s, a0, a1, a2);
+break;
+case INDEX_op_xor_vec:
+tcg_out_opc_vxor_v(s, a0, a1, a2);
+break;
+case INDEX_op_nor_vec:
+tcg_out_opc_vnor_v(s, a0, a1, a2);
+break;
+case INDEX_op_not_vec:
+tcg_out_opc_vnor_v(s, a0, a1, a1);
+break;
 case INDEX_op_cmp_vec:
 TCGCond cond = args[3];
 if (const_args[2]) {
@@ -1785,6 +1811,13 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_cmp_vec:
 case INDEX_op_add_vec:
 case INDEX_op_sub_vec:
+case INDEX_op_and_vec:
+case INDEX_op_andc_vec:
+case INDEX_op_or_vec:
+case INDEX_op_orc_vec:
+case INDEX_op_xor_vec:
+case INDEX_op_nor_vec:
+case INDEX_op_not_vec:
 return 1;
 default:
 return 0;
@@ -1953,6 +1986,17 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_sub_vec:
 return C_O1_I2(w, w, wA);
 
+case INDEX_op_and_vec:
+case INDEX_op_andc_vec:
+case INDEX_op_or_vec:
+case INDEX_op_orc_vec:
+case INDEX_op_xor_vec:
+case INDEX_op_nor_vec:
+return C_O1_I2(w, w, w);
+
+case INDEX_op_not_vec:
+return C_O1_I1(w, w);
+
 default:
 g_assert_not_reached();
 }
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index daaf38ee31..f9c5cb12ca 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -177,13 +177,13 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_v128 use_lsx_instructions
 #define TCG_TARGET_HAS_v256 0
 
-#define TCG_TARGET_HAS_not_vec  0
+#define TCG_TARGET_HAS_not_vec  1
 #define TCG_TARGET_HAS_neg_vec  0
 #define TCG_TARGET_HAS_abs_vec  0
-#define TCG_TARGET_HAS_andc_vec 0
-#define TCG_TARGET_HAS_orc_vec  0
+#define TCG_TARGET_HAS_andc_vec 1
+#define TCG_TARGET_HAS_orc_vec  1
 #define TCG_TARGET_HAS_nand_vec 0
-#define TCG_TARGET_HAS_nor_vec  0
+#define TCG_TARGET_HAS_nor_vec  1
 #define TCG_TARGET_HAS_eqv_vec  0
 #define TCG_TARGET_HAS_mul_vec  0
 #define TCG_TARGET_HAS_shi_vec  0
-- 
2.42.0




[PATCH v4 11/16] tcg/loongarch64: Lower vector shift vector ops

2023-09-07 Thread Jiajie Chen
Lower the following ops:

- shlv_vec
- shrv_vec
- sarv_vec

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target.c.inc | 24 
 tcg/loongarch64/tcg-target.h |  2 +-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 90c52c38cf..6958fd219c 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1725,6 +1725,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static const LoongArchInsn ussub_vec_insn[4] = {
 OPC_VSSUB_BU, OPC_VSSUB_HU, OPC_VSSUB_WU, OPC_VSSUB_DU
 };
+static const LoongArchInsn shlv_vec_insn[4] = {
+OPC_VSLL_B, OPC_VSLL_H, OPC_VSLL_W, OPC_VSLL_D
+};
+static const LoongArchInsn shrv_vec_insn[4] = {
+OPC_VSRL_B, OPC_VSRL_H, OPC_VSRL_W, OPC_VSRL_D
+};
+static const LoongArchInsn sarv_vec_insn[4] = {
+OPC_VSRA_B, OPC_VSRA_H, OPC_VSRA_W, OPC_VSRA_D
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1853,6 +1862,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_ussub_vec:
 tcg_out32(s, encode_vdvjvk_insn(ussub_vec_insn[vece], a0, a1, a2));
 break;
+case INDEX_op_shlv_vec:
+tcg_out32(s, encode_vdvjvk_insn(shlv_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_shrv_vec:
+tcg_out32(s, encode_vdvjvk_insn(shrv_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_sarv_vec:
+tcg_out32(s, encode_vdvjvk_insn(sarv_vec_insn[vece], a0, a1, a2));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1888,6 +1906,9 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_usadd_vec:
 case INDEX_op_sssub_vec:
 case INDEX_op_ussub_vec:
+case INDEX_op_shlv_vec:
+case INDEX_op_shrv_vec:
+case INDEX_op_sarv_vec:
 return 1;
 default:
 return 0;
@@ -2071,6 +2092,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_usadd_vec:
 case INDEX_op_sssub_vec:
 case INDEX_op_ussub_vec:
+case INDEX_op_shlv_vec:
+case INDEX_op_shrv_vec:
+case INDEX_op_sarv_vec:
 return C_O1_I2(w, w, w);
 
 case INDEX_op_not_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index fa14558275..7e9fb61c47 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -188,7 +188,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_mul_vec  1
 #define TCG_TARGET_HAS_shi_vec  0
 #define TCG_TARGET_HAS_shs_vec  0
-#define TCG_TARGET_HAS_shv_vec  0
+#define TCG_TARGET_HAS_shv_vec  1
 #define TCG_TARGET_HAS_roti_vec 0
 #define TCG_TARGET_HAS_rots_vec 0
 #define TCG_TARGET_HAS_rotv_vec 0
-- 
2.42.0




[PATCH v4 00/16] Lower TCG vector ops to LSX

2023-09-07 Thread Jiajie Chen
This patch series allows QEMU to utilize LSX instructions on LoongArch
machines to execute TCG vector ops.

Passed tcg tests with x86_64 and aarch64 cross compilers.

Changes since v3:

- Refactor add/sub_vec handling code to use a helper function
- Only use vldx/vstx for MO_128 load/store, otherwise fall back to two ld/st

Changes since v2:

- Add vece argument to tcg_target_const_match() for const args of vector ops
- Use custom constraint for cmp_vec/add_vec/sub_vec for better const arg handling
- Implement 128-bit load & store using vldx/vstx

Changes since v1:

- Optimize dupi_vec/st_vec/ld_vec/cmp_vec/add_vec/sub_vec generation
- Lower not_vec/shi_vec/roti_vec/rotv_vec


Jiajie Chen (16):
  tcg/loongarch64: Import LSX instructions
  tcg/loongarch64: Lower basic tcg vec ops to LSX
  tcg: pass vece to tcg_target_const_match()
  tcg/loongarch64: Lower cmp_vec to vseq/vsle/vslt
  tcg/loongarch64: Lower add/sub_vec to vadd/vsub
  tcg/loongarch64: Lower vector bitwise operations
  tcg/loongarch64: Lower neg_vec to vneg
  tcg/loongarch64: Lower mul_vec to vmul
  tcg/loongarch64: Lower vector min max ops
  tcg/loongarch64: Lower vector saturated ops
  tcg/loongarch64: Lower vector shift vector ops
  tcg/loongarch64: Lower bitsel_vec to vbitsel
  tcg/loongarch64: Lower vector shift integer ops
  tcg/loongarch64: Lower rotv_vec ops to LSX
  tcg/loongarch64: Lower rotli_vec to vrotri
  tcg/loongarch64: Implement 128-bit load & store

 tcg/aarch64/tcg-target.c.inc |2 +-
 tcg/arm/tcg-target.c.inc |2 +-
 tcg/i386/tcg-target.c.inc|2 +-
 tcg/loongarch64/tcg-insn-defs.c.inc  | 6251 +-
 tcg/loongarch64/tcg-target-con-set.h |9 +
 tcg/loongarch64/tcg-target-con-str.h |3 +
 tcg/loongarch64/tcg-target.c.inc |  619 ++-
 tcg/loongarch64/tcg-target.h |   40 +-
 tcg/loongarch64/tcg-target.opc.h |   12 +
 tcg/mips/tcg-target.c.inc|2 +-
 tcg/ppc/tcg-target.c.inc |2 +-
 tcg/riscv/tcg-target.c.inc   |2 +-
 tcg/s390x/tcg-target.c.inc   |2 +-
 tcg/sparc64/tcg-target.c.inc |2 +-
 tcg/tcg.c|4 +-
 tcg/tci/tcg-target.c.inc |2 +-
 16 files changed, 6824 insertions(+), 132 deletions(-)
 create mode 100644 tcg/loongarch64/tcg-target.opc.h

-- 
2.42.0




[PATCH v4 14/16] tcg/loongarch64: Lower rotv_vec ops to LSX

2023-09-07 Thread Jiajie Chen
Lower the following ops:

- rotrv_vec
- rotlv_vec

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target.c.inc | 14 ++
 tcg/loongarch64/tcg-target.h |  2 +-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index c21c917083..8f448823b0 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1743,6 +1743,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static const LoongArchInsn sari_vec_insn[4] = {
 OPC_VSRAI_B, OPC_VSRAI_H, OPC_VSRAI_W, OPC_VSRAI_D
 };
+static const LoongArchInsn rotrv_vec_insn[4] = {
+OPC_VROTR_B, OPC_VROTR_H, OPC_VROTR_W, OPC_VROTR_D
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1890,6 +1893,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_sari_vec:
 tcg_out32(s, encode_vdvjuk3_insn(sari_vec_insn[vece], a0, a1, a2));
 break;
+case INDEX_op_rotrv_vec:
+tcg_out32(s, encode_vdvjvk_insn(rotrv_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_rotlv_vec:
+/* rotlv_vec a1, a2 = rotrv_vec a1, -a2 */
+tcg_out32(s, encode_vdvj_insn(neg_vec_insn[vece], temp_vec, a2));
+tcg_out32(s, encode_vdvjvk_insn(rotrv_vec_insn[vece], a0, a1,
+temp_vec));
+break;
 case INDEX_op_bitsel_vec:
 /* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */
 tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1);
@@ -2119,6 +2131,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_shlv_vec:
 case INDEX_op_shrv_vec:
 case INDEX_op_sarv_vec:
+case INDEX_op_rotrv_vec:
+case INDEX_op_rotlv_vec:
 return C_O1_I2(w, w, w);
 
 case INDEX_op_not_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index d7b806e252..d5c69bc192 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -191,7 +191,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_shv_vec  1
 #define TCG_TARGET_HAS_roti_vec 0
 #define TCG_TARGET_HAS_rots_vec 0
-#define TCG_TARGET_HAS_rotv_vec 0
+#define TCG_TARGET_HAS_rotv_vec 1
 #define TCG_TARGET_HAS_sat_vec  1
 #define TCG_TARGET_HAS_minmax_vec   1
 #define TCG_TARGET_HAS_bitsel_vec   1
-- 
2.42.0




[PATCH v4 16/16] tcg/loongarch64: Implement 128-bit load & store

2023-09-07 Thread Jiajie Chen
If LSX is available, use LSX instructions to implement 128-bit load &
store when MO_128 is required, otherwise use two 64-bit loads & stores.

Signed-off-by: Jiajie Chen 
---
 tcg/loongarch64/tcg-target-con-set.h |  2 +
 tcg/loongarch64/tcg-target.c.inc | 59 
 tcg/loongarch64/tcg-target.h |  2 +-
 3 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index 914572d21b..77d62e38e7 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -18,6 +18,7 @@ C_O0_I1(r)
 C_O0_I2(rZ, r)
 C_O0_I2(rZ, rZ)
 C_O0_I2(w, r)
+C_O0_I3(r, r, r)
 C_O1_I1(r, r)
 C_O1_I1(w, r)
 C_O1_I1(w, w)
@@ -37,3 +38,4 @@ C_O1_I2(w, w, wM)
 C_O1_I2(w, w, wA)
 C_O1_I3(w, w, w, w)
 C_O1_I4(r, rZ, rJ, rZ, rZ)
+C_O2_I1(r, r, r)
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 82901d678a..6e9f334fed 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1081,6 +1081,48 @@ static void tcg_out_qemu_st(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
 }
 }
 
+static void tcg_out_qemu_ldst_i128(TCGContext *s, TCGReg data_lo, TCGReg data_hi,
+   TCGReg addr_reg, MemOpIdx oi, bool is_ld)
+{
+TCGLabelQemuLdst *ldst;
+HostAddress h;
+
+ldst = prepare_host_addr(s, &h, addr_reg, oi, true);
+
+if (h.aa.atom == MO_128) {
+/*
+ * Use VLDX/VSTX when 128-bit atomicity is required.
+ * If address is aligned to 16-bytes, the 128-bit load/store is atomic.
+ */
+if (is_ld) {
+tcg_out_opc_vldx(s, TCG_VEC_TMP0, h.base, h.index);
+tcg_out_opc_vpickve2gr_d(s, data_lo, TCG_VEC_TMP0, 0);
+tcg_out_opc_vpickve2gr_d(s, data_hi, TCG_VEC_TMP0, 1);
+} else {
+tcg_out_opc_vinsgr2vr_d(s, TCG_VEC_TMP0, data_lo, 0);
+tcg_out_opc_vinsgr2vr_d(s, TCG_VEC_TMP0, data_hi, 1);
+tcg_out_opc_vstx(s, TCG_VEC_TMP0, h.base, h.index);
+}
+} else {
+/* otherwise use a pair of LD/ST */
+tcg_out_opc_add_d(s, TCG_REG_TMP0, h.base, h.index);
+if (is_ld) {
+tcg_out_opc_ld_d(s, data_lo, TCG_REG_TMP0, 0);
+tcg_out_opc_ld_d(s, data_hi, TCG_REG_TMP0, 8);
+} else {
+tcg_out_opc_st_d(s, data_lo, TCG_REG_TMP0, 0);
+tcg_out_opc_st_d(s, data_hi, TCG_REG_TMP0, 8);
+}
+}
+
+if (ldst) {
+ldst->type = TCG_TYPE_I128;
+ldst->datalo_reg = data_lo;
+ldst->datahi_reg = data_hi;
+ldst->raddr = tcg_splitwx_to_rx(s->code_ptr);
+}
+}
+
 /*
  * Entry-points
  */
@@ -1145,6 +1187,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 TCGArg a0 = args[0];
 TCGArg a1 = args[1];
 TCGArg a2 = args[2];
+TCGArg a3 = args[3];
 int c2 = const_args[2];
 
 switch (opc) {
@@ -1507,6 +1550,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_qemu_ld_a64_i64:
 tcg_out_qemu_ld(s, a0, a1, a2, TCG_TYPE_I64);
 break;
+case INDEX_op_qemu_ld_a32_i128:
+case INDEX_op_qemu_ld_a64_i128:
+tcg_out_qemu_ldst_i128(s, a0, a1, a2, a3, true);
+break;
 case INDEX_op_qemu_st_a32_i32:
 case INDEX_op_qemu_st_a64_i32:
 tcg_out_qemu_st(s, a0, a1, a2, TCG_TYPE_I32);
@@ -1515,6 +1562,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_qemu_st_a64_i64:
 tcg_out_qemu_st(s, a0, a1, a2, TCG_TYPE_I64);
 break;
+case INDEX_op_qemu_st_a32_i128:
+case INDEX_op_qemu_st_a64_i128:
+tcg_out_qemu_ldst_i128(s, a0, a1, a2, a3, false);
+break;
 
 case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
 case INDEX_op_mov_i64:
@@ -1996,6 +2047,14 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_qemu_st_a64_i64:
 return C_O0_I2(rZ, r);
 
+case INDEX_op_qemu_ld_a32_i128:
+case INDEX_op_qemu_ld_a64_i128:
+return C_O2_I1(r, r, r);
+
+case INDEX_op_qemu_st_a32_i128:
+case INDEX_op_qemu_st_a64_i128:
+return C_O0_I3(r, r, r);
+
 case INDEX_op_brcond_i32:
 case INDEX_op_brcond_i64:
 return C_O0_I2(rZ, rZ);
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 67b0a95532..03017672f6 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -171,7 +171,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_muluh_i641
 #define TCG_TARGET_HAS_mulsh_i641
 
-#define TCG_TARGET_HAS_qemu_ldst_i128   0
+#define TCG_TARGET_HAS_qemu_ldst_i128   use_lsx_instructions
 
 #define TCG_TARGET_HAS_v64  0
 #define TCG_TARGET_HAS_v128 use_lsx_instructions
-- 
2.42.0




[PATCH v4 09/16] tcg/loongarch64: Lower vector min max ops

2023-09-07 Thread Jiajie Chen
Lower the following ops:

- smin_vec
- smax_vec
- umin_vec
- umax_vec

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target.c.inc | 32 
 tcg/loongarch64/tcg-target.h |  2 +-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 0814f62905..bdf22d8807 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1701,6 +1701,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static const LoongArchInsn mul_vec_insn[4] = {
 OPC_VMUL_B, OPC_VMUL_H, OPC_VMUL_W, OPC_VMUL_D
 };
+static const LoongArchInsn smin_vec_insn[4] = {
+OPC_VMIN_B, OPC_VMIN_H, OPC_VMIN_W, OPC_VMIN_D
+};
+static const LoongArchInsn umin_vec_insn[4] = {
+OPC_VMIN_BU, OPC_VMIN_HU, OPC_VMIN_WU, OPC_VMIN_DU
+};
+static const LoongArchInsn smax_vec_insn[4] = {
+OPC_VMAX_B, OPC_VMAX_H, OPC_VMAX_W, OPC_VMAX_D
+};
+static const LoongArchInsn umax_vec_insn[4] = {
+OPC_VMAX_BU, OPC_VMAX_HU, OPC_VMAX_WU, OPC_VMAX_DU
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1805,6 +1817,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_mul_vec:
 tcg_out32(s, encode_vdvjvk_insn(mul_vec_insn[vece], a0, a1, a2));
 break;
+case INDEX_op_smin_vec:
+tcg_out32(s, encode_vdvjvk_insn(smin_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_smax_vec:
+tcg_out32(s, encode_vdvjvk_insn(smax_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_umin_vec:
+tcg_out32(s, encode_vdvjvk_insn(umin_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_umax_vec:
+tcg_out32(s, encode_vdvjvk_insn(umax_vec_insn[vece], a0, a1, a2));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1832,6 +1856,10 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_not_vec:
 case INDEX_op_neg_vec:
 case INDEX_op_mul_vec:
+case INDEX_op_smin_vec:
+case INDEX_op_smax_vec:
+case INDEX_op_umin_vec:
+case INDEX_op_umax_vec:
 return 1;
 default:
 return 0;
@@ -2007,6 +2035,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_xor_vec:
 case INDEX_op_nor_vec:
 case INDEX_op_mul_vec:
+case INDEX_op_smin_vec:
+case INDEX_op_smax_vec:
+case INDEX_op_umin_vec:
+case INDEX_op_umax_vec:
 return C_O1_I2(w, w, w);
 
 case INDEX_op_not_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 2c2266ed31..ec725aaeaa 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -193,7 +193,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_rots_vec 0
 #define TCG_TARGET_HAS_rotv_vec 0
 #define TCG_TARGET_HAS_sat_vec  0
-#define TCG_TARGET_HAS_minmax_vec   0
+#define TCG_TARGET_HAS_minmax_vec   1
 #define TCG_TARGET_HAS_bitsel_vec   0
 #define TCG_TARGET_HAS_cmpsel_vec   0
 
-- 
2.42.0




[PATCH v4 05/16] tcg/loongarch64: Lower add/sub_vec to vadd/vsub

2023-09-07 Thread Jiajie Chen
Lower the following ops:

- add_vec
- sub_vec

Signed-off-by: Jiajie Chen 
---
 tcg/loongarch64/tcg-target-con-set.h |  1 +
 tcg/loongarch64/tcg-target-con-str.h |  1 +
 tcg/loongarch64/tcg-target.c.inc | 61 
 3 files changed, 63 insertions(+)

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index 8c8ea5d919..2d5dce75c3 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -32,4 +32,5 @@ C_O1_I2(r, rZ, ri)
 C_O1_I2(r, rZ, rJ)
 C_O1_I2(r, rZ, rZ)
 C_O1_I2(w, w, wM)
+C_O1_I2(w, w, wA)
 C_O1_I4(r, rZ, rJ, rZ, rZ)
diff --git a/tcg/loongarch64/tcg-target-con-str.h b/tcg/loongarch64/tcg-target-con-str.h
index a8a1c44014..2ba9c135ac 100644
--- a/tcg/loongarch64/tcg-target-con-str.h
+++ b/tcg/loongarch64/tcg-target-con-str.h
@@ -27,3 +27,4 @@ CONST('Z', TCG_CT_CONST_ZERO)
 CONST('C', TCG_CT_CONST_C12)
 CONST('W', TCG_CT_CONST_WSZ)
 CONST('M', TCG_CT_CONST_VCMP)
+CONST('A', TCG_CT_CONST_VADD)
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 129dd92910..1a369b237c 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -177,6 +177,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
 #define TCG_CT_CONST_C12   0x1000
 #define TCG_CT_CONST_WSZ   0x2000
 #define TCG_CT_CONST_VCMP  0x4000
+#define TCG_CT_CONST_VADD  0x8000
 
 #define ALL_GENERAL_REGS   MAKE_64BIT_MASK(0, 32)
 #define ALL_VECTOR_REGSMAKE_64BIT_MASK(32, 32)
@@ -214,6 +215,9 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece)
 if ((ct & TCG_CT_CONST_VCMP) && -0x10 <= vec_val && vec_val <= 0x1f) {
 return true;
 }
+if ((ct & TCG_CT_CONST_VADD) && -0x1f <= vec_val && vec_val <= 0x1f) {
+return true;
+}
 return false;
 }
 
@@ -1621,6 +1625,51 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type, unsigned vece,
 }
 }
 
+static void tcg_out_addsub_vec(TCGContext *s, unsigned vece, const TCGArg a0,
+   const TCGArg a1, const TCGArg a2,
+   bool a2_is_const, bool is_add)
+{
+static const LoongArchInsn add_vec_insn[4] = {
+OPC_VADD_B, OPC_VADD_H, OPC_VADD_W, OPC_VADD_D
+};
+static const LoongArchInsn add_vec_imm_insn[4] = {
+OPC_VADDI_BU, OPC_VADDI_HU, OPC_VADDI_WU, OPC_VADDI_DU
+};
+static const LoongArchInsn sub_vec_insn[4] = {
+OPC_VSUB_B, OPC_VSUB_H, OPC_VSUB_W, OPC_VSUB_D
+};
+static const LoongArchInsn sub_vec_imm_insn[4] = {
+OPC_VSUBI_BU, OPC_VSUBI_HU, OPC_VSUBI_WU, OPC_VSUBI_DU
+};
+
+if (a2_is_const) {
+int64_t value = sextract64(a2, 0, 8 << vece);
+if (!is_add) {
+value = -value;
+}
+
+/* Try vaddi/vsubi */
+if (0 <= value && value <= 0x1f) {
+tcg_out32(s, encode_vdvjuk5_insn(add_vec_imm_insn[vece], a0, \
+ a1, value));
+return;
+} else if (-0x1f <= value && value < 0) {
+tcg_out32(s, encode_vdvjuk5_insn(sub_vec_imm_insn[vece], a0, \
+ a1, -value));
+return;
+}
+
+/* constraint TCG_CT_CONST_VADD ensures unreachable */
+g_assert_not_reached();
+}
+
+if (is_add) {
+tcg_out32(s, encode_vdvjvk_insn(add_vec_insn[vece], a0, a1, a2));
+} else {
+tcg_out32(s, encode_vdvjvk_insn(sub_vec_insn[vece], a0, a1, a2));
+}
+}
+
 static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
unsigned vecl, unsigned vece,
const TCGArg args[TCG_MAX_OP_ARGS],
@@ -1712,6 +1761,12 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 }
 tcg_out32(s, encode_vdvjvk_insn(insn, a0, a1, a2));
 break;
+case INDEX_op_add_vec:
+tcg_out_addsub_vec(s, vece, a0, a1, a2, const_args[2], true);
+break;
+case INDEX_op_sub_vec:
+tcg_out_addsub_vec(s, vece, a0, a1, a2, const_args[2], false);
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1728,6 +1783,8 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_dup_vec:
 case INDEX_op_dupm_vec:
 case INDEX_op_cmp_vec:
+case INDEX_op_add_vec:
+case INDEX_op_sub_vec:
 return 1;
 default:
 return 0;
@@ -1892,6 +1949,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_cmp_vec:
 return C_O1_I2(w, w, wM);
 
+case INDEX_op_add_vec:
+case INDEX_op_sub_vec:
+return C_O1_I2(w, w, wA);
+
 default:
 g_assert_not_reached();
 }
-- 
2.42.0




[PATCH v4 07/16] tcg/loongarch64: Lower neg_vec to vneg

2023-09-07 Thread Jiajie Chen
Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target.c.inc | 8 
 tcg/loongarch64/tcg-target.h | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index d569e443dd..b36b706e39 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1695,6 +1695,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
    [TCG_COND_LTU] = {OPC_VSLTI_BU, OPC_VSLTI_HU, OPC_VSLTI_WU, OPC_VSLTI_DU},
 };
 LoongArchInsn insn;
+static const LoongArchInsn neg_vec_insn[4] = {
+OPC_VNEG_B, OPC_VNEG_H, OPC_VNEG_W, OPC_VNEG_D
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1793,6 +1796,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_sub_vec:
 tcg_out_addsub_vec(s, vece, a0, a1, a2, const_args[2], false);
 break;
+case INDEX_op_neg_vec:
+tcg_out32(s, encode_vdvj_insn(neg_vec_insn[vece], a0, a1));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1818,6 +1824,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_xor_vec:
 case INDEX_op_nor_vec:
 case INDEX_op_not_vec:
+case INDEX_op_neg_vec:
 return 1;
 default:
 return 0;
@@ -1995,6 +2002,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 return C_O1_I2(w, w, w);
 
 case INDEX_op_not_vec:
+case INDEX_op_neg_vec:
 return C_O1_I1(w, w);
 
 default:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index f9c5cb12ca..64c72d0857 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -178,7 +178,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_v256 0
 
 #define TCG_TARGET_HAS_not_vec  1
-#define TCG_TARGET_HAS_neg_vec  0
+#define TCG_TARGET_HAS_neg_vec  1
 #define TCG_TARGET_HAS_abs_vec  0
 #define TCG_TARGET_HAS_andc_vec 1
 #define TCG_TARGET_HAS_orc_vec  1
-- 
2.42.0




Re: [PATCH v3 16/16] tcg/loongarch64: Implement 128-bit load & store

2023-09-02 Thread Jiajie Chen



On 2023/9/3 09:06, Richard Henderson wrote:

On 9/1/23 22:02, Jiajie Chen wrote:

If LSX is available, use LSX instructions to implement 128-bit load &
store.


Is this really guaranteed to be an atomic 128-bit operation?



Song Gao, please check this.


Or, as for many vector processors, is this really two separate 64-bit 
memory operations under the hood?



+static void tcg_out_qemu_ldst_i128(TCGContext *s, TCGReg data_lo, TCGReg data_hi,
+                                   TCGReg addr_reg, MemOpIdx oi, bool is_ld)

+{
+    TCGLabelQemuLdst *ldst;
+    HostAddress h;
+
+    ldst = prepare_host_addr(s, &h, addr_reg, oi, true);
+    if (is_ld) {
+    tcg_out_opc_vldx(s, TCG_VEC_TMP0, h.base, h.index);
+    tcg_out_opc_vpickve2gr_d(s, data_lo, TCG_VEC_TMP0, 0);
+    tcg_out_opc_vpickve2gr_d(s, data_hi, TCG_VEC_TMP0, 1);
+    } else {
+    tcg_out_opc_vinsgr2vr_d(s, TCG_VEC_TMP0, data_lo, 0);
+    tcg_out_opc_vinsgr2vr_d(s, TCG_VEC_TMP0, data_hi, 1);
+    tcg_out_opc_vstx(s, TCG_VEC_TMP0, h.base, h.index);
+    }


You should use h.aa.atom < MO_128 to determine if 128-bit atomicity, 
and therefore the vector operation, is required.  I assume the gr<->vr 
moves have a cost and two integer operations are preferred when 
allowable.


Compare the other implementations of this function.


r~




[PATCH v3 08/16] tcg/loongarch64: Lower mul_vec to vmul

2023-09-01 Thread Jiajie Chen
Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target.c.inc | 8 
 tcg/loongarch64/tcg-target.h | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 1e196bb68f..6905775698 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1665,6 +1665,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static const LoongArchInsn neg_vec_insn[4] = {
 OPC_VNEG_B, OPC_VNEG_H, OPC_VNEG_W, OPC_VNEG_D
 };
+static const LoongArchInsn mul_vec_insn[4] = {
+OPC_VMUL_B, OPC_VMUL_H, OPC_VMUL_W, OPC_VMUL_D
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1798,6 +1801,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_neg_vec:
 tcg_out32(s, encode_vdvj_insn(neg_vec_insn[vece], a0, a1));
 break;
+case INDEX_op_mul_vec:
+tcg_out32(s, encode_vdvjvk_insn(mul_vec_insn[vece], a0, a1, a2));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1824,6 +1830,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_nor_vec:
 case INDEX_op_not_vec:
 case INDEX_op_neg_vec:
+case INDEX_op_mul_vec:
 return 1;
 default:
 return 0;
@@ -1998,6 +2005,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_orc_vec:
 case INDEX_op_xor_vec:
 case INDEX_op_nor_vec:
+case INDEX_op_mul_vec:
 return C_O1_I2(w, w, w);
 
 case INDEX_op_not_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 64c72d0857..2c2266ed31 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -185,7 +185,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_nand_vec 0
 #define TCG_TARGET_HAS_nor_vec  1
 #define TCG_TARGET_HAS_eqv_vec  0
-#define TCG_TARGET_HAS_mul_vec  0
+#define TCG_TARGET_HAS_mul_vec  1
 #define TCG_TARGET_HAS_shi_vec  0
 #define TCG_TARGET_HAS_shs_vec  0
 #define TCG_TARGET_HAS_shv_vec  0
-- 
2.42.0




[PATCH v3 10/16] tcg/loongarch64: Lower vector saturated ops

2023-09-01 Thread Jiajie Chen
Lower the following ops:

- ssadd_vec
- usadd_vec
- sssub_vec
- ussub_vec

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target.c.inc | 32 
 tcg/loongarch64/tcg-target.h |  2 +-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 3ffc1691cd..89db41002c 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1680,6 +1680,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static const LoongArchInsn umax_vec_insn[4] = {
 OPC_VMAX_BU, OPC_VMAX_HU, OPC_VMAX_WU, OPC_VMAX_DU
 };
+static const LoongArchInsn ssadd_vec_insn[4] = {
+OPC_VSADD_B, OPC_VSADD_H, OPC_VSADD_W, OPC_VSADD_D
+};
+static const LoongArchInsn usadd_vec_insn[4] = {
+OPC_VSADD_BU, OPC_VSADD_HU, OPC_VSADD_WU, OPC_VSADD_DU
+};
+static const LoongArchInsn sssub_vec_insn[4] = {
+OPC_VSSUB_B, OPC_VSSUB_H, OPC_VSSUB_W, OPC_VSSUB_D
+};
+static const LoongArchInsn ussub_vec_insn[4] = {
+OPC_VSSUB_BU, OPC_VSSUB_HU, OPC_VSSUB_WU, OPC_VSSUB_DU
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1828,6 +1840,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_umax_vec:
 tcg_out32(s, encode_vdvjvk_insn(umax_vec_insn[vece], a0, a1, a2));
 break;
+case INDEX_op_ssadd_vec:
+tcg_out32(s, encode_vdvjvk_insn(ssadd_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_usadd_vec:
+tcg_out32(s, encode_vdvjvk_insn(usadd_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_sssub_vec:
+tcg_out32(s, encode_vdvjvk_insn(sssub_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_ussub_vec:
+tcg_out32(s, encode_vdvjvk_insn(ussub_vec_insn[vece], a0, a1, a2));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1859,6 +1883,10 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_smax_vec:
 case INDEX_op_umin_vec:
 case INDEX_op_umax_vec:
+case INDEX_op_ssadd_vec:
+case INDEX_op_usadd_vec:
+case INDEX_op_sssub_vec:
+case INDEX_op_ussub_vec:
 return 1;
 default:
 return 0;
@@ -2038,6 +2066,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_smax_vec:
 case INDEX_op_umin_vec:
 case INDEX_op_umax_vec:
+case INDEX_op_ssadd_vec:
+case INDEX_op_usadd_vec:
+case INDEX_op_sssub_vec:
+case INDEX_op_ussub_vec:
 return C_O1_I2(w, w, w);
 
 case INDEX_op_not_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index ec725aaeaa..fa14558275 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -192,7 +192,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_roti_vec 0
 #define TCG_TARGET_HAS_rots_vec 0
 #define TCG_TARGET_HAS_rotv_vec 0
-#define TCG_TARGET_HAS_sat_vec  0
+#define TCG_TARGET_HAS_sat_vec  1
 #define TCG_TARGET_HAS_minmax_vec   1
 #define TCG_TARGET_HAS_bitsel_vec   0
 #define TCG_TARGET_HAS_cmpsel_vec   0
-- 
2.42.0




[PATCH v3 05/16] tcg/loongarch64: Lower add/sub_vec to vadd/vsub

2023-09-01 Thread Jiajie Chen
Lower the following ops:

- add_vec
- sub_vec

Signed-off-by: Jiajie Chen 
---
 tcg/loongarch64/tcg-target-con-set.h |  1 +
 tcg/loongarch64/tcg-target-con-str.h |  1 +
 tcg/loongarch64/tcg-target.c.inc | 60 
 3 files changed, 62 insertions(+)

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index 8c8ea5d919..2d5dce75c3 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -32,4 +32,5 @@ C_O1_I2(r, rZ, ri)
 C_O1_I2(r, rZ, rJ)
 C_O1_I2(r, rZ, rZ)
 C_O1_I2(w, w, wM)
+C_O1_I2(w, w, wA)
 C_O1_I4(r, rZ, rJ, rZ, rZ)
diff --git a/tcg/loongarch64/tcg-target-con-str.h b/tcg/loongarch64/tcg-target-con-str.h
index a8a1c44014..2ba9c135ac 100644
--- a/tcg/loongarch64/tcg-target-con-str.h
+++ b/tcg/loongarch64/tcg-target-con-str.h
@@ -27,3 +27,4 @@ CONST('Z', TCG_CT_CONST_ZERO)
 CONST('C', TCG_CT_CONST_C12)
 CONST('W', TCG_CT_CONST_WSZ)
 CONST('M', TCG_CT_CONST_VCMP)
+CONST('A', TCG_CT_CONST_VADD)
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 129dd92910..0edcf5be35 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -177,6 +177,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
 #define TCG_CT_CONST_C12   0x1000
 #define TCG_CT_CONST_WSZ   0x2000
 #define TCG_CT_CONST_VCMP  0x4000
+#define TCG_CT_CONST_VADD  0x8000
 
 #define ALL_GENERAL_REGS   MAKE_64BIT_MASK(0, 32)
 #define ALL_VECTOR_REGSMAKE_64BIT_MASK(32, 32)
@@ -214,6 +215,9 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece)
 if ((ct & TCG_CT_CONST_VCMP) && -0x10 <= vec_val && vec_val <= 0x1f) {
 return true;
 }
+if ((ct & TCG_CT_CONST_VADD) && -0x1f <= vec_val && vec_val <= 0x1f) {
+return true;
+}
 return false;
 }
 
@@ -1646,6 +1650,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
    [TCG_COND_LTU] = {OPC_VSLTI_BU, OPC_VSLTI_HU, OPC_VSLTI_WU, OPC_VSLTI_DU},
 };
 LoongArchInsn insn;
+static const LoongArchInsn add_vec_insn[4] = {
+OPC_VADD_B, OPC_VADD_H, OPC_VADD_W, OPC_VADD_D
+};
+static const LoongArchInsn add_vec_imm_insn[4] = {
+OPC_VADDI_BU, OPC_VADDI_HU, OPC_VADDI_WU, OPC_VADDI_DU
+};
+static const LoongArchInsn sub_vec_insn[4] = {
+OPC_VSUB_B, OPC_VSUB_H, OPC_VSUB_W, OPC_VSUB_D
+};
+static const LoongArchInsn sub_vec_imm_insn[4] = {
+OPC_VSUBI_BU, OPC_VSUBI_HU, OPC_VSUBI_WU, OPC_VSUBI_DU
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1712,6 +1728,44 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 }
 tcg_out32(s, encode_vdvjvk_insn(insn, a0, a1, a2));
 break;
+case INDEX_op_add_vec:
+if (const_args[2]) {
+int64_t value = sextract64(a2, 0, 8 << vece);
+/* Try vaddi/vsubi */
+if (0 <= value && value <= 0x1f) {
+tcg_out32(s, encode_vdvjuk5_insn(add_vec_imm_insn[vece], a0, \
+ a1, value));
+break;
+} else if (-0x1f <= value && value < 0) {
+tcg_out32(s, encode_vdvjuk5_insn(sub_vec_imm_insn[vece], a0, \
+ a1, -value));
+break;
+}
+
+/* constraint TCG_CT_CONST_VADD ensures unreachable */
+g_assert_not_reached();
+}
+tcg_out32(s, encode_vdvjvk_insn(add_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_sub_vec:
+if (const_args[2]) {
+int64_t value = sextract64(a2, 0, 8 << vece);
+/* Try vaddi/vsubi */
+if (0 <= value && value <= 0x1f) {
+tcg_out32(s, encode_vdvjuk5_insn(sub_vec_imm_insn[vece], a0, \
+ a1, value));
+break;
+} else if (-0x1f <= value && value < 0) {
+tcg_out32(s, encode_vdvjuk5_insn(add_vec_imm_insn[vece], a0, \
+ a1, -value));
+break;
+}
+
+/* constraint TCG_CT_CONST_VADD ensures unreachable */
+g_assert_not_reached();
+}
+tcg_out32(s, encode_vdvjvk_insn(sub_vec_insn[vece], a0, a1, a2));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1728,6 +1782,8 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_dup_vec:
 case INDEX_op_dupm_vec:
 case INDEX_op_cmp_vec:
+case INDEX_op_add_vec:
+case INDEX_op_sub_vec:
 return 1;
 default:
 return 0;
@@ -1892,6 +1948,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)

[PATCH v3 07/16] tcg/loongarch64: Lower neg_vec to vneg

2023-09-01 Thread Jiajie Chen
Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target.c.inc | 8 
 tcg/loongarch64/tcg-target.h | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 133b0f7113..1e196bb68f 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1662,6 +1662,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static const LoongArchInsn sub_vec_imm_insn[4] = {
 OPC_VSUBI_BU, OPC_VSUBI_HU, OPC_VSUBI_WU, OPC_VSUBI_DU
 };
+static const LoongArchInsn neg_vec_insn[4] = {
+OPC_VNEG_B, OPC_VNEG_H, OPC_VNEG_W, OPC_VNEG_D
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1792,6 +1795,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 }
 tcg_out32(s, encode_vdvjvk_insn(sub_vec_insn[vece], a0, a1, a2));
 break;
+case INDEX_op_neg_vec:
+tcg_out32(s, encode_vdvj_insn(neg_vec_insn[vece], a0, a1));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1817,6 +1823,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_xor_vec:
 case INDEX_op_nor_vec:
 case INDEX_op_not_vec:
+case INDEX_op_neg_vec:
 return 1;
 default:
 return 0;
@@ -1994,6 +2001,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 return C_O1_I2(w, w, w);
 
 case INDEX_op_not_vec:
+case INDEX_op_neg_vec:
 return C_O1_I1(w, w);
 
 default:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index f9c5cb12ca..64c72d0857 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -178,7 +178,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_v256 0
 
 #define TCG_TARGET_HAS_not_vec  1
-#define TCG_TARGET_HAS_neg_vec  0
+#define TCG_TARGET_HAS_neg_vec  1
 #define TCG_TARGET_HAS_abs_vec  0
 #define TCG_TARGET_HAS_andc_vec 1
 #define TCG_TARGET_HAS_orc_vec  1
-- 
2.42.0




[PATCH v3 14/16] tcg/loongarch64: Lower rotv_vec ops to LSX

2023-09-01 Thread Jiajie Chen
Lower the following ops:

- rotrv_vec
- rotlv_vec

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target.c.inc | 14 ++
 tcg/loongarch64/tcg-target.h |  2 +-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 8ac008b907..95359b1757 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1710,6 +1710,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static const LoongArchInsn sari_vec_insn[4] = {
 OPC_VSRAI_B, OPC_VSRAI_H, OPC_VSRAI_W, OPC_VSRAI_D
 };
+static const LoongArchInsn rotrv_vec_insn[4] = {
+OPC_VROTR_B, OPC_VROTR_H, OPC_VROTR_W, OPC_VROTR_D
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1889,6 +1892,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_sari_vec:
 tcg_out32(s, encode_vdvjuk3_insn(sari_vec_insn[vece], a0, a1, a2));
 break;
+case INDEX_op_rotrv_vec:
+tcg_out32(s, encode_vdvjvk_insn(rotrv_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_rotlv_vec:
+/* rotlv_vec a1, a2 = rotrv_vec a1, -a2 */
+tcg_out32(s, encode_vdvj_insn(neg_vec_insn[vece], temp_vec, a2));
+tcg_out32(s, encode_vdvjvk_insn(rotrv_vec_insn[vece], a0, a1,
+temp_vec));
+break;
 case INDEX_op_bitsel_vec:
 /* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */
 tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1);
@@ -2118,6 +2130,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_shlv_vec:
 case INDEX_op_shrv_vec:
 case INDEX_op_sarv_vec:
+case INDEX_op_rotrv_vec:
+case INDEX_op_rotlv_vec:
 return C_O1_I2(w, w, w);
 
 case INDEX_op_not_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index d7b806e252..d5c69bc192 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -191,7 +191,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_shv_vec  1
 #define TCG_TARGET_HAS_roti_vec 0
 #define TCG_TARGET_HAS_rots_vec 0
-#define TCG_TARGET_HAS_rotv_vec 0
+#define TCG_TARGET_HAS_rotv_vec 1
 #define TCG_TARGET_HAS_sat_vec  1
 #define TCG_TARGET_HAS_minmax_vec   1
 #define TCG_TARGET_HAS_bitsel_vec   1
-- 
2.42.0




[PATCH v3 02/16] tcg/loongarch64: Lower basic tcg vec ops to LSX

2023-09-01 Thread Jiajie Chen
LSX support on host cpu is detected via hwcap.

Lower the following ops to LSX:

- dup_vec
- dupi_vec
- dupm_vec
- ld_vec
- st_vec

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target-con-set.h |   2 +
 tcg/loongarch64/tcg-target-con-str.h |   1 +
 tcg/loongarch64/tcg-target.c.inc | 219 ++-
 tcg/loongarch64/tcg-target.h |  38 -
 tcg/loongarch64/tcg-target.opc.h |  12 ++
 5 files changed, 270 insertions(+), 2 deletions(-)
 create mode 100644 tcg/loongarch64/tcg-target.opc.h

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index c2bde44613..37b3f80bf9 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -17,7 +17,9 @@
 C_O0_I1(r)
 C_O0_I2(rZ, r)
 C_O0_I2(rZ, rZ)
+C_O0_I2(w, r)
 C_O1_I1(r, r)
+C_O1_I1(w, r)
 C_O1_I2(r, r, rC)
 C_O1_I2(r, r, ri)
 C_O1_I2(r, r, rI)
diff --git a/tcg/loongarch64/tcg-target-con-str.h b/tcg/loongarch64/tcg-target-con-str.h
index 6e9ccca3ad..81b8d40278 100644
--- a/tcg/loongarch64/tcg-target-con-str.h
+++ b/tcg/loongarch64/tcg-target-con-str.h
@@ -14,6 +14,7 @@
  * REGS(letter, register_mask)
  */
 REGS('r', ALL_GENERAL_REGS)
+REGS('w', ALL_VECTOR_REGS)
 
 /*
  * Define constraint letters for constants:
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index baf5fc3819..150278e112 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -32,6 +32,8 @@
 #include "../tcg-ldst.c.inc"
 #include 
 
+bool use_lsx_instructions;
+
 #ifdef CONFIG_DEBUG_TCG
 static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 "zero",
@@ -65,7 +67,39 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 "s5",
 "s6",
 "s7",
-"s8"
+"s8",
+"vr0",
+"vr1",
+"vr2",
+"vr3",
+"vr4",
+"vr5",
+"vr6",
+"vr7",
+"vr8",
+"vr9",
+"vr10",
+"vr11",
+"vr12",
+"vr13",
+"vr14",
+"vr15",
+"vr16",
+"vr17",
+"vr18",
+"vr19",
+"vr20",
+"vr21",
+"vr22",
+"vr23",
+"vr24",
+"vr25",
+"vr26",
+"vr27",
+"vr28",
+"vr29",
+"vr30",
+"vr31",
 };
 #endif
 
@@ -102,6 +136,15 @@ static const int tcg_target_reg_alloc_order[] = {
 TCG_REG_A2,
 TCG_REG_A1,
 TCG_REG_A0,
+
+/* Vector registers */
+TCG_REG_V0, TCG_REG_V1, TCG_REG_V2, TCG_REG_V3,
+TCG_REG_V4, TCG_REG_V5, TCG_REG_V6, TCG_REG_V7,
+TCG_REG_V8, TCG_REG_V9, TCG_REG_V10, TCG_REG_V11,
+TCG_REG_V12, TCG_REG_V13, TCG_REG_V14, TCG_REG_V15,
+TCG_REG_V16, TCG_REG_V17, TCG_REG_V18, TCG_REG_V19,
+TCG_REG_V20, TCG_REG_V21, TCG_REG_V22, TCG_REG_V23,
+/* V24 - V31 are caller-saved, and skipped.  */
 };
 
 static const int tcg_target_call_iarg_regs[] = {
@@ -135,6 +178,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
 #define TCG_CT_CONST_WSZ   0x2000
 
 #define ALL_GENERAL_REGS   MAKE_64BIT_MASK(0, 32)
+#define ALL_VECTOR_REGSMAKE_64BIT_MASK(32, 32)
 
 static inline tcg_target_long sextreg(tcg_target_long val, int pos, int len)
 {
@@ -1486,6 +1530,154 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 }
 }
 
+static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
+TCGReg rd, TCGReg rs)
+{
+switch (vece) {
+case MO_8:
+tcg_out_opc_vreplgr2vr_b(s, rd, rs);
+break;
+case MO_16:
+tcg_out_opc_vreplgr2vr_h(s, rd, rs);
+break;
+case MO_32:
+tcg_out_opc_vreplgr2vr_w(s, rd, rs);
+break;
+case MO_64:
+tcg_out_opc_vreplgr2vr_d(s, rd, rs);
+break;
+default:
+g_assert_not_reached();
+}
+return true;
+}
+
+static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
+ TCGReg r, TCGReg base, intptr_t offset)
+{
+/* Handle imm overflow and division (vldrepl.d imm is divided by 8) */
+if (offset < -0x800 || offset > 0x7ff || \
+(offset & ((1 << vece) - 1)) != 0) {
+tcg_out_addi(s, TCG_TYPE_I64, TCG_REG_TMP0, base, offset);
+base = TCG_REG_TMP0;
+offset = 0;
+}
+offset >>= vece;
+
+switch (vece) {
+case MO_8:
+tcg_out_opc_vldrepl_b(s, r, base, offset);
+break;
+case MO_16:
+tcg_out_opc_vldrepl_h(s, r, base, offset);
+break;
+case MO_32:
+tcg_out_opc_vldrepl_w(s, r, base, offset);
+break;
+case 

[PATCH v3 06/16] tcg/loongarch64: Lower vector bitwise operations

2023-09-01 Thread Jiajie Chen
Lower the following ops:

- and_vec
- andc_vec
- or_vec
- orc_vec
- xor_vec
- nor_vec
- not_vec

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target-con-set.h |  2 ++
 tcg/loongarch64/tcg-target.c.inc | 44 
 tcg/loongarch64/tcg-target.h |  8 ++---
 3 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index 2d5dce75c3..3f530ad4d8 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -20,6 +20,7 @@ C_O0_I2(rZ, rZ)
 C_O0_I2(w, r)
 C_O1_I1(r, r)
 C_O1_I1(w, r)
+C_O1_I1(w, w)
 C_O1_I2(r, r, rC)
 C_O1_I2(r, r, ri)
 C_O1_I2(r, r, rI)
@@ -31,6 +32,7 @@ C_O1_I2(r, 0, rZ)
 C_O1_I2(r, rZ, ri)
 C_O1_I2(r, rZ, rJ)
 C_O1_I2(r, rZ, rZ)
+C_O1_I2(w, w, w)
 C_O1_I2(w, w, wM)
 C_O1_I2(w, w, wA)
 C_O1_I4(r, rZ, rJ, rZ, rZ)
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 0edcf5be35..133b0f7113 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1689,6 +1689,32 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 tcg_out_opc_vldx(s, a0, a1, temp);
 }
 break;
+case INDEX_op_and_vec:
+tcg_out_opc_vand_v(s, a0, a1, a2);
+break;
+case INDEX_op_andc_vec:
+/*
+ * vandn vd, vj, vk: vd = vk & ~vj
+ * andc_vec vd, vj, vk: vd = vj & ~vk
+ * vj and vk are swapped
+ */
+tcg_out_opc_vandn_v(s, a0, a2, a1);
+break;
+case INDEX_op_or_vec:
+tcg_out_opc_vor_v(s, a0, a1, a2);
+break;
+case INDEX_op_orc_vec:
+tcg_out_opc_vorn_v(s, a0, a1, a2);
+break;
+case INDEX_op_xor_vec:
+tcg_out_opc_vxor_v(s, a0, a1, a2);
+break;
+case INDEX_op_nor_vec:
+tcg_out_opc_vnor_v(s, a0, a1, a2);
+break;
+case INDEX_op_not_vec:
+tcg_out_opc_vnor_v(s, a0, a1, a1);
+break;
 case INDEX_op_cmp_vec:
 TCGCond cond = args[3];
 if (const_args[2]) {
@@ -1784,6 +1810,13 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_cmp_vec:
 case INDEX_op_add_vec:
 case INDEX_op_sub_vec:
+case INDEX_op_and_vec:
+case INDEX_op_andc_vec:
+case INDEX_op_or_vec:
+case INDEX_op_orc_vec:
+case INDEX_op_xor_vec:
+case INDEX_op_nor_vec:
+case INDEX_op_not_vec:
 return 1;
 default:
 return 0;
@@ -1952,6 +1985,17 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_sub_vec:
 return C_O1_I2(w, w, wA);
 
+case INDEX_op_and_vec:
+case INDEX_op_andc_vec:
+case INDEX_op_or_vec:
+case INDEX_op_orc_vec:
+case INDEX_op_xor_vec:
+case INDEX_op_nor_vec:
+return C_O1_I2(w, w, w);
+
+case INDEX_op_not_vec:
+return C_O1_I1(w, w);
+
 default:
 g_assert_not_reached();
 }
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index daaf38ee31..f9c5cb12ca 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -177,13 +177,13 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_v128 use_lsx_instructions
 #define TCG_TARGET_HAS_v256 0
 
-#define TCG_TARGET_HAS_not_vec  0
+#define TCG_TARGET_HAS_not_vec  1
 #define TCG_TARGET_HAS_neg_vec  0
 #define TCG_TARGET_HAS_abs_vec  0
-#define TCG_TARGET_HAS_andc_vec 0
-#define TCG_TARGET_HAS_orc_vec  0
+#define TCG_TARGET_HAS_andc_vec 1
+#define TCG_TARGET_HAS_orc_vec  1
 #define TCG_TARGET_HAS_nand_vec 0
-#define TCG_TARGET_HAS_nor_vec  0
+#define TCG_TARGET_HAS_nor_vec  1
 #define TCG_TARGET_HAS_eqv_vec  0
 #define TCG_TARGET_HAS_mul_vec  0
 #define TCG_TARGET_HAS_shi_vec  0
-- 
2.42.0




[PATCH v3 13/16] tcg/loongarch64: Lower vector shift integer ops

2023-09-01 Thread Jiajie Chen
Lower the following ops:

- shli_vec
- shrv_vec
- sarv_vec

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target.c.inc | 21 +
 tcg/loongarch64/tcg-target.h |  2 +-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 2db4369a9e..8ac008b907 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1701,6 +1701,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static const LoongArchInsn sarv_vec_insn[4] = {
 OPC_VSRA_B, OPC_VSRA_H, OPC_VSRA_W, OPC_VSRA_D
 };
+static const LoongArchInsn shli_vec_insn[4] = {
+OPC_VSLLI_B, OPC_VSLLI_H, OPC_VSLLI_W, OPC_VSLLI_D
+};
+static const LoongArchInsn shri_vec_insn[4] = {
+OPC_VSRLI_B, OPC_VSRLI_H, OPC_VSRLI_W, OPC_VSRLI_D
+};
+static const LoongArchInsn sari_vec_insn[4] = {
+OPC_VSRAI_B, OPC_VSRAI_H, OPC_VSRAI_W, OPC_VSRAI_D
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1871,6 +1880,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_sarv_vec:
 tcg_out32(s, encode_vdvjvk_insn(sarv_vec_insn[vece], a0, a1, a2));
 break;
+case INDEX_op_shli_vec:
+tcg_out32(s, encode_vdvjuk3_insn(shli_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_shri_vec:
+tcg_out32(s, encode_vdvjuk3_insn(shri_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_sari_vec:
+tcg_out32(s, encode_vdvjuk3_insn(sari_vec_insn[vece], a0, a1, a2));
+break;
 case INDEX_op_bitsel_vec:
 /* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */
 tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1);
@@ -2104,6 +2122,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 
 case INDEX_op_not_vec:
 case INDEX_op_neg_vec:
+case INDEX_op_shli_vec:
+case INDEX_op_shri_vec:
+case INDEX_op_sari_vec:
 return C_O1_I1(w, w);
 
 case INDEX_op_bitsel_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index bc56939a57..d7b806e252 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -186,7 +186,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_nor_vec  1
 #define TCG_TARGET_HAS_eqv_vec  0
 #define TCG_TARGET_HAS_mul_vec  1
-#define TCG_TARGET_HAS_shi_vec  0
+#define TCG_TARGET_HAS_shi_vec  1
 #define TCG_TARGET_HAS_shs_vec  0
 #define TCG_TARGET_HAS_shv_vec  1
 #define TCG_TARGET_HAS_roti_vec 0
-- 
2.42.0




[PATCH v3 16/16] tcg/loongarch64: Implement 128-bit load & store

2023-09-01 Thread Jiajie Chen
If LSX is available, use LSX instructions to implement 128-bit load &
store.

Signed-off-by: Jiajie Chen 
---
 tcg/loongarch64/tcg-target-con-set.h |  2 ++
 tcg/loongarch64/tcg-target.c.inc | 42 
 tcg/loongarch64/tcg-target.h |  2 +-
 3 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index 914572d21b..77d62e38e7 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -18,6 +18,7 @@ C_O0_I1(r)
 C_O0_I2(rZ, r)
 C_O0_I2(rZ, rZ)
 C_O0_I2(w, r)
+C_O0_I3(r, r, r)
 C_O1_I1(r, r)
 C_O1_I1(w, r)
 C_O1_I1(w, w)
@@ -37,3 +38,4 @@ C_O1_I2(w, w, wM)
 C_O1_I2(w, w, wA)
 C_O1_I3(w, w, w, w)
 C_O1_I4(r, rZ, rJ, rZ, rZ)
+C_O2_I1(r, r, r)
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 2b001598e2..9d999ef58c 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1081,6 +1081,31 @@ static void tcg_out_qemu_st(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
 }
 }
 
+static void tcg_out_qemu_ldst_i128(TCGContext *s, TCGReg data_lo, TCGReg data_hi,
+   TCGReg addr_reg, MemOpIdx oi, bool is_ld)
+{
+TCGLabelQemuLdst *ldst;
+HostAddress h;
+
+ldst = prepare_host_addr(s, &h, addr_reg, oi, true);
+if (is_ld) {
+tcg_out_opc_vldx(s, TCG_VEC_TMP0, h.base, h.index);
+tcg_out_opc_vpickve2gr_d(s, data_lo, TCG_VEC_TMP0, 0);
+tcg_out_opc_vpickve2gr_d(s, data_hi, TCG_VEC_TMP0, 1);
+} else {
+tcg_out_opc_vinsgr2vr_d(s, TCG_VEC_TMP0, data_lo, 0);
+tcg_out_opc_vinsgr2vr_d(s, TCG_VEC_TMP0, data_hi, 1);
+tcg_out_opc_vstx(s, TCG_VEC_TMP0, h.base, h.index);
+}
+
+if (ldst) {
+ldst->type = TCG_TYPE_I128;
+ldst->datalo_reg = data_lo;
+ldst->datahi_reg = data_hi;
+ldst->raddr = tcg_splitwx_to_rx(s->code_ptr);
+}
+}
+
 /*
  * Entry-points
  */
@@ -1145,6 +1170,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 TCGArg a0 = args[0];
 TCGArg a1 = args[1];
 TCGArg a2 = args[2];
+TCGArg a3 = args[3];
 int c2 = const_args[2];
 
 switch (opc) {
@@ -1507,6 +1533,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_qemu_ld_a64_i64:
 tcg_out_qemu_ld(s, a0, a1, a2, TCG_TYPE_I64);
 break;
+case INDEX_op_qemu_ld_a32_i128:
+case INDEX_op_qemu_ld_a64_i128:
+tcg_out_qemu_ldst_i128(s, a0, a1, a2, a3, true);
+break;
 case INDEX_op_qemu_st_a32_i32:
 case INDEX_op_qemu_st_a64_i32:
 tcg_out_qemu_st(s, a0, a1, a2, TCG_TYPE_I32);
@@ -1515,6 +1545,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_qemu_st_a64_i64:
 tcg_out_qemu_st(s, a0, a1, a2, TCG_TYPE_I64);
 break;
+case INDEX_op_qemu_st_a32_i128:
+case INDEX_op_qemu_st_a64_i128:
+tcg_out_qemu_ldst_i128(s, a0, a1, a2, a3, false);
+break;
 
 case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
 case INDEX_op_mov_i64:
@@ -1995,6 +2029,14 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_qemu_st_a64_i64:
 return C_O0_I2(rZ, r);
 
+case INDEX_op_qemu_ld_a32_i128:
+case INDEX_op_qemu_ld_a64_i128:
+return C_O2_I1(r, r, r);
+
+case INDEX_op_qemu_st_a32_i128:
+case INDEX_op_qemu_st_a64_i128:
+return C_O0_I3(r, r, r);
+
 case INDEX_op_brcond_i32:
 case INDEX_op_brcond_i64:
 return C_O0_I2(rZ, rZ);
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 67b0a95532..03017672f6 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -171,7 +171,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_muluh_i641
 #define TCG_TARGET_HAS_mulsh_i641
 
-#define TCG_TARGET_HAS_qemu_ldst_i128   0
+#define TCG_TARGET_HAS_qemu_ldst_i128   use_lsx_instructions
 
 #define TCG_TARGET_HAS_v64  0
 #define TCG_TARGET_HAS_v128 use_lsx_instructions
-- 
2.42.0




[PATCH v3 12/16] tcg/loongarch64: Lower bitsel_vec to vbitsel

2023-09-01 Thread Jiajie Chen
Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target-con-set.h |  1 +
 tcg/loongarch64/tcg-target.c.inc | 11 ++-
 tcg/loongarch64/tcg-target.h |  2 +-
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index 3f530ad4d8..914572d21b 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -35,4 +35,5 @@ C_O1_I2(r, rZ, rZ)
 C_O1_I2(w, w, w)
 C_O1_I2(w, w, wM)
 C_O1_I2(w, w, wA)
+C_O1_I3(w, w, w, w)
 C_O1_I4(r, rZ, rJ, rZ, rZ)
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index ef1cd7c621..2db4369a9e 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1631,7 +1631,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
const int const_args[TCG_MAX_OP_ARGS])
 {
 TCGType type = vecl + TCG_TYPE_V64;
-TCGArg a0, a1, a2;
+TCGArg a0, a1, a2, a3;
 TCGReg temp = TCG_REG_TMP0;
 TCGReg temp_vec = TCG_VEC_TMP0;
 
@@ -1705,6 +1705,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 a0 = args[0];
 a1 = args[1];
 a2 = args[2];
+a3 = args[3];
 
 /* Currently only supports V128 */
 tcg_debug_assert(type == TCG_TYPE_V128);
@@ -1870,6 +1871,10 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_sarv_vec:
 tcg_out32(s, encode_vdvjvk_insn(sarv_vec_insn[vece], a0, a1, a2));
 break;
+case INDEX_op_bitsel_vec:
+/* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */
+tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1);
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1908,6 +1913,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_shlv_vec:
 case INDEX_op_shrv_vec:
 case INDEX_op_sarv_vec:
+case INDEX_op_bitsel_vec:
 return 1;
 default:
 return 0;
@@ -2100,6 +2106,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_neg_vec:
 return C_O1_I1(w, w);
 
+case INDEX_op_bitsel_vec:
+return C_O1_I3(w, w, w, w);
+
 default:
 g_assert_not_reached();
 }
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 7e9fb61c47..bc56939a57 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -194,7 +194,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_rotv_vec 0
 #define TCG_TARGET_HAS_sat_vec  1
 #define TCG_TARGET_HAS_minmax_vec   1
-#define TCG_TARGET_HAS_bitsel_vec   0
+#define TCG_TARGET_HAS_bitsel_vec   1
 #define TCG_TARGET_HAS_cmpsel_vec   0
 
 #define TCG_TARGET_DEFAULT_MO (0)
-- 
2.42.0




[PATCH v3 11/16] tcg/loongarch64: Lower vector shift vector ops

2023-09-01 Thread Jiajie Chen
Lower the following ops:

- shlv_vec
- shrv_vec
- sarv_vec

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target.c.inc | 24 
 tcg/loongarch64/tcg-target.h |  2 +-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 89db41002c..ef1cd7c621 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1692,6 +1692,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static const LoongArchInsn ussub_vec_insn[4] = {
 OPC_VSSUB_BU, OPC_VSSUB_HU, OPC_VSSUB_WU, OPC_VSSUB_DU
 };
+static const LoongArchInsn shlv_vec_insn[4] = {
+OPC_VSLL_B, OPC_VSLL_H, OPC_VSLL_W, OPC_VSLL_D
+};
+static const LoongArchInsn shrv_vec_insn[4] = {
+OPC_VSRL_B, OPC_VSRL_H, OPC_VSRL_W, OPC_VSRL_D
+};
+static const LoongArchInsn sarv_vec_insn[4] = {
+OPC_VSRA_B, OPC_VSRA_H, OPC_VSRA_W, OPC_VSRA_D
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1852,6 +1861,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_ussub_vec:
 tcg_out32(s, encode_vdvjvk_insn(ussub_vec_insn[vece], a0, a1, a2));
 break;
+case INDEX_op_shlv_vec:
+tcg_out32(s, encode_vdvjvk_insn(shlv_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_shrv_vec:
+tcg_out32(s, encode_vdvjvk_insn(shrv_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_sarv_vec:
+tcg_out32(s, encode_vdvjvk_insn(sarv_vec_insn[vece], a0, a1, a2));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1887,6 +1905,9 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_usadd_vec:
 case INDEX_op_sssub_vec:
 case INDEX_op_ussub_vec:
+case INDEX_op_shlv_vec:
+case INDEX_op_shrv_vec:
+case INDEX_op_sarv_vec:
 return 1;
 default:
 return 0;
@@ -2070,6 +2091,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_usadd_vec:
 case INDEX_op_sssub_vec:
 case INDEX_op_ussub_vec:
+case INDEX_op_shlv_vec:
+case INDEX_op_shrv_vec:
+case INDEX_op_sarv_vec:
 return C_O1_I2(w, w, w);
 
 case INDEX_op_not_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index fa14558275..7e9fb61c47 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -188,7 +188,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_mul_vec  1
 #define TCG_TARGET_HAS_shi_vec  0
 #define TCG_TARGET_HAS_shs_vec  0
-#define TCG_TARGET_HAS_shv_vec  0
+#define TCG_TARGET_HAS_shv_vec  1
 #define TCG_TARGET_HAS_roti_vec 0
 #define TCG_TARGET_HAS_rots_vec 0
 #define TCG_TARGET_HAS_rotv_vec 0
-- 
2.42.0




[PATCH v3 04/16] tcg/loongarch64: Lower cmp_vec to vseq/vsle/vslt

2023-09-01 Thread Jiajie Chen
Signed-off-by: Jiajie Chen 
---
 tcg/loongarch64/tcg-target-con-set.h |  1 +
 tcg/loongarch64/tcg-target-con-str.h |  1 +
 tcg/loongarch64/tcg-target.c.inc | 65 
 3 files changed, 67 insertions(+)

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index 37b3f80bf9..8c8ea5d919 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -31,4 +31,5 @@ C_O1_I2(r, 0, rZ)
 C_O1_I2(r, rZ, ri)
 C_O1_I2(r, rZ, rJ)
 C_O1_I2(r, rZ, rZ)
+C_O1_I2(w, w, wM)
 C_O1_I4(r, rZ, rJ, rZ, rZ)
diff --git a/tcg/loongarch64/tcg-target-con-str.h b/tcg/loongarch64/tcg-target-con-str.h
index 81b8d40278..a8a1c44014 100644
--- a/tcg/loongarch64/tcg-target-con-str.h
+++ b/tcg/loongarch64/tcg-target-con-str.h
@@ -26,3 +26,4 @@ CONST('U', TCG_CT_CONST_U12)
 CONST('Z', TCG_CT_CONST_ZERO)
 CONST('C', TCG_CT_CONST_C12)
 CONST('W', TCG_CT_CONST_WSZ)
+CONST('M', TCG_CT_CONST_VCMP)
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 07a0326e5d..129dd92910 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -176,6 +176,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
 #define TCG_CT_CONST_U12   0x800
 #define TCG_CT_CONST_C12   0x1000
 #define TCG_CT_CONST_WSZ   0x2000
+#define TCG_CT_CONST_VCMP  0x4000
 
 #define ALL_GENERAL_REGS   MAKE_64BIT_MASK(0, 32)
 #define ALL_VECTOR_REGSMAKE_64BIT_MASK(32, 32)
@@ -209,6 +210,10 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece)
 if ((ct & TCG_CT_CONST_WSZ) && val == (type == TCG_TYPE_I32 ? 32 : 64)) {
 return true;
 }
+int64_t vec_val = sextract64(val, 0, 8 << vece);
+if ((ct & TCG_CT_CONST_VCMP) && -0x10 <= vec_val && vec_val <= 0x1f) {
+return true;
+}
 return false;
 }
 
@@ -1624,6 +1629,23 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 TCGType type = vecl + TCG_TYPE_V64;
 TCGArg a0, a1, a2;
 TCGReg temp = TCG_REG_TMP0;
+TCGReg temp_vec = TCG_VEC_TMP0;
+
+static const LoongArchInsn cmp_vec_insn[16][4] = {
+[TCG_COND_EQ] = {OPC_VSEQ_B, OPC_VSEQ_H, OPC_VSEQ_W, OPC_VSEQ_D},
+[TCG_COND_LE] = {OPC_VSLE_B, OPC_VSLE_H, OPC_VSLE_W, OPC_VSLE_D},
+[TCG_COND_LEU] = {OPC_VSLE_BU, OPC_VSLE_HU, OPC_VSLE_WU, OPC_VSLE_DU},
+[TCG_COND_LT] = {OPC_VSLT_B, OPC_VSLT_H, OPC_VSLT_W, OPC_VSLT_D},
+[TCG_COND_LTU] = {OPC_VSLT_BU, OPC_VSLT_HU, OPC_VSLT_WU, OPC_VSLT_DU},
+};
+static const LoongArchInsn cmp_vec_imm_insn[16][4] = {
+[TCG_COND_EQ] = {OPC_VSEQI_B, OPC_VSEQI_H, OPC_VSEQI_W, OPC_VSEQI_D},
+[TCG_COND_LE] = {OPC_VSLEI_B, OPC_VSLEI_H, OPC_VSLEI_W, OPC_VSLEI_D},
+[TCG_COND_LEU] = {OPC_VSLEI_BU, OPC_VSLEI_HU, OPC_VSLEI_WU, OPC_VSLEI_DU},
+[TCG_COND_LT] = {OPC_VSLTI_B, OPC_VSLTI_H, OPC_VSLTI_W, OPC_VSLTI_D},
+[TCG_COND_LTU] = {OPC_VSLTI_BU, OPC_VSLTI_HU, OPC_VSLTI_WU, OPC_VSLTI_DU},
+};
+LoongArchInsn insn;
 
 a0 = args[0];
 a1 = args[1];
@@ -1651,6 +1673,45 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 tcg_out_opc_vldx(s, a0, a1, temp);
 }
 break;
+case INDEX_op_cmp_vec:
+TCGCond cond = args[3];
+if (const_args[2]) {
+/*
+ * cmp_vec dest, src, value
+ * Try vseqi/vslei/vslti
+ */
+int64_t value = sextract64(a2, 0, 8 << vece);
+if ((cond == TCG_COND_EQ || cond == TCG_COND_LE || \
+ cond == TCG_COND_LT) && (-0x10 <= value && value <= 0x0f)) {
+tcg_out32(s, encode_vdvjsk5_insn(cmp_vec_imm_insn[cond][vece], \
+ a0, a1, value));
+break;
+} else if ((cond == TCG_COND_LEU || cond == TCG_COND_LTU) &&
+(0x00 <= value && value <= 0x1f)) {
+tcg_out32(s, encode_vdvjuk5_insn(cmp_vec_imm_insn[cond][vece], \
+ a0, a1, value));
+break;
+}
+
+/*
+ * Fallback to:
+ * dupi_vec temp, a2
+ * cmp_vec a0, a1, temp, cond
+ */
+tcg_out_dupi_vec(s, type, vece, temp_vec, a2);
+a2 = temp_vec;
+}
+
+insn = cmp_vec_insn[cond][vece];
+if (insn == 0) {
+TCGArg t;
+t = a1, a1 = a2, a2 = t;
+cond = tcg_swap_cond(cond);
+insn = cmp_vec_insn[cond][vece];
+tcg_debug_assert(insn != 0);
+}
+tcg_out32(s, encode_vdvjvk_insn(insn, a0, a1, a2));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1666,6 +1727,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, 

[PATCH v3 09/16] tcg/loongarch64: Lower vector min max ops

2023-09-01 Thread Jiajie Chen
Lower the following ops:

- smin_vec
- smax_vec
- umin_vec
- umax_vec

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target.c.inc | 32 
 tcg/loongarch64/tcg-target.h |  2 +-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 6905775698..3ffc1691cd 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1668,6 +1668,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static const LoongArchInsn mul_vec_insn[4] = {
 OPC_VMUL_B, OPC_VMUL_H, OPC_VMUL_W, OPC_VMUL_D
 };
+static const LoongArchInsn smin_vec_insn[4] = {
+OPC_VMIN_B, OPC_VMIN_H, OPC_VMIN_W, OPC_VMIN_D
+};
+static const LoongArchInsn umin_vec_insn[4] = {
+OPC_VMIN_BU, OPC_VMIN_HU, OPC_VMIN_WU, OPC_VMIN_DU
+};
+static const LoongArchInsn smax_vec_insn[4] = {
+OPC_VMAX_B, OPC_VMAX_H, OPC_VMAX_W, OPC_VMAX_D
+};
+static const LoongArchInsn umax_vec_insn[4] = {
+OPC_VMAX_BU, OPC_VMAX_HU, OPC_VMAX_WU, OPC_VMAX_DU
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1804,6 +1816,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_mul_vec:
 tcg_out32(s, encode_vdvjvk_insn(mul_vec_insn[vece], a0, a1, a2));
 break;
+case INDEX_op_smin_vec:
+tcg_out32(s, encode_vdvjvk_insn(smin_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_smax_vec:
+tcg_out32(s, encode_vdvjvk_insn(smax_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_umin_vec:
+tcg_out32(s, encode_vdvjvk_insn(umin_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_umax_vec:
+tcg_out32(s, encode_vdvjvk_insn(umax_vec_insn[vece], a0, a1, a2));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1831,6 +1855,10 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_not_vec:
 case INDEX_op_neg_vec:
 case INDEX_op_mul_vec:
+case INDEX_op_smin_vec:
+case INDEX_op_smax_vec:
+case INDEX_op_umin_vec:
+case INDEX_op_umax_vec:
 return 1;
 default:
 return 0;
@@ -2006,6 +2034,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_xor_vec:
 case INDEX_op_nor_vec:
 case INDEX_op_mul_vec:
+case INDEX_op_smin_vec:
+case INDEX_op_smax_vec:
+case INDEX_op_umin_vec:
+case INDEX_op_umax_vec:
 return C_O1_I2(w, w, w);
 
 case INDEX_op_not_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 2c2266ed31..ec725aaeaa 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -193,7 +193,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_rots_vec 0
 #define TCG_TARGET_HAS_rotv_vec 0
 #define TCG_TARGET_HAS_sat_vec  0
-#define TCG_TARGET_HAS_minmax_vec   0
+#define TCG_TARGET_HAS_minmax_vec   1
 #define TCG_TARGET_HAS_bitsel_vec   0
 #define TCG_TARGET_HAS_cmpsel_vec   0
 
-- 
2.42.0




[PATCH v3 15/16] tcg/loongarch64: Lower rotli_vec to vrotri

2023-09-01 Thread Jiajie Chen
Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target.c.inc | 21 +
 tcg/loongarch64/tcg-target.h |  2 +-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 95359b1757..2b001598e2 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1901,6 +1901,26 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 tcg_out32(s, encode_vdvjvk_insn(rotrv_vec_insn[vece], a0, a1,
 temp_vec));
 break;
+case INDEX_op_rotli_vec:
+/* rotli_vec a1, a2 = rotri_vec a1, -a2 */
+a2 = extract32(-a2, 0, 3 + vece);
+switch (vece) {
+case MO_8:
+tcg_out_opc_vrotri_b(s, a0, a1, a2);
+break;
+case MO_16:
+tcg_out_opc_vrotri_h(s, a0, a1, a2);
+break;
+case MO_32:
+tcg_out_opc_vrotri_w(s, a0, a1, a2);
+break;
+case MO_64:
+tcg_out_opc_vrotri_d(s, a0, a1, a2);
+break;
+default:
+g_assert_not_reached();
+}
+break;
 case INDEX_op_bitsel_vec:
 /* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */
 tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1);
@@ -2139,6 +2159,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_shli_vec:
 case INDEX_op_shri_vec:
 case INDEX_op_sari_vec:
+case INDEX_op_rotli_vec:
 return C_O1_I1(w, w);
 
 case INDEX_op_bitsel_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index d5c69bc192..67b0a95532 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -189,7 +189,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_shi_vec  1
 #define TCG_TARGET_HAS_shs_vec  0
 #define TCG_TARGET_HAS_shv_vec  1
-#define TCG_TARGET_HAS_roti_vec 0
+#define TCG_TARGET_HAS_roti_vec 1
 #define TCG_TARGET_HAS_rots_vec 0
 #define TCG_TARGET_HAS_rotv_vec 1
 #define TCG_TARGET_HAS_sat_vec  1
-- 
2.42.0




[PATCH v3 00/16] Lower TCG vector ops to LSX

2023-09-01 Thread Jiajie Chen
This patch series allows qemu to utilize LSX instructions on LoongArch
machines to execute TCG vector ops.

Passed tcg tests with x86_64 and aarch64 cross compilers.

Changes since v2:

- Add vece argument to tcg_target_const_match() for const args of vector ops
- Use custom constraint for cmp_vec/add_vec/sub_vec for better const arg handling
- Implement 128-bit load & store using vldx/vstx

Changes since v1:

- Optimize dupi_vec/st_vec/ld_vec/cmp_vec/add_vec/sub_vec generation
- Lower not_vec/shi_vec/roti_vec/rotv_vec

Jiajie Chen (16):
  tcg/loongarch64: Import LSX instructions
  tcg/loongarch64: Lower basic tcg vec ops to LSX
  tcg: pass vece to tcg_target_const_match()
  tcg/loongarch64: Lower cmp_vec to vseq/vsle/vslt
  tcg/loongarch64: Lower add/sub_vec to vadd/vsub
  tcg/loongarch64: Lower vector bitwise operations
  tcg/loongarch64: Lower neg_vec to vneg
  tcg/loongarch64: Lower mul_vec to vmul
  tcg/loongarch64: Lower vector min max ops
  tcg/loongarch64: Lower vector saturated ops
  tcg/loongarch64: Lower vector shift vector ops
  tcg/loongarch64: Lower bitsel_vec to vbitsel
  tcg/loongarch64: Lower vector shift integer ops
  tcg/loongarch64: Lower rotv_vec ops to LSX
  tcg/loongarch64: Lower rotli_vec to vrotri
  tcg/loongarch64: Implement 128-bit load & store

 tcg/aarch64/tcg-target.c.inc |2 +-
 tcg/arm/tcg-target.c.inc |2 +-
 tcg/i386/tcg-target.c.inc|2 +-
 tcg/loongarch64/tcg-insn-defs.c.inc  | 6251 +-
 tcg/loongarch64/tcg-target-con-set.h |9 +
 tcg/loongarch64/tcg-target-con-str.h |3 +
 tcg/loongarch64/tcg-target.c.inc |  601 ++-
 tcg/loongarch64/tcg-target.h |   40 +-
 tcg/loongarch64/tcg-target.opc.h |   12 +
 tcg/mips/tcg-target.c.inc|2 +-
 tcg/ppc/tcg-target.c.inc |2 +-
 tcg/riscv/tcg-target.c.inc   |2 +-
 tcg/s390x/tcg-target.c.inc   |2 +-
 tcg/sparc64/tcg-target.c.inc |2 +-
 tcg/tcg.c|4 +-
 tcg/tci/tcg-target.c.inc |2 +-
 16 files changed, 6806 insertions(+), 132 deletions(-)
 create mode 100644 tcg/loongarch64/tcg-target.opc.h

-- 
2.42.0




[PATCH v3 03/16] tcg: pass vece to tcg_target_const_match()

2023-09-01 Thread Jiajie Chen
Pass vece to tcg_target_const_match() to allow correct interpretation of
const args of vector ops.

Signed-off-by: Jiajie Chen 
---
 tcg/aarch64/tcg-target.c.inc | 2 +-
 tcg/arm/tcg-target.c.inc | 2 +-
 tcg/i386/tcg-target.c.inc| 2 +-
 tcg/loongarch64/tcg-target.c.inc | 2 +-
 tcg/mips/tcg-target.c.inc| 2 +-
 tcg/ppc/tcg-target.c.inc | 2 +-
 tcg/riscv/tcg-target.c.inc   | 2 +-
 tcg/s390x/tcg-target.c.inc   | 2 +-
 tcg/sparc64/tcg-target.c.inc | 2 +-
 tcg/tcg.c| 4 ++--
 tcg/tci/tcg-target.c.inc | 2 +-
 11 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 0931a69448..a1e2b6be16 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -272,7 +272,7 @@ static bool is_shimm1632(uint32_t v32, int *cmode, int *imm8)
 }
 }
 
-static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
+static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece)
 {
 if (ct & TCG_CT_CONST) {
 return 1;
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index acb5f23b54..76f1345002 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -509,7 +509,7 @@ static bool is_shimm1632(uint32_t v32, int *cmode, int *imm8)
  * mov operand2: values represented with x << (2 * y), x < 0x100
  * add, sub, eor...: ditto
  */
-static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
+static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece)
 {
 if (ct & TCG_CT_CONST) {
 return 1;
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 0c3d1e4cef..aed91e515e 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -198,7 +198,7 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 }
 
 /* test if a constant matches the constraint */
-static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
+static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece)
 {
 if (ct & TCG_CT_CONST) {
 return 1;
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 150278e112..07a0326e5d 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -186,7 +186,7 @@ static inline tcg_target_long sextreg(tcg_target_long val, int pos, int len)
 }
 
 /* test if a constant matches the constraint */
-static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
+static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece)
 {
 if (ct & TCG_CT_CONST) {
 return true;
diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index 9faa8bdf0b..c6662889f0 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -190,7 +190,7 @@ static bool is_p2m1(tcg_target_long val)
 }
 
 /* test if a constant matches the constraint */
-static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
+static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece)
 {
 if (ct & TCG_CT_CONST) {
 return 1;
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 090f11e71c..ccf245191d 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -261,7 +261,7 @@ static bool reloc_pc14(tcg_insn_unit *src_rw, const tcg_insn_unit *target)
 }
 
 /* test if a constant matches the constraint */
-static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
+static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece)
 {
 if (ct & TCG_CT_CONST) {
 return 1;
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 9be81c1b7b..3bd7959e7e 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -145,7 +145,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
 #define sextreg  sextract64
 
 /* test if a constant matches the constraint */
-static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
+static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece)
 {
 if (ct & TCG_CT_CONST) {
 return 1;
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index ecd8aaf2a1..f4d3abcb71 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -540,7 +540,7 @@ static bool risbg_mask(uint64_t c)
 }
 
 /* Test if a constant matches the constraint. */
-static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
+static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece)
 {
 if (ct & TCG_CT_CONST) {
 return 1;
diff --git a/tcg/sparc64/tcg-target.c.inc b/tcg/sparc64/tcg-target.c.inc
index 81a08bb6c5..6b9be4c520 100644
--- a/tcg/sparc64/tcg-target.c.inc
+++ b/tcg/sparc64/tcg-target.c.inc
@@

Re: [PATCH v2 03/14] tcg/loongarch64: Lower cmp_vec to vseq/vsle/vslt

2023-09-01 Thread Jiajie Chen



On 2023/9/2 01:48, Richard Henderson wrote:

On 9/1/23 10:28, Jiajie Chen wrote:


On 2023/9/2 01:24, Richard Henderson wrote:

On 9/1/23 02:30, Jiajie Chen wrote:

Signed-off-by: Jiajie Chen 
---
  tcg/loongarch64/tcg-target-con-set.h |  1 +
  tcg/loongarch64/tcg-target.c.inc | 60 


  2 files changed, 61 insertions(+)


Reviewed-by: Richard Henderson 




diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h

index 37b3f80bf9..d04916db25 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -31,4 +31,5 @@ C_O1_I2(r, 0, rZ)
  C_O1_I2(r, rZ, ri)
  C_O1_I2(r, rZ, rJ)
  C_O1_I2(r, rZ, rZ)
+C_O1_I2(w, w, wJ)


Notes for improvement: 'J' is a signed 32-bit immediate.



I was wondering about the behavior of 'J' on i128 types: in 
tcg_target_const_match(), the argument type is int, so will the 
higher bits be truncated?


The argument is int64_t val.

The only constants that we allow for vectors are dupi, so all higher 
parts are the same as the lower part.



Consider the following scenario:


cmp_vec v128,e32,tmp4,tmp3,v128$0x

cmp_vec v128,e32,tmp4,tmp3,v128$0xfffefffe

cmp_vec v128,e8,tmp4,tmp3,v128$0xfefefefefefefefe


When matching constant constraint, the vector element width is unknown, 
so it cannot decide whether 0xfefefefefefefefe means e8 0xfe or e16 0xfefe.





Besides, tcg_target_const_match() does not know the vector element 
width.


No, it hadn't been required so far -- there are very few vector 
instructions that allow immediates.



r~




Re: [PATCH v2 03/14] tcg/loongarch64: Lower cmp_vec to vseq/vsle/vslt

2023-09-01 Thread Jiajie Chen



On 2023/9/2 01:24, Richard Henderson wrote:

On 9/1/23 02:30, Jiajie Chen wrote:

Signed-off-by: Jiajie Chen 
---
  tcg/loongarch64/tcg-target-con-set.h |  1 +
  tcg/loongarch64/tcg-target.c.inc | 60 
  2 files changed, 61 insertions(+)


Reviewed-by: Richard Henderson 




diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h

index 37b3f80bf9..d04916db25 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -31,4 +31,5 @@ C_O1_I2(r, 0, rZ)
  C_O1_I2(r, rZ, ri)
  C_O1_I2(r, rZ, rJ)
  C_O1_I2(r, rZ, rZ)
+C_O1_I2(w, w, wJ)


Notes for improvement: 'J' is a signed 32-bit immediate.



I was wondering about the behavior of 'J' on i128 types: in 
tcg_target_const_match(), the argument type is int, so will the higher 
bits be truncated?


Besides, tcg_target_const_match() does not know the vector element width.





+    if (const_args[2]) {
+    /*
+ * cmp_vec dest, src, value
+ * Try vseqi/vslei/vslti
+ */
+    int64_t value = sextract64(a2, 0, 8 << vece);
+    if ((cond == TCG_COND_EQ || cond == TCG_COND_LE || \
+ cond == TCG_COND_LT) && (-0x10 <= value && value <= 0x0f)) {
+    tcg_out32(s, encode_vdvjsk5_insn(cmp_vec_imm_insn[cond][vece], \
+ a0, a1, value));
+    break;
+    } else if ((cond == TCG_COND_LEU || cond == TCG_COND_LTU) &&
+    (0x00 <= value && value <= 0x1f)) {
+    tcg_out32(s, encode_vdvjuk5_insn(cmp_vec_imm_insn[cond][vece], \
+ a0, a1, value));


Better would be a new constraint that only matches

    -0x10 <= x <= 0x1f

If the sign is wrong for the comparison, it can *always* be loaded 
with just vldi.


Whereas at present, using J,


+    tcg_out_dupi_vec(s, type, vece, temp_vec, a2);
+    a2 = temp_vec;


this may require 3 instructions (lu12i.w + ori + vreplgr2vr).

By constraining the constants allowed, you allow the register 
allocator to see that a register is required, which may be reused for 
another instruction.



r~




[PATCH v2 05/14] tcg/loongarch64: Lower vector bitwise operations

2023-09-01 Thread Jiajie Chen
Lower the following ops:

- and_vec
- andc_vec
- or_vec
- orc_vec
- xor_vec
- nor_vec
- not_vec

Signed-off-by: Jiajie Chen 
---
 tcg/loongarch64/tcg-target-con-set.h |  2 ++
 tcg/loongarch64/tcg-target.c.inc | 44 
 tcg/loongarch64/tcg-target.h |  8 ++---
 3 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index eaa015e813..13a7f3b5e2 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -20,6 +20,7 @@ C_O0_I2(rZ, rZ)
 C_O0_I2(w, r)
 C_O1_I1(r, r)
 C_O1_I1(w, r)
+C_O1_I1(w, w)
 C_O1_I2(r, r, rC)
 C_O1_I2(r, r, ri)
 C_O1_I2(r, r, rI)
@@ -31,6 +32,7 @@ C_O1_I2(r, 0, rZ)
 C_O1_I2(r, rZ, ri)
 C_O1_I2(r, rZ, rJ)
 C_O1_I2(r, rZ, rZ)
+C_O1_I2(w, w, w)
 C_O1_I2(w, w, wi)
 C_O1_I2(w, w, wJ)
 C_O1_I4(r, rZ, rJ, rZ, rZ)
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 555080f2b0..20e25dc490 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1680,6 +1680,32 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 tcg_out_opc_vldx(s, a0, a1, temp);
 }
 break;
+case INDEX_op_and_vec:
+tcg_out_opc_vand_v(s, a0, a1, a2);
+break;
+case INDEX_op_andc_vec:
+/*
+ * vandn vd, vj, vk: vd = vk & ~vj
+ * andc_vec vd, vj, vk: vd = vj & ~vk
+ * vj and vk are swapped
+ */
+tcg_out_opc_vandn_v(s, a0, a2, a1);
+break;
+case INDEX_op_or_vec:
+tcg_out_opc_vor_v(s, a0, a1, a2);
+break;
+case INDEX_op_orc_vec:
+tcg_out_opc_vorn_v(s, a0, a1, a2);
+break;
+case INDEX_op_xor_vec:
+tcg_out_opc_vxor_v(s, a0, a1, a2);
+break;
+case INDEX_op_nor_vec:
+tcg_out_opc_vnor_v(s, a0, a1, a2);
+break;
+case INDEX_op_not_vec:
+tcg_out_opc_vnor_v(s, a0, a1, a1);
+break;
 case INDEX_op_cmp_vec:
 TCGCond cond = args[3];
 if (const_args[2]) {
@@ -1777,6 +1803,13 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_cmp_vec:
 case INDEX_op_add_vec:
 case INDEX_op_sub_vec:
+case INDEX_op_and_vec:
+case INDEX_op_andc_vec:
+case INDEX_op_or_vec:
+case INDEX_op_orc_vec:
+case INDEX_op_xor_vec:
+case INDEX_op_nor_vec:
+case INDEX_op_not_vec:
 return 1;
 default:
 return 0;
@@ -1945,6 +1978,17 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_sub_vec:
 return C_O1_I2(w, w, wi);
 
+case INDEX_op_and_vec:
+case INDEX_op_andc_vec:
+case INDEX_op_or_vec:
+case INDEX_op_orc_vec:
+case INDEX_op_xor_vec:
+case INDEX_op_nor_vec:
+return C_O1_I2(w, w, w);
+
+case INDEX_op_not_vec:
+return C_O1_I1(w, w);
+
 default:
 g_assert_not_reached();
 }
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 2f27d05e0c..bf72b26ca2 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -175,13 +175,13 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_v128 use_lsx_instructions
 #define TCG_TARGET_HAS_v256 0
 
-#define TCG_TARGET_HAS_not_vec  0
+#define TCG_TARGET_HAS_not_vec  1
 #define TCG_TARGET_HAS_neg_vec  0
 #define TCG_TARGET_HAS_abs_vec  0
-#define TCG_TARGET_HAS_andc_vec 0
-#define TCG_TARGET_HAS_orc_vec  0
+#define TCG_TARGET_HAS_andc_vec 1
+#define TCG_TARGET_HAS_orc_vec  1
 #define TCG_TARGET_HAS_nand_vec 0
-#define TCG_TARGET_HAS_nor_vec  0
+#define TCG_TARGET_HAS_nor_vec  1
 #define TCG_TARGET_HAS_eqv_vec  0
 #define TCG_TARGET_HAS_mul_vec  0
 #define TCG_TARGET_HAS_shi_vec  0
-- 
2.42.0




[PATCH v2 02/14] tcg/loongarch64: Lower basic tcg vec ops to LSX

2023-09-01 Thread Jiajie Chen
LSX support on host cpu is detected via hwcap.

Lower the following ops to LSX:

- dup_vec
- dupi_vec
- dupm_vec
- ld_vec
- st_vec

Signed-off-by: Jiajie Chen 
---
 tcg/loongarch64/tcg-target-con-set.h |   2 +
 tcg/loongarch64/tcg-target-con-str.h |   1 +
 tcg/loongarch64/tcg-target.c.inc | 219 ++-
 tcg/loongarch64/tcg-target.h |  38 -
 tcg/loongarch64/tcg-target.opc.h |  12 ++
 5 files changed, 270 insertions(+), 2 deletions(-)
 create mode 100644 tcg/loongarch64/tcg-target.opc.h

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index c2bde44613..37b3f80bf9 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -17,7 +17,9 @@
 C_O0_I1(r)
 C_O0_I2(rZ, r)
 C_O0_I2(rZ, rZ)
+C_O0_I2(w, r)
 C_O1_I1(r, r)
+C_O1_I1(w, r)
 C_O1_I2(r, r, rC)
 C_O1_I2(r, r, ri)
 C_O1_I2(r, r, rI)
diff --git a/tcg/loongarch64/tcg-target-con-str.h b/tcg/loongarch64/tcg-target-con-str.h
index 6e9ccca3ad..81b8d40278 100644
--- a/tcg/loongarch64/tcg-target-con-str.h
+++ b/tcg/loongarch64/tcg-target-con-str.h
@@ -14,6 +14,7 @@
  * REGS(letter, register_mask)
  */
 REGS('r', ALL_GENERAL_REGS)
+REGS('w', ALL_VECTOR_REGS)
 
 /*
  * Define constraint letters for constants:
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index baf5fc3819..150278e112 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -32,6 +32,8 @@
 #include "../tcg-ldst.c.inc"
 #include 
 
+bool use_lsx_instructions;
+
 #ifdef CONFIG_DEBUG_TCG
 static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 "zero",
@@ -65,7 +67,39 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 "s5",
 "s6",
 "s7",
-"s8"
+"s8",
+"vr0",
+"vr1",
+"vr2",
+"vr3",
+"vr4",
+"vr5",
+"vr6",
+"vr7",
+"vr8",
+"vr9",
+"vr10",
+"vr11",
+"vr12",
+"vr13",
+"vr14",
+"vr15",
+"vr16",
+"vr17",
+"vr18",
+"vr19",
+"vr20",
+"vr21",
+"vr22",
+"vr23",
+"vr24",
+"vr25",
+"vr26",
+"vr27",
+"vr28",
+"vr29",
+"vr30",
+"vr31",
 };
 #endif
 
@@ -102,6 +136,15 @@ static const int tcg_target_reg_alloc_order[] = {
 TCG_REG_A2,
 TCG_REG_A1,
 TCG_REG_A0,
+
+/* Vector registers */
+TCG_REG_V0, TCG_REG_V1, TCG_REG_V2, TCG_REG_V3,
+TCG_REG_V4, TCG_REG_V5, TCG_REG_V6, TCG_REG_V7,
+TCG_REG_V8, TCG_REG_V9, TCG_REG_V10, TCG_REG_V11,
+TCG_REG_V12, TCG_REG_V13, TCG_REG_V14, TCG_REG_V15,
+TCG_REG_V16, TCG_REG_V17, TCG_REG_V18, TCG_REG_V19,
+TCG_REG_V20, TCG_REG_V21, TCG_REG_V22, TCG_REG_V23,
+/* V24 - V31 are caller-saved, and skipped.  */
 };
 
 static const int tcg_target_call_iarg_regs[] = {
@@ -135,6 +178,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
 #define TCG_CT_CONST_WSZ   0x2000
 
 #define ALL_GENERAL_REGS   MAKE_64BIT_MASK(0, 32)
+#define ALL_VECTOR_REGSMAKE_64BIT_MASK(32, 32)
 
 static inline tcg_target_long sextreg(tcg_target_long val, int pos, int len)
 {
@@ -1486,6 +1530,154 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 }
 }
 
+static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
+TCGReg rd, TCGReg rs)
+{
+switch (vece) {
+case MO_8:
+tcg_out_opc_vreplgr2vr_b(s, rd, rs);
+break;
+case MO_16:
+tcg_out_opc_vreplgr2vr_h(s, rd, rs);
+break;
+case MO_32:
+tcg_out_opc_vreplgr2vr_w(s, rd, rs);
+break;
+case MO_64:
+tcg_out_opc_vreplgr2vr_d(s, rd, rs);
+break;
+default:
+g_assert_not_reached();
+}
+return true;
+}
+
+static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
+ TCGReg r, TCGReg base, intptr_t offset)
+{
+/* Handle imm overflow and division (vldrepl.d imm is divided by 8) */
+if (offset < -0x800 || offset > 0x7ff || \
+(offset & ((1 << vece) - 1)) != 0) {
+tcg_out_addi(s, TCG_TYPE_I64, TCG_REG_TMP0, base, offset);
+base = TCG_REG_TMP0;
+offset = 0;
+}
+offset >>= vece;
+
+switch (vece) {
+case MO_8:
+tcg_out_opc_vldrepl_b(s, r, base, offset);
+break;
+case MO_16:
+tcg_out_opc_vldrepl_h(s, r, base, offset);
+break;
+case MO_32:
+tcg_out_opc_vldrepl_w(s, r, base, offset);
+break;
+case MO_64:
+tcg_out_opc_vldrepl_d(s

[PATCH v2 08/14] tcg/loongarch64: Lower vector min max ops

2023-09-01 Thread Jiajie Chen
Lower the following ops:

- smin_vec
- smax_vec
- umin_vec
- umax_vec

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target.c.inc | 32 
 tcg/loongarch64/tcg-target.h |  2 +-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 07c030b262..ad1fbf0339 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1659,6 +1659,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static const LoongArchInsn mul_vec_insn[4] = {
 OPC_VMUL_B, OPC_VMUL_H, OPC_VMUL_W, OPC_VMUL_D
 };
+static const LoongArchInsn smin_vec_insn[4] = {
+OPC_VMIN_B, OPC_VMIN_H, OPC_VMIN_W, OPC_VMIN_D
+};
+static const LoongArchInsn umin_vec_insn[4] = {
+OPC_VMIN_BU, OPC_VMIN_HU, OPC_VMIN_WU, OPC_VMIN_DU
+};
+static const LoongArchInsn smax_vec_insn[4] = {
+OPC_VMAX_B, OPC_VMAX_H, OPC_VMAX_W, OPC_VMAX_D
+};
+static const LoongArchInsn umax_vec_insn[4] = {
+OPC_VMAX_BU, OPC_VMAX_HU, OPC_VMAX_WU, OPC_VMAX_DU
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1797,6 +1809,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_mul_vec:
 tcg_out32(s, encode_vdvjvk_insn(mul_vec_insn[vece], a0, a1, a2));
 break;
+case INDEX_op_smin_vec:
+tcg_out32(s, encode_vdvjvk_insn(smin_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_smax_vec:
+tcg_out32(s, encode_vdvjvk_insn(smax_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_umin_vec:
+tcg_out32(s, encode_vdvjvk_insn(umin_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_umax_vec:
+tcg_out32(s, encode_vdvjvk_insn(umax_vec_insn[vece], a0, a1, a2));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1824,6 +1848,10 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_not_vec:
 case INDEX_op_neg_vec:
 case INDEX_op_mul_vec:
+case INDEX_op_smin_vec:
+case INDEX_op_smax_vec:
+case INDEX_op_umin_vec:
+case INDEX_op_umax_vec:
 return 1;
 default:
 return 0;
@@ -1999,6 +2027,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_xor_vec:
 case INDEX_op_nor_vec:
 case INDEX_op_mul_vec:
+case INDEX_op_smin_vec:
+case INDEX_op_smax_vec:
+case INDEX_op_umin_vec:
+case INDEX_op_umax_vec:
 return C_O1_I2(w, w, w);
 
 case INDEX_op_not_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 0880a2903d..2b81a06c89 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -191,7 +191,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_rots_vec 0
 #define TCG_TARGET_HAS_rotv_vec 0
 #define TCG_TARGET_HAS_sat_vec  0
-#define TCG_TARGET_HAS_minmax_vec   0
+#define TCG_TARGET_HAS_minmax_vec   1
 #define TCG_TARGET_HAS_bitsel_vec   0
 #define TCG_TARGET_HAS_cmpsel_vec   0
 
-- 
2.42.0




[PATCH v2 04/14] tcg/loongarch64: Lower add/sub_vec to vadd/vsub

2023-09-01 Thread Jiajie Chen
Lower the following ops:

- add_vec
- sub_vec

Signed-off-by: Jiajie Chen 
---
 tcg/loongarch64/tcg-target-con-set.h |  1 +
 tcg/loongarch64/tcg-target.c.inc | 58 
 2 files changed, 59 insertions(+)

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index d04916db25..eaa015e813 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -31,5 +31,6 @@ C_O1_I2(r, 0, rZ)
 C_O1_I2(r, rZ, ri)
 C_O1_I2(r, rZ, rJ)
 C_O1_I2(r, rZ, rZ)
+C_O1_I2(w, w, wi)
 C_O1_I2(w, w, wJ)
 C_O1_I4(r, rZ, rJ, rZ, rZ)
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 18fe5fc148..555080f2b0 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1641,6 +1641,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 [TCG_COND_LTU] = {OPC_VSLTI_BU, OPC_VSLTI_HU, OPC_VSLTI_WU, OPC_VSLTI_DU},
 };
 LoongArchInsn insn;
+static const LoongArchInsn add_vec_insn[4] = {
+OPC_VADD_B, OPC_VADD_H, OPC_VADD_W, OPC_VADD_D
+};
+static const LoongArchInsn add_vec_imm_insn[4] = {
+OPC_VADDI_BU, OPC_VADDI_HU, OPC_VADDI_WU, OPC_VADDI_DU
+};
+static const LoongArchInsn sub_vec_insn[4] = {
+OPC_VSUB_B, OPC_VSUB_H, OPC_VSUB_W, OPC_VSUB_D
+};
+static const LoongArchInsn sub_vec_imm_insn[4] = {
+OPC_VSUBI_BU, OPC_VSUBI_HU, OPC_VSUBI_WU, OPC_VSUBI_DU
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1707,6 +1719,46 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 }
 tcg_out32(s, encode_vdvjvk_insn(insn, a0, a1, a2));
 break;
+case INDEX_op_add_vec:
+if (const_args[2]) {
+int64_t value = sextract64(a2, 0, 8 << vece);
+/* Try vaddi/vsubi */
+if (0 <= value && value <= 0x1f) {
+tcg_out32(s, encode_vdvjuk5_insn(add_vec_imm_insn[vece], a0, \
+ a1, value));
+break;
+} else if (-0x1f <= value && value < 0) {
+tcg_out32(s, encode_vdvjuk5_insn(sub_vec_imm_insn[vece], a0, \
+ a1, -value));
+break;
+}
+
+/* Fallback to dupi + vadd */
+tcg_out_dupi_vec(s, type, vece, temp_vec, a2);
+a2 = temp_vec;
+}
+tcg_out32(s, encode_vdvjvk_insn(add_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_sub_vec:
+if (const_args[2]) {
+int64_t value = sextract64(a2, 0, 8 << vece);
+/* Try vaddi/vsubi */
+if (0 <= value && value <= 0x1f) {
+tcg_out32(s, encode_vdvjuk5_insn(sub_vec_imm_insn[vece], a0, \
+ a1, value));
+break;
+} else if (-0x1f <= value && value < 0) {
+tcg_out32(s, encode_vdvjuk5_insn(add_vec_imm_insn[vece], a0, \
+ a1, -value));
+break;
+}
+
+/* Fallback to dupi + vsub */
+tcg_out_dupi_vec(s, type, vece, temp_vec, a2);
+a2 = temp_vec;
+}
+tcg_out32(s, encode_vdvjvk_insn(sub_vec_insn[vece], a0, a1, a2));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1723,6 +1775,8 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_dup_vec:
 case INDEX_op_dupm_vec:
 case INDEX_op_cmp_vec:
+case INDEX_op_add_vec:
+case INDEX_op_sub_vec:
 return 1;
 default:
 return 0;
@@ -1887,6 +1941,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_cmp_vec:
 return C_O1_I2(w, w, wJ);
 
+case INDEX_op_add_vec:
+case INDEX_op_sub_vec:
+return C_O1_I2(w, w, wi);
+
 default:
 g_assert_not_reached();
 }
-- 
2.42.0




[PATCH v2 07/14] tcg/loongarch64: Lower mul_vec to vmul

2023-09-01 Thread Jiajie Chen
Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target.c.inc | 8 
 tcg/loongarch64/tcg-target.h | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 16bcc2cf1b..07c030b262 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1656,6 +1656,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static const LoongArchInsn neg_vec_insn[4] = {
 OPC_VNEG_B, OPC_VNEG_H, OPC_VNEG_W, OPC_VNEG_D
 };
+static const LoongArchInsn mul_vec_insn[4] = {
+OPC_VMUL_B, OPC_VMUL_H, OPC_VMUL_W, OPC_VMUL_D
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1791,6 +1794,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_neg_vec:
 tcg_out32(s, encode_vdvj_insn(neg_vec_insn[vece], a0, a1));
 break;
+case INDEX_op_mul_vec:
+tcg_out32(s, encode_vdvjvk_insn(mul_vec_insn[vece], a0, a1, a2));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1817,6 +1823,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_nor_vec:
 case INDEX_op_not_vec:
 case INDEX_op_neg_vec:
+case INDEX_op_mul_vec:
 return 1;
 default:
 return 0;
@@ -1991,6 +1998,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_orc_vec:
 case INDEX_op_xor_vec:
 case INDEX_op_nor_vec:
+case INDEX_op_mul_vec:
 return C_O1_I2(w, w, w);
 
 case INDEX_op_not_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index c992c4b297..0880a2903d 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -183,7 +183,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_nand_vec 0
 #define TCG_TARGET_HAS_nor_vec  1
 #define TCG_TARGET_HAS_eqv_vec  0
-#define TCG_TARGET_HAS_mul_vec  0
+#define TCG_TARGET_HAS_mul_vec  1
 #define TCG_TARGET_HAS_shi_vec  0
 #define TCG_TARGET_HAS_shs_vec  0
 #define TCG_TARGET_HAS_shv_vec  0
-- 
2.42.0




[PATCH v2 10/14] tcg/loongarch64: Lower vector shift vector ops

2023-09-01 Thread Jiajie Chen
Lower the following ops:

- shlv_vec
- shrv_vec
- sarv_vec

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target.c.inc | 24 
 tcg/loongarch64/tcg-target.h |  2 +-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 1e587a82b1..9f02805c4b 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1683,6 +1683,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static const LoongArchInsn ussub_vec_insn[4] = {
 OPC_VSSUB_BU, OPC_VSSUB_HU, OPC_VSSUB_WU, OPC_VSSUB_DU
 };
+static const LoongArchInsn shlv_vec_insn[4] = {
+OPC_VSLL_B, OPC_VSLL_H, OPC_VSLL_W, OPC_VSLL_D
+};
+static const LoongArchInsn shrv_vec_insn[4] = {
+OPC_VSRL_B, OPC_VSRL_H, OPC_VSRL_W, OPC_VSRL_D
+};
+static const LoongArchInsn sarv_vec_insn[4] = {
+OPC_VSRA_B, OPC_VSRA_H, OPC_VSRA_W, OPC_VSRA_D
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1845,6 +1854,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_ussub_vec:
 tcg_out32(s, encode_vdvjvk_insn(ussub_vec_insn[vece], a0, a1, a2));
 break;
+case INDEX_op_shlv_vec:
+tcg_out32(s, encode_vdvjvk_insn(shlv_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_shrv_vec:
+tcg_out32(s, encode_vdvjvk_insn(shrv_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_sarv_vec:
+tcg_out32(s, encode_vdvjvk_insn(sarv_vec_insn[vece], a0, a1, a2));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1880,6 +1898,9 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_usadd_vec:
 case INDEX_op_sssub_vec:
 case INDEX_op_ussub_vec:
+case INDEX_op_shlv_vec:
+case INDEX_op_shrv_vec:
+case INDEX_op_sarv_vec:
 return 1;
 default:
 return 0;
@@ -2063,6 +2084,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_usadd_vec:
 case INDEX_op_sssub_vec:
 case INDEX_op_ussub_vec:
+case INDEX_op_shlv_vec:
+case INDEX_op_shrv_vec:
+case INDEX_op_sarv_vec:
 return C_O1_I2(w, w, w);
 
 case INDEX_op_not_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 72bfd0d440..d27f3737ad 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -186,7 +186,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_mul_vec  1
 #define TCG_TARGET_HAS_shi_vec  0
 #define TCG_TARGET_HAS_shs_vec  0
-#define TCG_TARGET_HAS_shv_vec  0
+#define TCG_TARGET_HAS_shv_vec  1
 #define TCG_TARGET_HAS_roti_vec 0
 #define TCG_TARGET_HAS_rots_vec 0
 #define TCG_TARGET_HAS_rotv_vec 0
-- 
2.42.0




[PATCH v2 00/14] Lower TCG vector ops to LSX

2023-09-01 Thread Jiajie Chen
This patch series allows qemu to utilize LSX instructions on LoongArch
machines to execute TCG vector ops.

Passed tcg tests with x86_64 and aarch64 cross compilers.

Changes since v1:

- Optimize dupi_vec/st_vec/ld_vec/cmp_vec/add_vec/sub_vec generation
- Lower not_vec/shi_vec/roti_vec/rotv_vec

Jiajie Chen (14):
  tcg/loongarch64: Import LSX instructions
  tcg/loongarch64: Lower basic tcg vec ops to LSX
  tcg/loongarch64: Lower cmp_vec to vseq/vsle/vslt
  tcg/loongarch64: Lower add/sub_vec to vadd/vsub
  tcg/loongarch64: Lower vector bitwise operations
  tcg/loongarch64: Lower neg_vec to vneg
  tcg/loongarch64: Lower mul_vec to vmul
  tcg/loongarch64: Lower vector min max ops
  tcg/loongarch64: Lower vector saturated ops
  tcg/loongarch64: Lower vector shift vector ops
  tcg/loongarch64: Lower bitsel_vec to vbitsel
  tcg/loongarch64: Lower vector shift integer ops
  tcg/loongarch64: Lower rotv_vec ops to LSX
  tcg/loongarch64: Lower rotli_vec to vrotri

 tcg/loongarch64/tcg-insn-defs.c.inc  | 6251 +-
 tcg/loongarch64/tcg-target-con-set.h |7 +
 tcg/loongarch64/tcg-target-con-str.h |1 +
 tcg/loongarch64/tcg-target.c.inc |  550 ++-
 tcg/loongarch64/tcg-target.h |   38 +-
 tcg/loongarch64/tcg-target.opc.h |   12 +
 6 files changed, 6740 insertions(+), 119 deletions(-)
 create mode 100644 tcg/loongarch64/tcg-target.opc.h

-- 
2.42.0




[PATCH v2 06/14] tcg/loongarch64: Lower neg_vec to vneg

2023-09-01 Thread Jiajie Chen
Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target.c.inc | 8 
 tcg/loongarch64/tcg-target.h | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 20e25dc490..16bcc2cf1b 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1653,6 +1653,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static const LoongArchInsn sub_vec_imm_insn[4] = {
 OPC_VSUBI_BU, OPC_VSUBI_HU, OPC_VSUBI_WU, OPC_VSUBI_DU
 };
+static const LoongArchInsn neg_vec_insn[4] = {
+OPC_VNEG_B, OPC_VNEG_H, OPC_VNEG_W, OPC_VNEG_D
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1785,6 +1788,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 }
 tcg_out32(s, encode_vdvjvk_insn(sub_vec_insn[vece], a0, a1, a2));
 break;
+case INDEX_op_neg_vec:
+tcg_out32(s, encode_vdvj_insn(neg_vec_insn[vece], a0, a1));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1810,6 +1816,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_xor_vec:
 case INDEX_op_nor_vec:
 case INDEX_op_not_vec:
+case INDEX_op_neg_vec:
 return 1;
 default:
 return 0;
@@ -1987,6 +1994,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 return C_O1_I2(w, w, w);
 
 case INDEX_op_not_vec:
+case INDEX_op_neg_vec:
 return C_O1_I1(w, w);
 
 default:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index bf72b26ca2..c992c4b297 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -176,7 +176,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_v256 0
 
 #define TCG_TARGET_HAS_not_vec  1
-#define TCG_TARGET_HAS_neg_vec  0
+#define TCG_TARGET_HAS_neg_vec  1
 #define TCG_TARGET_HAS_abs_vec  0
 #define TCG_TARGET_HAS_andc_vec 1
 #define TCG_TARGET_HAS_orc_vec  1
-- 
2.42.0




[PATCH v2 09/14] tcg/loongarch64: Lower vector saturated ops

2023-09-01 Thread Jiajie Chen
Lower the following ops:

- ssadd_vec
- usadd_vec
- sssub_vec
- ussub_vec

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target.c.inc | 32 
 tcg/loongarch64/tcg-target.h |  2 +-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index ad1fbf0339..1e587a82b1 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1671,6 +1671,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static const LoongArchInsn umax_vec_insn[4] = {
 OPC_VMAX_BU, OPC_VMAX_HU, OPC_VMAX_WU, OPC_VMAX_DU
 };
+static const LoongArchInsn ssadd_vec_insn[4] = {
+OPC_VSADD_B, OPC_VSADD_H, OPC_VSADD_W, OPC_VSADD_D
+};
+static const LoongArchInsn usadd_vec_insn[4] = {
+OPC_VSADD_BU, OPC_VSADD_HU, OPC_VSADD_WU, OPC_VSADD_DU
+};
+static const LoongArchInsn sssub_vec_insn[4] = {
+OPC_VSSUB_B, OPC_VSSUB_H, OPC_VSSUB_W, OPC_VSSUB_D
+};
+static const LoongArchInsn ussub_vec_insn[4] = {
+OPC_VSSUB_BU, OPC_VSSUB_HU, OPC_VSSUB_WU, OPC_VSSUB_DU
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1821,6 +1833,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_umax_vec:
 tcg_out32(s, encode_vdvjvk_insn(umax_vec_insn[vece], a0, a1, a2));
 break;
+case INDEX_op_ssadd_vec:
+tcg_out32(s, encode_vdvjvk_insn(ssadd_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_usadd_vec:
+tcg_out32(s, encode_vdvjvk_insn(usadd_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_sssub_vec:
+tcg_out32(s, encode_vdvjvk_insn(sssub_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_ussub_vec:
+tcg_out32(s, encode_vdvjvk_insn(ussub_vec_insn[vece], a0, a1, a2));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1852,6 +1876,10 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_smax_vec:
 case INDEX_op_umin_vec:
 case INDEX_op_umax_vec:
+case INDEX_op_ssadd_vec:
+case INDEX_op_usadd_vec:
+case INDEX_op_sssub_vec:
+case INDEX_op_ussub_vec:
 return 1;
 default:
 return 0;
@@ -2031,6 +2059,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_smax_vec:
 case INDEX_op_umin_vec:
 case INDEX_op_umax_vec:
+case INDEX_op_ssadd_vec:
+case INDEX_op_usadd_vec:
+case INDEX_op_sssub_vec:
+case INDEX_op_ussub_vec:
 return C_O1_I2(w, w, w);
 
 case INDEX_op_not_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 2b81a06c89..72bfd0d440 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -190,7 +190,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_roti_vec 0
 #define TCG_TARGET_HAS_rots_vec 0
 #define TCG_TARGET_HAS_rotv_vec 0
-#define TCG_TARGET_HAS_sat_vec  0
+#define TCG_TARGET_HAS_sat_vec  1
 #define TCG_TARGET_HAS_minmax_vec   1
 #define TCG_TARGET_HAS_bitsel_vec   0
 #define TCG_TARGET_HAS_cmpsel_vec   0
-- 
2.42.0




[PATCH v2 13/14] tcg/loongarch64: Lower rotv_vec ops to LSX

2023-09-01 Thread Jiajie Chen
Lower the following ops:

- rotrv_vec
- rotlv_vec

Signed-off-by: Jiajie Chen 
---
 tcg/loongarch64/tcg-target.c.inc | 14 ++
 tcg/loongarch64/tcg-target.h |  2 +-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index ccb362205e..6fe319a77e 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1701,6 +1701,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static const LoongArchInsn sari_vec_insn[4] = {
 OPC_VSRAI_B, OPC_VSRAI_H, OPC_VSRAI_W, OPC_VSRAI_D
 };
+static const LoongArchInsn rotrv_vec_insn[4] = {
+OPC_VROTR_B, OPC_VROTR_H, OPC_VROTR_W, OPC_VROTR_D
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1882,6 +1885,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_sari_vec:
 tcg_out32(s, encode_vdvjuk3_insn(sari_vec_insn[vece], a0, a1, a2));
 break;
+case INDEX_op_rotrv_vec:
+tcg_out32(s, encode_vdvjvk_insn(rotrv_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_rotlv_vec:
+/* rotlv_vec a1, a2 = rotrv_vec a1, -a2 */
+tcg_out32(s, encode_vdvj_insn(neg_vec_insn[vece], temp_vec, a2));
+tcg_out32(s, encode_vdvjvk_insn(rotrv_vec_insn[vece], a0, a1,
+temp_vec));
+break;
 case INDEX_op_bitsel_vec:
 /* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */
 tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1);
@@ -2111,6 +2123,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_shlv_vec:
 case INDEX_op_shrv_vec:
 case INDEX_op_sarv_vec:
+case INDEX_op_rotrv_vec:
+case INDEX_op_rotlv_vec:
 return C_O1_I2(w, w, w);
 
 case INDEX_op_not_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index b4dab03469..f6eb3cf7a6 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -189,7 +189,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_shv_vec  1
 #define TCG_TARGET_HAS_roti_vec 0
 #define TCG_TARGET_HAS_rots_vec 0
-#define TCG_TARGET_HAS_rotv_vec 0
+#define TCG_TARGET_HAS_rotv_vec 1
 #define TCG_TARGET_HAS_sat_vec  1
 #define TCG_TARGET_HAS_minmax_vec   1
 #define TCG_TARGET_HAS_bitsel_vec   1
-- 
2.42.0




[PATCH v2 14/14] tcg/loongarch64: Lower rotli_vec to vrotri

2023-09-01 Thread Jiajie Chen
Signed-off-by: Jiajie Chen 
---
 tcg/loongarch64/tcg-target.c.inc | 21 +
 tcg/loongarch64/tcg-target.h |  2 +-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 6fe319a77e..c4e9e0309e 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1894,6 +1894,26 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 tcg_out32(s, encode_vdvjvk_insn(rotrv_vec_insn[vece], a0, a1,
 temp_vec));
 break;
+case INDEX_op_rotli_vec:
+/* rotli_vec a1, a2 = rotri_vec a1, -a2 */
+a2 = extract32(-a2, 0, 3 + vece);
+switch (vece) {
+case MO_8:
+tcg_out_opc_vrotri_b(s, a0, a1, a2);
+break;
+case MO_16:
+tcg_out_opc_vrotri_h(s, a0, a1, a2);
+break;
+case MO_32:
+tcg_out_opc_vrotri_w(s, a0, a1, a2);
+break;
+case MO_64:
+tcg_out_opc_vrotri_d(s, a0, a1, a2);
+break;
+default:
+g_assert_not_reached();
+}
+break;
 case INDEX_op_bitsel_vec:
 /* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */
 tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1);
@@ -2132,6 +2152,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_shli_vec:
 case INDEX_op_shri_vec:
 case INDEX_op_sari_vec:
+case INDEX_op_rotli_vec:
 return C_O1_I1(w, w);
 
 case INDEX_op_bitsel_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index f6eb3cf7a6..3dc2dbf800 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -187,7 +187,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_shi_vec  1
 #define TCG_TARGET_HAS_shs_vec  0
 #define TCG_TARGET_HAS_shv_vec  1
-#define TCG_TARGET_HAS_roti_vec 0
+#define TCG_TARGET_HAS_roti_vec 1
 #define TCG_TARGET_HAS_rots_vec 0
 #define TCG_TARGET_HAS_rotv_vec 1
 #define TCG_TARGET_HAS_sat_vec  1
-- 
2.42.0




[PATCH v2 11/14] tcg/loongarch64: Lower bitsel_vec to vbitsel

2023-09-01 Thread Jiajie Chen
Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target-con-set.h |  1 +
 tcg/loongarch64/tcg-target.c.inc | 11 ++-
 tcg/loongarch64/tcg-target.h |  2 +-
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index 13a7f3b5e2..fd2bd785e5 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -35,4 +35,5 @@ C_O1_I2(r, rZ, rZ)
 C_O1_I2(w, w, w)
 C_O1_I2(w, w, wi)
 C_O1_I2(w, w, wJ)
+C_O1_I3(w, w, w, w)
 C_O1_I4(r, rZ, rJ, rZ, rZ)
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 9f02805c4b..8de4c36396 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1622,7 +1622,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
const int const_args[TCG_MAX_OP_ARGS])
 {
 TCGType type = vecl + TCG_TYPE_V64;
-TCGArg a0, a1, a2;
+TCGArg a0, a1, a2, a3;
 TCGReg temp = TCG_REG_TMP0;
 TCGReg temp_vec = TCG_VEC_TMP0;
 
@@ -1696,6 +1696,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 a0 = args[0];
 a1 = args[1];
 a2 = args[2];
+a3 = args[3];
 
 /* Currently only supports V128 */
 tcg_debug_assert(type == TCG_TYPE_V128);
@@ -1863,6 +1864,10 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_sarv_vec:
 tcg_out32(s, encode_vdvjvk_insn(sarv_vec_insn[vece], a0, a1, a2));
 break;
+case INDEX_op_bitsel_vec:
+/* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */
+tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1);
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1901,6 +1906,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_shlv_vec:
 case INDEX_op_shrv_vec:
 case INDEX_op_sarv_vec:
+case INDEX_op_bitsel_vec:
 return 1;
 default:
 return 0;
@@ -2093,6 +2099,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_neg_vec:
 return C_O1_I1(w, w);
 
+case INDEX_op_bitsel_vec:
+return C_O1_I3(w, w, w, w);
+
 default:
 g_assert_not_reached();
 }
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index d27f3737ad..c77672d92c 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -192,7 +192,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_rotv_vec 0
 #define TCG_TARGET_HAS_sat_vec  1
 #define TCG_TARGET_HAS_minmax_vec   1
-#define TCG_TARGET_HAS_bitsel_vec   0
+#define TCG_TARGET_HAS_bitsel_vec   1
 #define TCG_TARGET_HAS_cmpsel_vec   0
 
 #define TCG_TARGET_DEFAULT_MO (0)
-- 
2.42.0




[PATCH v2 12/14] tcg/loongarch64: Lower vector shift integer ops

2023-09-01 Thread Jiajie Chen
Lower the following ops:

- shli_vec
- shrv_vec
- sarv_vec

Signed-off-by: Jiajie Chen 
---
 tcg/loongarch64/tcg-target.c.inc | 21 +
 tcg/loongarch64/tcg-target.h |  2 +-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 8de4c36396..ccb362205e 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1692,6 +1692,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static const LoongArchInsn sarv_vec_insn[4] = {
 OPC_VSRA_B, OPC_VSRA_H, OPC_VSRA_W, OPC_VSRA_D
 };
+static const LoongArchInsn shli_vec_insn[4] = {
+OPC_VSLLI_B, OPC_VSLLI_H, OPC_VSLLI_W, OPC_VSLLI_D
+};
+static const LoongArchInsn shri_vec_insn[4] = {
+OPC_VSRLI_B, OPC_VSRLI_H, OPC_VSRLI_W, OPC_VSRLI_D
+};
+static const LoongArchInsn sari_vec_insn[4] = {
+OPC_VSRAI_B, OPC_VSRAI_H, OPC_VSRAI_W, OPC_VSRAI_D
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1864,6 +1873,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_sarv_vec:
 tcg_out32(s, encode_vdvjvk_insn(sarv_vec_insn[vece], a0, a1, a2));
 break;
+case INDEX_op_shli_vec:
+tcg_out32(s, encode_vdvjuk3_insn(shli_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_shri_vec:
+tcg_out32(s, encode_vdvjuk3_insn(shri_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_sari_vec:
+tcg_out32(s, encode_vdvjuk3_insn(sari_vec_insn[vece], a0, a1, a2));
+break;
 case INDEX_op_bitsel_vec:
 /* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */
 tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1);
@@ -2097,6 +2115,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 
 case INDEX_op_not_vec:
 case INDEX_op_neg_vec:
+case INDEX_op_shli_vec:
+case INDEX_op_shri_vec:
+case INDEX_op_sari_vec:
 return C_O1_I1(w, w);
 
 case INDEX_op_bitsel_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index c77672d92c..b4dab03469 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -184,7 +184,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_nor_vec  1
 #define TCG_TARGET_HAS_eqv_vec  0
 #define TCG_TARGET_HAS_mul_vec  1
-#define TCG_TARGET_HAS_shi_vec  0
+#define TCG_TARGET_HAS_shi_vec  1
 #define TCG_TARGET_HAS_shs_vec  0
 #define TCG_TARGET_HAS_shv_vec  1
 #define TCG_TARGET_HAS_roti_vec 0
-- 
2.42.0




[PATCH v2 03/14] tcg/loongarch64: Lower cmp_vec to vseq/vsle/vslt

2023-09-01 Thread Jiajie Chen
Signed-off-by: Jiajie Chen 
---
 tcg/loongarch64/tcg-target-con-set.h |  1 +
 tcg/loongarch64/tcg-target.c.inc | 60 
 2 files changed, 61 insertions(+)

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index 37b3f80bf9..d04916db25 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -31,4 +31,5 @@ C_O1_I2(r, 0, rZ)
 C_O1_I2(r, rZ, ri)
 C_O1_I2(r, rZ, rJ)
 C_O1_I2(r, rZ, rZ)
+C_O1_I2(w, w, wJ)
 C_O1_I4(r, rZ, rJ, rZ, rZ)
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 150278e112..18fe5fc148 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1624,6 +1624,23 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 TCGType type = vecl + TCG_TYPE_V64;
 TCGArg a0, a1, a2;
 TCGReg temp = TCG_REG_TMP0;
+TCGReg temp_vec = TCG_VEC_TMP0;
+
+static const LoongArchInsn cmp_vec_insn[16][4] = {
+[TCG_COND_EQ] = {OPC_VSEQ_B, OPC_VSEQ_H, OPC_VSEQ_W, OPC_VSEQ_D},
+[TCG_COND_LE] = {OPC_VSLE_B, OPC_VSLE_H, OPC_VSLE_W, OPC_VSLE_D},
+[TCG_COND_LEU] = {OPC_VSLE_BU, OPC_VSLE_HU, OPC_VSLE_WU, OPC_VSLE_DU},
+[TCG_COND_LT] = {OPC_VSLT_B, OPC_VSLT_H, OPC_VSLT_W, OPC_VSLT_D},
+[TCG_COND_LTU] = {OPC_VSLT_BU, OPC_VSLT_HU, OPC_VSLT_WU, OPC_VSLT_DU},
+};
+static const LoongArchInsn cmp_vec_imm_insn[16][4] = {
+[TCG_COND_EQ] = {OPC_VSEQI_B, OPC_VSEQI_H, OPC_VSEQI_W, OPC_VSEQI_D},
+[TCG_COND_LE] = {OPC_VSLEI_B, OPC_VSLEI_H, OPC_VSLEI_W, OPC_VSLEI_D},
+[TCG_COND_LEU] = {OPC_VSLEI_BU, OPC_VSLEI_HU, OPC_VSLEI_WU, OPC_VSLEI_DU},
+[TCG_COND_LT] = {OPC_VSLTI_B, OPC_VSLTI_H, OPC_VSLTI_W, OPC_VSLTI_D},
+[TCG_COND_LTU] = {OPC_VSLTI_BU, OPC_VSLTI_HU, OPC_VSLTI_WU, OPC_VSLTI_DU},
+};
+LoongArchInsn insn;
 
 a0 = args[0];
 a1 = args[1];
@@ -1651,6 +1668,45 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 tcg_out_opc_vldx(s, a0, a1, temp);
 }
 break;
+case INDEX_op_cmp_vec:
+TCGCond cond = args[3];
+if (const_args[2]) {
+/*
+ * cmp_vec dest, src, value
+ * Try vseqi/vslei/vslti
+ */
+int64_t value = sextract64(a2, 0, 8 << vece);
+if ((cond == TCG_COND_EQ || cond == TCG_COND_LE || \
+ cond == TCG_COND_LT) && (-0x10 <= value && value <= 0x0f)) {
+tcg_out32(s, encode_vdvjsk5_insn(cmp_vec_imm_insn[cond][vece], \
+ a0, a1, value));
+break;
+} else if ((cond == TCG_COND_LEU || cond == TCG_COND_LTU) &&
+(0x00 <= value && value <= 0x1f)) {
+tcg_out32(s, encode_vdvjuk5_insn(cmp_vec_imm_insn[cond][vece], \
+ a0, a1, value));
+break;
+}
+
+/*
+ * Fallback to:
+ * dupi_vec temp, a2
+ * cmp_vec a0, a1, temp, cond
+ */
+tcg_out_dupi_vec(s, type, vece, temp_vec, a2);
+a2 = temp_vec;
+}
+
+insn = cmp_vec_insn[cond][vece];
+if (insn == 0) {
+TCGArg t;
+t = a1, a1 = a2, a2 = t;
+cond = tcg_swap_cond(cond);
+insn = cmp_vec_insn[cond][vece];
+tcg_debug_assert(insn != 0);
+}
+tcg_out32(s, encode_vdvjvk_insn(insn, a0, a1, a2));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1666,6 +1722,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_st_vec:
 case INDEX_op_dup_vec:
 case INDEX_op_dupm_vec:
+case INDEX_op_cmp_vec:
 return 1;
 default:
 return 0;
@@ -1827,6 +1884,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_st_vec:
 return C_O0_I2(w, r);
 
+case INDEX_op_cmp_vec:
+return C_O1_I2(w, w, wJ);
+
 default:
 g_assert_not_reached();
 }
-- 
2.42.0




Re: [PATCH 02/11] tcg/loongarch64: Lower basic tcg vec ops to LSX

2023-08-28 Thread Jiajie Chen
There seems to be some problem with the email server, so I'm resending this 
from another email address.



On 2023/8/29 00:57, Richard Henderson wrote:

On 8/28/23 08:19, Jiajie Chen wrote:
+static void tcg_out_dupi_vec(TCGContext *s, TCGType type, unsigned vece,

+ TCGReg rd, int64_t v64)
+{
+    /* Try vldi if imm can fit */
+    if (vece <= MO_32 && (-0x200 <= v64 && v64 <= 0x1FF)) {
+    uint32_t imm = (vece << 10) | ((uint32_t)v64 & 0x3FF);
+    tcg_out_opc_vldi(s, rd, imm);
+    return;
+    }


v64 has the value replicated across 64 bits.
In order to do the comparison above, you'll want

    int64_t vale = sextract64(v64, 0, 8 << vece);
    if (-0x200 <= vale && vale <= 0x1ff)
    ...

Since the only documentation for LSX is qemu's own translator code, 
why are you testing vece <= MO_32?  MO_64 should be available as 
well?  Or is there a bug in trans_vldi()?



Sorry, my mistake. I was mixing up MO_64 with bit 12 of the vldi imm.




It might be nice to leave a to-do for vldi imm bit 12 set, for the 
patterns expanded by vldi_get_value().  In particular, mode == 9 is 
likely to be useful, and modes {1,2,3,5} are easy to test for.




Sure, I was thinking about the complexity of pattern matching on those 
modes, and decided to skip the hard part in the first patch series.






+
+    /* Fallback to vreplgr2vr */
+    tcg_out_movi(s, type, TCG_REG_TMP0, v64);


type is a vector type; you can't use it here.
Correct would be TCG_TYPE_I64.

Better to load vale instead, since that will take fewer insns in 
tcg_out_movi.



Sure.






+static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
+   unsigned vecl, unsigned vece,
+   const TCGArg args[TCG_MAX_OP_ARGS],
+   const int const_args[TCG_MAX_OP_ARGS])
+{
+    TCGType type = vecl + TCG_TYPE_V64;
+    TCGArg a0, a1, a2;
+    TCGReg base;
+    TCGReg temp = TCG_REG_TMP0;
+    int32_t offset;
+
+    a0 = args[0];
+    a1 = args[1];
+    a2 = args[2];
+
+    /* Currently only supports V128 */
+    tcg_debug_assert(type == TCG_TYPE_V128);
+
+    switch (opc) {
+    case INDEX_op_st_vec:
+    /* Try to fit vst imm */
+    if (-0x800 <= a2 && a2 <= 0x7ff) {
+    base = a1;
+    offset = a2;
+    } else {
+    tcg_out_addi(s, TCG_TYPE_I64, temp, a1, a2);
+    base = temp;
+    offset = 0;
+    }
+    tcg_out_opc_vst(s, a0, base, offset);
+    break;
+    case INDEX_op_ld_vec:
+    /* Try to fit vld imm */
+    if (-0x800 <= a2 && a2 <= 0x7ff) {
+    base = a1;
+    offset = a2;
+    } else {
+    tcg_out_addi(s, TCG_TYPE_I64, temp, a1, a2);
+    base = temp;
+    offset = 0;
+    }
+    tcg_out_opc_vld(s, a0, base, offset);


tcg_out_addi has a hole in bits [15:12], and can take an extra insn when 
those bits are set.  Better to load the offset with tcg_out_movi and 
then use VLDX/VSTX instead of VLD/VST.



Sure.





@@ -159,6 +170,30 @@ typedef enum {
  #define TCG_TARGET_HAS_mulsh_i64    1
  #define TCG_TARGET_HAS_qemu_ldst_i128   0
  +#define TCG_TARGET_HAS_v64  0
+#define TCG_TARGET_HAS_v128 use_lsx_instructions
+#define TCG_TARGET_HAS_v256 0


Perhaps reserve for a follow-up, but TCG_TARGET_HAS_v64 can easily be 
supported using the same instructions.


The only difference is load/store, where you could use FLD.D/FST.D to 
load the lower 64-bits of the fp/vector register, or VLDREPL.D to load 
and initialize all bits and VSTELM.D to store the lower 64-bits.


I tend to think the float insns are more flexible, having a larger 
displacement, and the availability of FLDX/FSTX as well.



Sure.





r~




[PATCH 05/11] tcg/loongarch64: Lower vector bitwise operations

2023-08-28 Thread Jiajie Chen
Lower the following ops:

- and_vec
- andc_vec
- or_vec
- orc_vec
- xor_vec
- nor_vec

Signed-off-by: Jiajie Chen 
---
 tcg/loongarch64/tcg-target.c.inc | 35 
 tcg/loongarch64/tcg-target.h |  6 +++---
 2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index eb340a6493..fe741ef045 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1671,6 +1671,29 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 }
 tcg_out_opc_vld(s, a0, base, offset);
 break;
+case INDEX_op_and_vec:
+tcg_out_opc_vand_v(s, a0, a1, a2);
+break;
+case INDEX_op_andc_vec:
+/*
+ * vandn vd, vj, vk: vd = vk & ~vj
+ * andc_vec vd, vj, vk: vd = vj & ~vk
+ * vj and vk are swapped
+ */
+tcg_out_opc_vandn_v(s, a0, a2, a1);
+break;
+case INDEX_op_or_vec:
+tcg_out_opc_vor_v(s, a0, a1, a2);
+break;
+case INDEX_op_orc_vec:
+tcg_out_opc_vorn_v(s, a0, a1, a2);
+break;
+case INDEX_op_xor_vec:
+tcg_out_opc_vxor_v(s, a0, a1, a2);
+break;
+case INDEX_op_nor_vec:
+tcg_out_opc_vnor_v(s, a0, a1, a2);
+break;
 case INDEX_op_cmp_vec:
 TCGCond cond = args[3];
 insn = cmp_vec_insn[cond][vece];
@@ -1707,6 +1730,12 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_cmp_vec:
 case INDEX_op_add_vec:
 case INDEX_op_sub_vec:
+case INDEX_op_and_vec:
+case INDEX_op_andc_vec:
+case INDEX_op_or_vec:
+case INDEX_op_orc_vec:
+case INDEX_op_xor_vec:
+case INDEX_op_nor_vec:
 return 1;
 default:
 return 0;
@@ -1871,6 +1900,12 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_cmp_vec:
 case INDEX_op_add_vec:
 case INDEX_op_sub_vec:
+case INDEX_op_and_vec:
+case INDEX_op_andc_vec:
+case INDEX_op_or_vec:
+case INDEX_op_orc_vec:
+case INDEX_op_xor_vec:
+case INDEX_op_nor_vec:
 return C_O1_I2(w, w, w);
 
 default:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index be9343ded9..4ca685e752 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -177,10 +177,10 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_not_vec  0
 #define TCG_TARGET_HAS_neg_vec  0
 #define TCG_TARGET_HAS_abs_vec  0
-#define TCG_TARGET_HAS_andc_vec 0
-#define TCG_TARGET_HAS_orc_vec  0
+#define TCG_TARGET_HAS_andc_vec 1
+#define TCG_TARGET_HAS_orc_vec  1
 #define TCG_TARGET_HAS_nand_vec 0
-#define TCG_TARGET_HAS_nor_vec  0
+#define TCG_TARGET_HAS_nor_vec  1
 #define TCG_TARGET_HAS_eqv_vec  0
 #define TCG_TARGET_HAS_mul_vec  0
 #define TCG_TARGET_HAS_shi_vec  0
-- 
2.42.0




[PATCH 09/11] tcg/loongarch64: Lower vector saturated ops

2023-08-28 Thread Jiajie Chen
Lower the following ops:

- ssadd_vec
- usadd_vec
- sssub_vec
- ussub_vec

Signed-off-by: Jiajie Chen 
---
 tcg/loongarch64/tcg-target.c.inc | 32 
 tcg/loongarch64/tcg-target.h |  2 +-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 91049a80b6..21d2365987 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1656,6 +1656,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static const LoongArchInsn umax_vec_insn[4] = {
 OPC_VMAX_BU, OPC_VMAX_HU, OPC_VMAX_WU, OPC_VMAX_DU
 };
+static const LoongArchInsn ssadd_vec_insn[4] = {
+OPC_VSADD_B, OPC_VSADD_H, OPC_VSADD_W, OPC_VSADD_D
+};
+static const LoongArchInsn usadd_vec_insn[4] = {
+OPC_VSADD_BU, OPC_VSADD_HU, OPC_VSADD_WU, OPC_VSADD_DU
+};
+static const LoongArchInsn sssub_vec_insn[4] = {
+OPC_VSSUB_B, OPC_VSSUB_H, OPC_VSSUB_W, OPC_VSSUB_D
+};
+static const LoongArchInsn ussub_vec_insn[4] = {
+OPC_VSSUB_BU, OPC_VSSUB_HU, OPC_VSSUB_WU, OPC_VSSUB_DU
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1748,6 +1760,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_umax_vec:
 tcg_out32(s, encode_vdvjvk_insn(umax_vec_insn[vece], a0, a1, a2));
 break;
+case INDEX_op_ssadd_vec:
+tcg_out32(s, encode_vdvjvk_insn(ssadd_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_usadd_vec:
+tcg_out32(s, encode_vdvjvk_insn(usadd_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_sssub_vec:
+tcg_out32(s, encode_vdvjvk_insn(sssub_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_ussub_vec:
+tcg_out32(s, encode_vdvjvk_insn(ussub_vec_insn[vece], a0, a1, a2));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1778,6 +1802,10 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_smax_vec:
 case INDEX_op_umin_vec:
 case INDEX_op_umax_vec:
+case INDEX_op_ssadd_vec:
+case INDEX_op_usadd_vec:
+case INDEX_op_sssub_vec:
+case INDEX_op_ussub_vec:
 return 1;
 default:
 return 0;
@@ -1953,6 +1981,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_smax_vec:
 case INDEX_op_umin_vec:
 case INDEX_op_umax_vec:
+case INDEX_op_ssadd_vec:
+case INDEX_op_usadd_vec:
+case INDEX_op_sssub_vec:
+case INDEX_op_ussub_vec:
 return C_O1_I2(w, w, w);
 
 case INDEX_op_neg_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 97af7f8631..4c90a1cf51 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -189,7 +189,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_roti_vec 0
 #define TCG_TARGET_HAS_rots_vec 0
 #define TCG_TARGET_HAS_rotv_vec 0
-#define TCG_TARGET_HAS_sat_vec  0
+#define TCG_TARGET_HAS_sat_vec  1
 #define TCG_TARGET_HAS_minmax_vec   1
 #define TCG_TARGET_HAS_bitsel_vec   0
 #define TCG_TARGET_HAS_cmpsel_vec   0
-- 
2.42.0




[PATCH 06/11] tcg/loongarch64: Lower neg_vec to vneg

2023-08-28 Thread Jiajie Chen
Signed-off-by: Jiajie Chen 
---
 tcg/loongarch64/tcg-target-con-set.h |  1 +
 tcg/loongarch64/tcg-target.c.inc | 10 ++
 tcg/loongarch64/tcg-target.h |  2 +-
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target-con-set.h 
b/tcg/loongarch64/tcg-target-con-set.h
index e80fc7f3f7..9fce856012 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -20,6 +20,7 @@ C_O0_I2(rZ, rZ)
 C_O0_I2(w, r)
 C_O1_I1(r, r)
 C_O1_I1(w, r)
+C_O1_I1(w, w)
 C_O1_I2(r, r, rC)
 C_O1_I2(r, r, ri)
 C_O1_I2(r, r, rI)
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index fe741ef045..819dcdba77 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1638,6 +1638,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static const LoongArchInsn sub_vec_insn[4] = {
 OPC_VSUB_B, OPC_VSUB_H, OPC_VSUB_W, OPC_VSUB_D
 };
+static const LoongArchInsn neg_vec_insn[4] = {
+OPC_VNEG_B, OPC_VNEG_H, OPC_VNEG_W, OPC_VNEG_D
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1712,6 +1715,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_sub_vec:
 tcg_out32(s, encode_vdvjvk_insn(sub_vec_insn[vece], a0, a1, a2));
 break;
+case INDEX_op_neg_vec:
+tcg_out32(s, encode_vdvj_insn(neg_vec_insn[vece], a0, a1));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1736,6 +1742,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_orc_vec:
 case INDEX_op_xor_vec:
 case INDEX_op_nor_vec:
+case INDEX_op_neg_vec:
 return 1;
 default:
 return 0;
@@ -1908,6 +1915,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_nor_vec:
 return C_O1_I2(w, w, w);
 
+case INDEX_op_neg_vec:
+return C_O1_I1(w, w);
+
 default:
 g_assert_not_reached();
 }
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 4ca685e752..6a8147875a 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -175,7 +175,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_v256 0
 
 #define TCG_TARGET_HAS_not_vec  0
-#define TCG_TARGET_HAS_neg_vec  0
+#define TCG_TARGET_HAS_neg_vec  1
 #define TCG_TARGET_HAS_abs_vec  0
 #define TCG_TARGET_HAS_andc_vec 1
 #define TCG_TARGET_HAS_orc_vec  1
-- 
2.42.0




[PATCH 10/11] tcg/loongarch64: Lower vector shift vector ops

2023-08-28 Thread Jiajie Chen
Lower the following ops:

- shlv_vec
- shrv_vec
- sarv_vec

Signed-off-by: Jiajie Chen 
---
 tcg/loongarch64/tcg-target.c.inc | 24 
 tcg/loongarch64/tcg-target.h |  2 +-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 21d2365987..caf2a7a563 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1668,6 +1668,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static const LoongArchInsn ussub_vec_insn[4] = {
 OPC_VSSUB_BU, OPC_VSSUB_HU, OPC_VSSUB_WU, OPC_VSSUB_DU
 };
+static const LoongArchInsn shlv_vec_insn[4] = {
+OPC_VSLL_B, OPC_VSLL_H, OPC_VSLL_W, OPC_VSLL_D
+};
+static const LoongArchInsn shrv_vec_insn[4] = {
+OPC_VSRL_B, OPC_VSRL_H, OPC_VSRL_W, OPC_VSRL_D
+};
+static const LoongArchInsn sarv_vec_insn[4] = {
+OPC_VSRA_B, OPC_VSRA_H, OPC_VSRA_W, OPC_VSRA_D
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1772,6 +1781,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_ussub_vec:
 tcg_out32(s, encode_vdvjvk_insn(ussub_vec_insn[vece], a0, a1, a2));
 break;
+case INDEX_op_shlv_vec:
+tcg_out32(s, encode_vdvjvk_insn(shlv_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_shrv_vec:
+tcg_out32(s, encode_vdvjvk_insn(shrv_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_sarv_vec:
+tcg_out32(s, encode_vdvjvk_insn(sarv_vec_insn[vece], a0, a1, a2));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1806,6 +1824,9 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_usadd_vec:
 case INDEX_op_sssub_vec:
 case INDEX_op_ussub_vec:
+case INDEX_op_shlv_vec:
+case INDEX_op_shrv_vec:
+case INDEX_op_sarv_vec:
 return 1;
 default:
 return 0;
@@ -1985,6 +2006,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_usadd_vec:
 case INDEX_op_sssub_vec:
 case INDEX_op_ussub_vec:
+case INDEX_op_shlv_vec:
+case INDEX_op_shrv_vec:
+case INDEX_op_sarv_vec:
 return C_O1_I2(w, w, w);
 
 case INDEX_op_neg_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 4c90a1cf51..771545b021 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -185,7 +185,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_mul_vec  1
 #define TCG_TARGET_HAS_shi_vec  0
 #define TCG_TARGET_HAS_shs_vec  0
-#define TCG_TARGET_HAS_shv_vec  0
+#define TCG_TARGET_HAS_shv_vec  1
 #define TCG_TARGET_HAS_roti_vec 0
 #define TCG_TARGET_HAS_rots_vec 0
 #define TCG_TARGET_HAS_rotv_vec 0
-- 
2.42.0




[PATCH 02/11] tcg/loongarch64: Lower basic tcg vec ops to LSX

2023-08-28 Thread Jiajie Chen
LSX support on host cpu is detected via hwcap.

Lower the following ops to LSX:

- dup_vec
- dupi_vec
- dupm_vec
- ld_vec
- st_vec

Signed-off-by: Jiajie Chen 
---
 tcg/loongarch64/tcg-target-con-set.h |   2 +
 tcg/loongarch64/tcg-target-con-str.h |   1 +
 tcg/loongarch64/tcg-target.c.inc | 223 ++-
 tcg/loongarch64/tcg-target.h |  37 -
 tcg/loongarch64/tcg-target.opc.h |  12 ++
 5 files changed, 273 insertions(+), 2 deletions(-)
 create mode 100644 tcg/loongarch64/tcg-target.opc.h

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index c2bde44613..37b3f80bf9 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -17,7 +17,9 @@
 C_O0_I1(r)
 C_O0_I2(rZ, r)
 C_O0_I2(rZ, rZ)
+C_O0_I2(w, r)
 C_O1_I1(r, r)
+C_O1_I1(w, r)
 C_O1_I2(r, r, rC)
 C_O1_I2(r, r, ri)
 C_O1_I2(r, r, rI)
diff --git a/tcg/loongarch64/tcg-target-con-str.h b/tcg/loongarch64/tcg-target-con-str.h
index 6e9ccca3ad..81b8d40278 100644
--- a/tcg/loongarch64/tcg-target-con-str.h
+++ b/tcg/loongarch64/tcg-target-con-str.h
@@ -14,6 +14,7 @@
  * REGS(letter, register_mask)
  */
 REGS('r', ALL_GENERAL_REGS)
+REGS('w', ALL_VECTOR_REGS)
 
 /*
  * Define constraint letters for constants:
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index baf5fc3819..0f9427572c 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -32,6 +32,8 @@
 #include "../tcg-ldst.c.inc"
 #include 
 
+bool use_lsx_instructions;
+
 #ifdef CONFIG_DEBUG_TCG
 static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 "zero",
@@ -65,7 +67,39 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 "s5",
 "s6",
 "s7",
-"s8"
+"s8",
+"vr0",
+"vr1",
+"vr2",
+"vr3",
+"vr4",
+"vr5",
+"vr6",
+"vr7",
+"vr8",
+"vr9",
+"vr10",
+"vr11",
+"vr12",
+"vr13",
+"vr14",
+"vr15",
+"vr16",
+"vr17",
+"vr18",
+"vr19",
+"vr20",
+"vr21",
+"vr22",
+"vr23",
+"vr24",
+"vr25",
+"vr26",
+"vr27",
+"vr28",
+"vr29",
+"vr30",
+"vr31",
 };
 #endif
 
@@ -102,6 +136,15 @@ static const int tcg_target_reg_alloc_order[] = {
 TCG_REG_A2,
 TCG_REG_A1,
 TCG_REG_A0,
+
+/* Vector registers */
+TCG_REG_V0, TCG_REG_V1, TCG_REG_V2, TCG_REG_V3,
+TCG_REG_V4, TCG_REG_V5, TCG_REG_V6, TCG_REG_V7,
+TCG_REG_V8, TCG_REG_V9, TCG_REG_V10, TCG_REG_V11,
+TCG_REG_V12, TCG_REG_V13, TCG_REG_V14, TCG_REG_V15,
+TCG_REG_V16, TCG_REG_V17, TCG_REG_V18, TCG_REG_V19,
+TCG_REG_V20, TCG_REG_V21, TCG_REG_V22, TCG_REG_V23,
+/* V24 - V31 are caller-saved, and skipped.  */
 };
 
 static const int tcg_target_call_iarg_regs[] = {
@@ -135,6 +178,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
 #define TCG_CT_CONST_WSZ   0x2000
 
 #define ALL_GENERAL_REGS   MAKE_64BIT_MASK(0, 32)
+#define ALL_VECTOR_REGSMAKE_64BIT_MASK(32, 32)
 
 static inline tcg_target_long sextreg(tcg_target_long val, int pos, int len)
 {
@@ -1486,6 +1530,159 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 }
 }
 
+static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
+TCGReg rd, TCGReg rs)
+{
+switch (vece) {
+case MO_8:
+tcg_out_opc_vreplgr2vr_b(s, rd, rs);
+break;
+case MO_16:
+tcg_out_opc_vreplgr2vr_h(s, rd, rs);
+break;
+case MO_32:
+tcg_out_opc_vreplgr2vr_w(s, rd, rs);
+break;
+case MO_64:
+tcg_out_opc_vreplgr2vr_d(s, rd, rs);
+break;
+default:
+g_assert_not_reached();
+}
+return true;
+}
+
+static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
+ TCGReg r, TCGReg base, intptr_t offset)
+{
+/* Handle imm overflow and division (vldrepl.d imm is divided by 8) */
+if (offset < -0x800 || offset > 0x7ff || \
+(offset & ((1 << vece) - 1)) != 0) {
+tcg_out_addi(s, TCG_TYPE_I64, TCG_REG_TMP0, base, offset);
+base = TCG_REG_TMP0;
+offset = 0;
+}
+offset >>= vece;
+
+switch (vece) {
+case MO_8:
+tcg_out_opc_vldrepl_b(s, r, base, offset);
+break;
+case MO_16:
+tcg_out_opc_vldrepl_h(s, r, base, offset);
+break;
+case MO_32:
+tcg_out_opc_vldrepl_w(s, r, base, offset);
+break;
+case MO_64:
+tcg_out_opc_

[PATCH 11/11] tcg/loongarch64: Lower bitsel_vec to vbitsel

2023-08-28 Thread Jiajie Chen
Signed-off-by: Jiajie Chen 
---
 tcg/loongarch64/tcg-target-con-set.h |  1 +
 tcg/loongarch64/tcg-target.c.inc | 11 ++-
 tcg/loongarch64/tcg-target.h |  2 +-
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index 9fce856012..0f709113f0 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -33,4 +33,5 @@ C_O1_I2(r, rZ, ri)
 C_O1_I2(r, rZ, rJ)
 C_O1_I2(r, rZ, rZ)
 C_O1_I2(w, w, w)
+C_O1_I3(w, w, w, w)
 C_O1_I4(r, rZ, rJ, rZ, rZ)
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index caf2a7a563..14826fad5a 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1619,7 +1619,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
const int const_args[TCG_MAX_OP_ARGS])
 {
 TCGType type = vecl + TCG_TYPE_V64;
-TCGArg a0, a1, a2;
+TCGArg a0, a1, a2, a3;
 TCGReg base;
 TCGReg temp = TCG_REG_TMP0;
 int32_t offset;
@@ -1681,6 +1681,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 a0 = args[0];
 a1 = args[1];
 a2 = args[2];
+a3 = args[3];
 
 /* Currently only supports V128 */
 tcg_debug_assert(type == TCG_TYPE_V128);
@@ -1790,6 +1791,10 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_sarv_vec:
 tcg_out32(s, encode_vdvjvk_insn(sarv_vec_insn[vece], a0, a1, a2));
 break;
+case INDEX_op_bitsel_vec:
+/* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */
+tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1);
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1827,6 +1832,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_shlv_vec:
 case INDEX_op_shrv_vec:
 case INDEX_op_sarv_vec:
+case INDEX_op_bitsel_vec:
 return 1;
 default:
 return 0;
@@ -2014,6 +2020,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_neg_vec:
 return C_O1_I1(w, w);
 
+case INDEX_op_bitsel_vec:
+return C_O1_I3(w, w, w, w);
+
 default:
 g_assert_not_reached();
 }
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 771545b021..aafd770356 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -191,7 +191,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_rotv_vec 0
 #define TCG_TARGET_HAS_sat_vec  1
 #define TCG_TARGET_HAS_minmax_vec   1
-#define TCG_TARGET_HAS_bitsel_vec   0
+#define TCG_TARGET_HAS_bitsel_vec   1
 #define TCG_TARGET_HAS_cmpsel_vec   0
 
 #define TCG_TARGET_DEFAULT_MO (0)
-- 
2.42.0




[PATCH 08/11] tcg/loongarch64: Lower vector min max ops

2023-08-28 Thread Jiajie Chen
Lower the following ops:

- smin_vec
- smax_vec
- umin_vec
- umax_vec

Signed-off-by: Jiajie Chen 
---
 tcg/loongarch64/tcg-target.c.inc | 32 
 tcg/loongarch64/tcg-target.h |  2 +-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index bca24b6a20..91049a80b6 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1644,6 +1644,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static const LoongArchInsn mul_vec_insn[4] = {
 OPC_VMUL_B, OPC_VMUL_H, OPC_VMUL_W, OPC_VMUL_D
 };
+static const LoongArchInsn smin_vec_insn[4] = {
+OPC_VMIN_B, OPC_VMIN_H, OPC_VMIN_W, OPC_VMIN_D
+};
+static const LoongArchInsn umin_vec_insn[4] = {
+OPC_VMIN_BU, OPC_VMIN_HU, OPC_VMIN_WU, OPC_VMIN_DU
+};
+static const LoongArchInsn smax_vec_insn[4] = {
+OPC_VMAX_B, OPC_VMAX_H, OPC_VMAX_W, OPC_VMAX_D
+};
+static const LoongArchInsn umax_vec_insn[4] = {
+OPC_VMAX_BU, OPC_VMAX_HU, OPC_VMAX_WU, OPC_VMAX_DU
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1724,6 +1736,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_mul_vec:
 tcg_out32(s, encode_vdvjvk_insn(mul_vec_insn[vece], a0, a1, a2));
 break;
+case INDEX_op_smin_vec:
+tcg_out32(s, encode_vdvjvk_insn(smin_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_smax_vec:
+tcg_out32(s, encode_vdvjvk_insn(smax_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_umin_vec:
+tcg_out32(s, encode_vdvjvk_insn(umin_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_umax_vec:
+tcg_out32(s, encode_vdvjvk_insn(umax_vec_insn[vece], a0, a1, a2));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1750,6 +1774,10 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_nor_vec:
 case INDEX_op_neg_vec:
 case INDEX_op_mul_vec:
+case INDEX_op_smin_vec:
+case INDEX_op_smax_vec:
+case INDEX_op_umin_vec:
+case INDEX_op_umax_vec:
 return 1;
 default:
 return 0;
@@ -1921,6 +1949,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_xor_vec:
 case INDEX_op_nor_vec:
 case INDEX_op_mul_vec:
+case INDEX_op_smin_vec:
+case INDEX_op_smax_vec:
+case INDEX_op_umin_vec:
+case INDEX_op_umax_vec:
 return C_O1_I2(w, w, w);
 
 case INDEX_op_neg_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 6b97abcb5b..97af7f8631 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -190,7 +190,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_rots_vec 0
 #define TCG_TARGET_HAS_rotv_vec 0
 #define TCG_TARGET_HAS_sat_vec  0
-#define TCG_TARGET_HAS_minmax_vec   0
+#define TCG_TARGET_HAS_minmax_vec   1
 #define TCG_TARGET_HAS_bitsel_vec   0
 #define TCG_TARGET_HAS_cmpsel_vec   0
 
-- 
2.42.0




[PATCH 03/11] tcg/loongarch64: Lower cmp_vec to vseq/vsle/vslt

2023-08-28 Thread Jiajie Chen
Signed-off-by: Jiajie Chen 
---
 tcg/loongarch64/tcg-target-con-set.h |  1 +
 tcg/loongarch64/tcg-target.c.inc | 25 +
 2 files changed, 26 insertions(+)

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index 37b3f80bf9..e80fc7f3f7 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -31,4 +31,5 @@ C_O1_I2(r, 0, rZ)
 C_O1_I2(r, rZ, ri)
 C_O1_I2(r, rZ, rJ)
 C_O1_I2(r, rZ, rZ)
+C_O1_I2(w, w, w)
 C_O1_I4(r, rZ, rJ, rZ, rZ)
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 0f9427572c..cc80e5fa20 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1624,6 +1624,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 TCGReg temp = TCG_REG_TMP0;
 int32_t offset;
 
+static const LoongArchInsn cmp_vec_insn[16][4] = {
+[TCG_COND_EQ] = {OPC_VSEQ_B, OPC_VSEQ_H, OPC_VSEQ_W, OPC_VSEQ_D},
+[TCG_COND_LE] = {OPC_VSLE_B, OPC_VSLE_H, OPC_VSLE_W, OPC_VSLE_D},
+[TCG_COND_LEU] = {OPC_VSLE_BU, OPC_VSLE_HU, OPC_VSLE_WU, OPC_VSLE_DU},
+[TCG_COND_LT] = {OPC_VSLT_B, OPC_VSLT_H, OPC_VSLT_W, OPC_VSLT_D},
+[TCG_COND_LTU] = {OPC_VSLT_BU, OPC_VSLT_HU, OPC_VSLT_WU, OPC_VSLT_DU},
+};
+LoongArchInsn insn;
+
 a0 = args[0];
 a1 = args[1];
 a2 = args[2];
@@ -1656,6 +1665,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 }
 tcg_out_opc_vld(s, a0, base, offset);
 break;
+case INDEX_op_cmp_vec:
+TCGCond cond = args[3];
+insn = cmp_vec_insn[cond][vece];
+if (insn == 0) {
+TCGArg t;
+t = a1, a1 = a2, a2 = t;
+cond = tcg_swap_cond(cond);
+insn = cmp_vec_insn[cond][vece];
+tcg_debug_assert(insn != 0);
+}
+tcg_out32(s, encode_vdvjvk_insn(insn, a0, a1, a2));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1671,6 +1692,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_st_vec:
 case INDEX_op_dup_vec:
 case INDEX_op_dupm_vec:
+case INDEX_op_cmp_vec:
 return 1;
 default:
 return 0;
@@ -1832,6 +1854,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_st_vec:
 return C_O0_I2(w, r);
 
+case INDEX_op_cmp_vec:
+return C_O1_I2(w, w, w);
+
 default:
 g_assert_not_reached();
 }
-- 
2.42.0




[PATCH 07/11] tcg/loongarch64: Lower mul_vec to vmul

2023-08-28 Thread Jiajie Chen
Signed-off-by: Jiajie Chen 
---
 tcg/loongarch64/tcg-target.c.inc | 8 
 tcg/loongarch64/tcg-target.h | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 819dcdba77..bca24b6a20 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1641,6 +1641,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 static const LoongArchInsn neg_vec_insn[4] = {
 OPC_VNEG_B, OPC_VNEG_H, OPC_VNEG_W, OPC_VNEG_D
 };
+static const LoongArchInsn mul_vec_insn[4] = {
+OPC_VMUL_B, OPC_VMUL_H, OPC_VMUL_W, OPC_VMUL_D
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1718,6 +1721,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_neg_vec:
 tcg_out32(s, encode_vdvj_insn(neg_vec_insn[vece], a0, a1));
 break;
+case INDEX_op_mul_vec:
+tcg_out32(s, encode_vdvjvk_insn(mul_vec_insn[vece], a0, a1, a2));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1743,6 +1749,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_xor_vec:
 case INDEX_op_nor_vec:
 case INDEX_op_neg_vec:
+case INDEX_op_mul_vec:
 return 1;
 default:
 return 0;
@@ -1913,6 +1920,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_orc_vec:
 case INDEX_op_xor_vec:
 case INDEX_op_nor_vec:
+case INDEX_op_mul_vec:
 return C_O1_I2(w, w, w);
 
 case INDEX_op_neg_vec:
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 6a8147875a..6b97abcb5b 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -182,7 +182,7 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_nand_vec 0
 #define TCG_TARGET_HAS_nor_vec  1
 #define TCG_TARGET_HAS_eqv_vec  0
-#define TCG_TARGET_HAS_mul_vec  0
+#define TCG_TARGET_HAS_mul_vec  1
 #define TCG_TARGET_HAS_shi_vec  0
 #define TCG_TARGET_HAS_shs_vec  0
 #define TCG_TARGET_HAS_shv_vec  0
-- 
2.42.0




[PATCH 00/11] Lower TCG vector ops to LSX

2023-08-28 Thread Jiajie Chen
This patch series allows QEMU to utilize LSX instructions on LoongArch
machines to execute TCG vector ops.

Jiajie Chen (11):
  tcg/loongarch64: Import LSX instructions
  tcg/loongarch64: Lower basic tcg vec ops to LSX
  tcg/loongarch64: Lower cmp_vec to vseq/vsle/vslt
  tcg/loongarch64: Lower add/sub_vec to vadd/vsub
  tcg/loongarch64: Lower vector bitwise operations
  tcg/loongarch64: Lower neg_vec to vneg
  tcg/loongarch64: Lower mul_vec to vmul
  tcg/loongarch64: Lower vector min max ops
  tcg/loongarch64: Lower vector saturated ops
  tcg/loongarch64: Lower vector shift vector ops
  tcg/loongarch64: Lower bitsel_vec to vbitsel

 tcg/loongarch64/tcg-insn-defs.c.inc  | 6251 +-
 tcg/loongarch64/tcg-target-con-set.h |5 +
 tcg/loongarch64/tcg-target-con-str.h |1 +
 tcg/loongarch64/tcg-target.c.inc |  414 +-
 tcg/loongarch64/tcg-target.h |   37 +-
 tcg/loongarch64/tcg-target.opc.h |   12 +
 6 files changed, 6601 insertions(+), 119 deletions(-)
 create mode 100644 tcg/loongarch64/tcg-target.opc.h

-- 
2.42.0




[PATCH 04/11] tcg/loongarch64: Lower add/sub_vec to vadd/vsub

2023-08-28 Thread Jiajie Chen
Lower the following ops:

- add_vec
- sub_vec

Signed-off-by: Jiajie Chen 
---
 tcg/loongarch64/tcg-target.c.inc | 16 
 1 file changed, 16 insertions(+)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index cc80e5fa20..eb340a6493 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1632,6 +1632,12 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 [TCG_COND_LTU] = {OPC_VSLT_BU, OPC_VSLT_HU, OPC_VSLT_WU, OPC_VSLT_DU},
 };
 LoongArchInsn insn;
+static const LoongArchInsn add_vec_insn[4] = {
+OPC_VADD_B, OPC_VADD_H, OPC_VADD_W, OPC_VADD_D
+};
+static const LoongArchInsn sub_vec_insn[4] = {
+OPC_VSUB_B, OPC_VSUB_H, OPC_VSUB_W, OPC_VSUB_D
+};
 
 a0 = args[0];
 a1 = args[1];
@@ -1677,6 +1683,12 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 }
 tcg_out32(s, encode_vdvjvk_insn(insn, a0, a1, a2));
 break;
+case INDEX_op_add_vec:
+tcg_out32(s, encode_vdvjvk_insn(add_vec_insn[vece], a0, a1, a2));
+break;
+case INDEX_op_sub_vec:
+tcg_out32(s, encode_vdvjvk_insn(sub_vec_insn[vece], a0, a1, a2));
+break;
 case INDEX_op_dupm_vec:
 tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
 break;
@@ -1693,6 +1705,8 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_dup_vec:
 case INDEX_op_dupm_vec:
 case INDEX_op_cmp_vec:
+case INDEX_op_add_vec:
+case INDEX_op_sub_vec:
 return 1;
 default:
 return 0;
@@ -1855,6 +1869,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 return C_O0_I2(w, r);
 
 case INDEX_op_cmp_vec:
+case INDEX_op_add_vec:
+case INDEX_op_sub_vec:
 return C_O1_I2(w, w, w);
 
 default:
-- 
2.42.0




Re: [PATCH] hw/loongarch: Fix ACPI processor id off-by-one error

2023-08-20 Thread Jiajie Chen


On 2023/8/21 09:24, bibo mao wrote:

+ Add xianglai

Good catch.

In theory, it is a logical id and it can differ from the physical id.
However, it must be equal to the _UID in the CPU DSDT table, which is
missing now.


Yes, the logical id can be different from the index. The spec says:

If the processor structure represents an actual processor, this field
must match the value of ACPI processor ID field in the processor's entry
in the MADT. If the processor structure represents a group of associated
processors, the structure might match a processor container in the name
space. In that case this entry will match the value of the _UID method
of the associated processor container. Where there is a match it must be
represented. The flags field, described in Processor Structure Flags,
includes a bit to describe whether the ACPI processor ID is valid.


I believe PPTT, MADT and DSDT should all adhere to the same logical id 
mapping.




Can the PPTT table parse error be fixed if the CPU DSDT table is added?

Regards
Bibo Mao


在 2023/8/20 18:56, Jiajie Chen 写道:

In the hw/acpi/aml-build.c:build_pptt() function, the code assumes that
the ACPI processor id equals the cpu index; for example, if we have 8
cpus, then the ACPI processor ids should be in the range 0-7.

However, the hw/loongarch/acpi-build.c:build_madt() function breaks this
assumption. If we have 8 cpus again, the ACPI processor ids in the MADT
table are in the range 1-8. This violates the following description
taken from ACPI spec 6.4, table 5.138:

If the processor structure represents an actual processor, this field
must match the value of ACPI processor ID field in the processor’s entry
in the MADT.

It will break the latest Linux 6.5-rc6 with the
following error message:

ACPI PPTT: PPTT table found, but unable to locate core 7 (8)
Invalid BIOS PPTT

Here 7 is the last cpu index, 8 is the ACPI processor id learned from
MADT.

With this patch, Linux can properly detect SMT threads when "-smp
8,sockets=1,cores=4,threads=2" is passed:

Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):   2

The detection of the number of sockets is still wrong, but that is out
of the scope of this commit.

Signed-off-by: Jiajie Chen
---
  hw/loongarch/acpi-build.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/loongarch/acpi-build.c b/hw/loongarch/acpi-build.c
index 0b62c3a2f7..ae292fc543 100644
--- a/hw/loongarch/acpi-build.c
+++ b/hw/loongarch/acpi-build.c
@@ -127,7 +127,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, LoongArchMachineState *lams)
  build_append_int_noprefix(table_data, 17, 1);/* Type */
  build_append_int_noprefix(table_data, 15, 1);/* Length */
  build_append_int_noprefix(table_data, 1, 1); /* Version */
-build_append_int_noprefix(table_data, i + 1, 4); /* ACPI Processor ID */
+build_append_int_noprefix(table_data, i, 4); /* ACPI Processor ID */
  build_append_int_noprefix(table_data, arch_id, 4); /* Core ID */
  build_append_int_noprefix(table_data, 1, 4); /* Flags */
  }

[PATCH] hw/loongarch: Fix ACPI processor id off-by-one error

2023-08-20 Thread Jiajie Chen
In the hw/acpi/aml-build.c:build_pptt() function, the code assumes that
the ACPI processor id equals the cpu index; for example, if we have 8
cpus, then the ACPI processor ids should be in the range 0-7.

However, the hw/loongarch/acpi-build.c:build_madt() function breaks this
assumption. If we have 8 cpus again, the ACPI processor ids in the MADT
table are in the range 1-8. This violates the following description
taken from ACPI spec 6.4, table 5.138:

If the processor structure represents an actual processor, this field
must match the value of ACPI processor ID field in the processor’s entry
in the MADT.

It will break the latest Linux 6.5-rc6 with the
following error message:

ACPI PPTT: PPTT table found, but unable to locate core 7 (8)
Invalid BIOS PPTT

Here 7 is the last cpu index, 8 is the ACPI processor id learned from
MADT.

With this patch, Linux can properly detect SMT threads when "-smp
8,sockets=1,cores=4,threads=2" is passed:

Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):   2

The detection of the number of sockets is still wrong, but that is out
of the scope of this commit.

Signed-off-by: Jiajie Chen 
---
 hw/loongarch/acpi-build.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/loongarch/acpi-build.c b/hw/loongarch/acpi-build.c
index 0b62c3a2f7..ae292fc543 100644
--- a/hw/loongarch/acpi-build.c
+++ b/hw/loongarch/acpi-build.c
@@ -127,7 +127,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, LoongArchMachineState *lams)
 build_append_int_noprefix(table_data, 17, 1);/* Type */
 build_append_int_noprefix(table_data, 15, 1);/* Length */
 build_append_int_noprefix(table_data, 1, 1); /* Version */
-build_append_int_noprefix(table_data, i + 1, 4); /* ACPI Processor ID */
+build_append_int_noprefix(table_data, i, 4); /* ACPI Processor ID */
 build_append_int_noprefix(table_data, arch_id, 4); /* Core ID */
 build_append_int_noprefix(table_data, 1, 4); /* Flags */
 }
-- 
2.41.0




Re: [PATCH] roms: Support compile the efi bios for loongarch

2023-08-10 Thread Jiajie Chen

On 2023/8/10 15:42, xianglai li wrote:

1. Add the edk2-platforms submodule.
2. Add LoongArch UEFI BIOS support to the build scripts.
3. The x86 cross-compilation toolchain can be obtained from the link below:
https://github.com/loongson/build-tools/tree/2022.09.06

Cc: Paolo Bonzini 
Cc: "Marc-André Lureau" 
Cc: "Daniel P. Berrangé" 
Cc: Thomas Huth 
Cc: "Philippe Mathieu-Daudé" 
Cc: Gerd Hoffmann 
Cc: Xiaojuan Yang 
Cc: Song Gao 
Cc: Bibo Mao 
Signed-off-by: xianglai li 
---
  .gitmodules|  3 +++
  meson.build|  2 +-
  pc-bios/meson.build|  2 ++
  roms/edk2-build.config | 14 ++
  roms/edk2-build.py |  4 ++--
  roms/edk2-platforms|  1 +
  6 files changed, 23 insertions(+), 3 deletions(-)
  create mode 16 roms/edk2-platforms

diff --git a/.gitmodules b/.gitmodules
index 73cae4cd4d..0cb57123fa 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -43,3 +43,6 @@
  [submodule "tests/lcitool/libvirt-ci"]
path = tests/lcitool/libvirt-ci
url = https://gitlab.com/libvirt/libvirt-ci.git
+[submodule "roms/edk2-platforms"]
+   path = roms/edk2-platforms
+   url = https://github.com/tianocore/edk2-platforms.git
diff --git a/meson.build b/meson.build
index 98e68ef0b1..b398caf2ce 100644
--- a/meson.build
+++ b/meson.build
@@ -153,7 +153,7 @@ if targetos != 'darwin'
modular_tcg = ['i386-softmmu', 'x86_64-softmmu']
  endif
  
-edk2_targets = [ 'arm-softmmu', 'aarch64-softmmu', 'i386-softmmu', 'x86_64-softmmu' ]

+edk2_targets = [ 'arm-softmmu', 'aarch64-softmmu', 'i386-softmmu', 'x86_64-softmmu', 'loongarch64-softmmu' ]
  unpack_edk2_blobs = false
  foreach target : edk2_targets
if target in target_dirs
diff --git a/pc-bios/meson.build b/pc-bios/meson.build
index a7224ef469..fc73222b6c 100644
--- a/pc-bios/meson.build
+++ b/pc-bios/meson.build
@@ -9,6 +9,8 @@ if unpack_edk2_blobs
  'edk2-i386-vars.fd',
  'edk2-x86_64-code.fd',
  'edk2-x86_64-secure-code.fd',
+'edk2-loongarch64-code.fd',
+'edk2-loongarch64-vars.fd',
]
  
foreach f : fds

diff --git a/roms/edk2-build.config b/roms/edk2-build.config
index 66ef9ffcb9..7960c4c2c5 100644
--- a/roms/edk2-build.config
+++ b/roms/edk2-build.config
@@ -1,5 +1,6 @@
  [global]
  core = edk2
+pkgs = edk2-platforms
  
  

  # options
@@ -122,3 +123,16 @@ plat = RiscVVirtQemu
  dest = ../pc-bios
  cpy1 = FV/RISCV_VIRT.fd  edk2-riscv.fd
  pad1 = edk2-riscv.fd 32m
+
+
+# LoongArch64
+
+[build.loongach64.qemu]


typo: s/loongach64/loongarch64/


+conf = Platform/Loongson/LoongArchQemuPkg/Loongson.dsc
+arch = LOONGARCH64
+plat = LoongArchQemu
+dest = ../pc-bios
+cpy1 = FV/QEMU_EFI.fd  edk2-loongarch64-code.fd
+pad1 = edk2-loongarch64-code.fd 4m
+cpy2 = FV/QEMU_VARS.fd  edk2-loongarch64-vars.fd
+pad2 = edk2-loongarch64-vars.fd 16m
diff --git a/roms/edk2-build.py b/roms/edk2-build.py
index 870893f7c8..dbd641e51e 100755
--- a/roms/edk2-build.py
+++ b/roms/edk2-build.py
@@ -269,8 +269,8 @@ def prepare_env(cfg):
  # for cross builds
  if binary_exists('arm-linux-gnu-gcc'):
  os.environ['GCC5_ARM_PREFIX'] = 'arm-linux-gnu-'
-if binary_exists('loongarch64-linux-gnu-gcc'):
-os.environ['GCC5_LOONGARCH64_PREFIX'] = 'loongarch64-linux-gnu-'
+if binary_exists('loongarch64-unknown-linux-gnu-gcc'):
+os.environ['GCC5_LOONGARCH64_PREFIX'] = 'loongarch64-unknown-linux-gnu-'
  
  hostarch = os.uname().machine

  if binary_exists('aarch64-linux-gnu-gcc') and hostarch != 'aarch64':
diff --git a/roms/edk2-platforms b/roms/edk2-platforms
new file mode 16
index 00..84ccada592
--- /dev/null
+++ b/roms/edk2-platforms
@@ -0,0 +1 @@
+Subproject commit 84ccada59257a8151a592a416017fbb03b8ed3cf





[PATCH v5 09/11] target/loongarch: Truncate high 32 bits of address in VA32 mode

2023-08-09 Thread Jiajie Chen
When running in VA32 mode (!LA64, or VA32L[1-3] matching the PLV), the
virtual address is truncated to 32 bits before address mapping.

Signed-off-by: Jiajie Chen 
Co-authored-by: Richard Henderson 
---
 target/loongarch/cpu.c| 16 
 target/loongarch/cpu.h|  9 +
 target/loongarch/gdbstub.c|  2 +-
 .../loongarch/insn_trans/trans_atomic.c.inc   |  5 ++-
 .../loongarch/insn_trans/trans_branch.c.inc   |  3 +-
 .../loongarch/insn_trans/trans_fmemory.c.inc  | 30 ---
 target/loongarch/insn_trans/trans_lsx.c.inc   | 38 +--
 .../loongarch/insn_trans/trans_memory.c.inc   | 34 +
 target/loongarch/op_helper.c  |  4 +-
 target/loongarch/translate.c  | 32 
 10 files changed, 85 insertions(+), 88 deletions(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 30dd70571a..bd980790f2 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -81,7 +81,7 @@ static void loongarch_cpu_set_pc(CPUState *cs, vaddr value)
 LoongArchCPU *cpu = LOONGARCH_CPU(cs);
 CPULoongArchState *env = &cpu->env;
 
-env->pc = value;
+set_pc(env, value);
 }
 
 static vaddr loongarch_cpu_get_pc(CPUState *cs)
@@ -168,7 +168,7 @@ static void loongarch_cpu_do_interrupt(CPUState *cs)
 set_DERA:
 env->CSR_DERA = env->pc;
 env->CSR_DBG = FIELD_DP64(env->CSR_DBG, CSR_DBG, DST, 1);
-env->pc = env->CSR_EENTRY + 0x480;
+set_pc(env, env->CSR_EENTRY + 0x480);
 break;
 case EXCCODE_INT:
 if (FIELD_EX64(env->CSR_DBG, CSR_DBG, DST)) {
@@ -249,7 +249,8 @@ static void loongarch_cpu_do_interrupt(CPUState *cs)
 
 /* Find the highest-priority interrupt. */
 vector = 31 - clz32(pending);
-env->pc = env->CSR_EENTRY + (EXCCODE_EXTERNAL_INT + vector) * vec_size;
+set_pc(env, env->CSR_EENTRY + \
+   (EXCCODE_EXTERNAL_INT + vector) * vec_size);
 qemu_log_mask(CPU_LOG_INT,
   "%s: PC " TARGET_FMT_lx " ERA " TARGET_FMT_lx
   " cause %d\n" "A " TARGET_FMT_lx " D "
@@ -260,10 +261,9 @@ static void loongarch_cpu_do_interrupt(CPUState *cs)
   env->CSR_ECFG, env->CSR_ESTAT);
 } else {
 if (tlbfill) {
-env->pc = env->CSR_TLBRENTRY;
+set_pc(env, env->CSR_TLBRENTRY);
 } else {
-env->pc = env->CSR_EENTRY;
-env->pc += EXCODE_MCODE(cause) * vec_size;
+set_pc(env, env->CSR_EENTRY + EXCODE_MCODE(cause) * vec_size);
 }
 qemu_log_mask(CPU_LOG_INT,
   "%s: PC " TARGET_FMT_lx " ERA " TARGET_FMT_lx
@@ -324,7 +324,7 @@ static void loongarch_cpu_synchronize_from_tb(CPUState *cs,
 CPULoongArchState *env = &cpu->env;
 
 tcg_debug_assert(!(cs->tcg_cflags & CF_PCREL));
-env->pc = tb->pc;
+set_pc(env, tb->pc);
 }
 
 static void loongarch_restore_state_to_opc(CPUState *cs,
@@ -334,7 +334,7 @@ static void loongarch_restore_state_to_opc(CPUState *cs,
 LoongArchCPU *cpu = LOONGARCH_CPU(cs);
 CPULoongArchState *env = &cpu->env;
 
-env->pc = data[0];
+set_pc(env, data[0]);
 }
 #endif /* CONFIG_TCG */
 
diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 0e02257f91..9f550793ca 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -442,6 +442,15 @@ static inline bool is_va32(CPULoongArchState *env)
 return va32;
 }
 
+static inline void set_pc(CPULoongArchState *env, uint64_t value)
+{
+if (is_va32(env)) {
+env->pc = (uint32_t)value;
+} else {
+env->pc = value;
+}
+}
+
 /*
  * LoongArch CPUs hardware flags.
  */
diff --git a/target/loongarch/gdbstub.c b/target/loongarch/gdbstub.c
index a462e25737..e20b20f99b 100644
--- a/target/loongarch/gdbstub.c
+++ b/target/loongarch/gdbstub.c
@@ -77,7 +77,7 @@ int loongarch_cpu_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
 env->gpr[n] = tmp;
 length = read_length;
 } else if (n == 33) {
-env->pc = tmp;
+set_pc(env, tmp);
 length = read_length;
 }
 return length;
diff --git a/target/loongarch/insn_trans/trans_atomic.c.inc b/target/loongarch/insn_trans/trans_atomic.c.inc
index c69f31bc78..d90312729b 100644
--- a/target/loongarch/insn_trans/trans_atomic.c.inc
+++ b/target/loongarch/insn_trans/trans_atomic.c.inc
@@ -7,9 +7,8 @@ static bool gen_ll(DisasContext *ctx, arg_rr_i *a, MemOp mop)
 {
 TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
 TCGv src1 = gpr_src(ctx, a->rj, EXT_NONE);
-TCGv t0 = tcg_temp_new();
+TCGv t0 = make_address_i(ctx, src1, a->imm);
 
-tcg_gen_addi_tl(t0, src1, a->imm);
 tcg_gen_qemu_ld_i64(dest, t0, ctx->mem_idx, mop);
  

[PATCH v5 10/11] target/loongarch: Sign extend results in VA32 mode

2023-08-09 Thread Jiajie Chen
In VA32 mode, the BL, JIRL and PC* instructions should sign-extend the
low 32-bit result to 64 bits.

Signed-off-by: Jiajie Chen 
---
 target/loongarch/insn_trans/trans_arith.c.inc  | 2 +-
 target/loongarch/insn_trans/trans_branch.c.inc | 4 ++--
 target/loongarch/translate.c   | 8 
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/target/loongarch/insn_trans/trans_arith.c.inc b/target/loongarch/insn_trans/trans_arith.c.inc
index 4c21d8b037..e3b7979e15 100644
--- a/target/loongarch/insn_trans/trans_arith.c.inc
+++ b/target/loongarch/insn_trans/trans_arith.c.inc
@@ -72,7 +72,7 @@ static bool gen_pc(DisasContext *ctx, arg_r_i *a,
target_ulong (*func)(target_ulong, int))
 {
 TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
-target_ulong addr = func(ctx->base.pc_next, a->imm);
+target_ulong addr = make_address_pc(ctx, func(ctx->base.pc_next, a->imm));
 
 tcg_gen_movi_tl(dest, addr);
 gen_set_gpr(a->rd, dest, EXT_NONE);
diff --git a/target/loongarch/insn_trans/trans_branch.c.inc b/target/loongarch/insn_trans/trans_branch.c.inc
index b63058235d..cf035e44ff 100644
--- a/target/loongarch/insn_trans/trans_branch.c.inc
+++ b/target/loongarch/insn_trans/trans_branch.c.inc
@@ -12,7 +12,7 @@ static bool trans_b(DisasContext *ctx, arg_b *a)
 
 static bool trans_bl(DisasContext *ctx, arg_bl *a)
 {
-tcg_gen_movi_tl(cpu_gpr[1], ctx->base.pc_next + 4);
+tcg_gen_movi_tl(cpu_gpr[1], make_address_pc(ctx, ctx->base.pc_next + 4));
 gen_goto_tb(ctx, 0, ctx->base.pc_next + a->offs);
 ctx->base.is_jmp = DISAS_NORETURN;
 return true;
@@ -25,7 +25,7 @@ static bool trans_jirl(DisasContext *ctx, arg_jirl *a)
 
 TCGv addr = make_address_i(ctx, src1, a->imm);
 tcg_gen_mov_tl(cpu_pc, addr);
-tcg_gen_movi_tl(dest, ctx->base.pc_next + 4);
+tcg_gen_movi_tl(dest, make_address_pc(ctx, ctx->base.pc_next + 4));
 gen_set_gpr(a->rd, dest, EXT_NONE);
 tcg_gen_lookup_and_goto_ptr();
 ctx->base.is_jmp = DISAS_NORETURN;
diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
index 689da19ed0..de7c1c5d1f 100644
--- a/target/loongarch/translate.c
+++ b/target/loongarch/translate.c
@@ -236,6 +236,14 @@ static TCGv make_address_i(DisasContext *ctx, TCGv base, target_long ofs)
 return make_address_x(ctx, base, addend);
 }
 
+static uint64_t make_address_pc(DisasContext *ctx, uint64_t addr)
+{
+if (ctx->va32) {
+addr = (int32_t)addr;
+}
+return addr;
+}
+
 #include "decode-insns.c.inc"
 #include "insn_trans/trans_arith.c.inc"
 #include "insn_trans/trans_shift.c.inc"
-- 
2.41.0




[PATCH v5 04/11] target/loongarch: Support LoongArch32 TLB entry

2023-08-09 Thread Jiajie Chen
The LA32 TLB entry lacks the NR, NX and RPLV bits; they are hardwired
to zero in LoongArch32.

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 target/loongarch/cpu-csr.h|  9 +
 target/loongarch/tlb_helper.c | 17 -
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/target/loongarch/cpu-csr.h b/target/loongarch/cpu-csr.h
index f8f24032cb..48ed2e0632 100644
--- a/target/loongarch/cpu-csr.h
+++ b/target/loongarch/cpu-csr.h
@@ -66,10 +66,11 @@ FIELD(TLBENTRY, D, 1, 1)
 FIELD(TLBENTRY, PLV, 2, 2)
 FIELD(TLBENTRY, MAT, 4, 2)
 FIELD(TLBENTRY, G, 6, 1)
-FIELD(TLBENTRY, PPN, 12, 36)
-FIELD(TLBENTRY, NR, 61, 1)
-FIELD(TLBENTRY, NX, 62, 1)
-FIELD(TLBENTRY, RPLV, 63, 1)
+FIELD(TLBENTRY_32, PPN, 8, 24)
+FIELD(TLBENTRY_64, PPN, 12, 36)
+FIELD(TLBENTRY_64, NR, 61, 1)
+FIELD(TLBENTRY_64, NX, 62, 1)
+FIELD(TLBENTRY_64, RPLV, 63, 1)
 
 #define LOONGARCH_CSR_ASID   0x18 /* Address space identifier */
 FIELD(CSR_ASID, ASID, 0, 10)
diff --git a/target/loongarch/tlb_helper.c b/target/loongarch/tlb_helper.c
index 6e00190547..cef10e2257 100644
--- a/target/loongarch/tlb_helper.c
+++ b/target/loongarch/tlb_helper.c
@@ -48,10 +48,17 @@ static int loongarch_map_tlb_entry(CPULoongArchState *env, hwaddr *physical,
 tlb_v = FIELD_EX64(tlb_entry, TLBENTRY, V);
 tlb_d = FIELD_EX64(tlb_entry, TLBENTRY, D);
 tlb_plv = FIELD_EX64(tlb_entry, TLBENTRY, PLV);
-tlb_ppn = FIELD_EX64(tlb_entry, TLBENTRY, PPN);
-tlb_nx = FIELD_EX64(tlb_entry, TLBENTRY, NX);
-tlb_nr = FIELD_EX64(tlb_entry, TLBENTRY, NR);
-tlb_rplv = FIELD_EX64(tlb_entry, TLBENTRY, RPLV);
+if (is_la64(env)) {
+tlb_ppn = FIELD_EX64(tlb_entry, TLBENTRY_64, PPN);
+tlb_nx = FIELD_EX64(tlb_entry, TLBENTRY_64, NX);
+tlb_nr = FIELD_EX64(tlb_entry, TLBENTRY_64, NR);
+tlb_rplv = FIELD_EX64(tlb_entry, TLBENTRY_64, RPLV);
+} else {
+tlb_ppn = FIELD_EX64(tlb_entry, TLBENTRY_32, PPN);
+tlb_nx = 0;
+tlb_nr = 0;
+tlb_rplv = 0;
+}
 
 /* Check access rights */
 if (!tlb_v) {
@@ -79,7 +86,7 @@ static int loongarch_map_tlb_entry(CPULoongArchState *env, hwaddr *physical,
  * tlb_entry contains ppn[47:12] while 16KiB ppn is [47:15]
  * need adjust.
  */
-*physical = (tlb_ppn << R_TLBENTRY_PPN_SHIFT) |
+*physical = (tlb_ppn << R_TLBENTRY_64_PPN_SHIFT) |
 (address & MAKE_64BIT_MASK(0, tlb_ps));
 *prot = PAGE_READ;
 if (tlb_d) {
-- 
2.41.0




[PATCH v5 07/11] target/loongarch: Add LA64 & VA32 to DisasContext

2023-08-09 Thread Jiajie Chen
Add LA64 and VA32 (32-bit virtual address) to DisasContext to allow the
translator to, for example, reject doubleword instructions in LA32 mode.

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 target/loongarch/cpu.h   | 13 +
 target/loongarch/translate.c |  3 +++
 target/loongarch/translate.h |  2 ++
 3 files changed, 18 insertions(+)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 2af4c414b0..0e02257f91 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -431,6 +431,17 @@ static inline bool is_la64(CPULoongArchState *env)
 return FIELD_EX32(env->cpucfg[1], CPUCFG1, ARCH) == CPUCFG1_ARCH_LA64;
 }
 
+static inline bool is_va32(CPULoongArchState *env)
+{
+/* VA32 if !LA64 or VA32L[1-3] */
+bool va32 = !is_la64(env);
+uint64_t plv = FIELD_EX64(env->CSR_CRMD, CSR_CRMD, PLV);
+if (plv >= 1 && (FIELD_EX64(env->CSR_MISC, CSR_MISC, VA32) & (1 << plv))) {
+va32 = true;
+}
+return va32;
+}
+
 /*
  * LoongArch CPUs hardware flags.
  */
@@ -438,6 +449,7 @@ static inline bool is_la64(CPULoongArchState *env)
 #define HW_FLAGS_CRMD_PGR_CSR_CRMD_PG_MASK   /* 0x10 */
 #define HW_FLAGS_EUEN_FPE   0x04
 #define HW_FLAGS_EUEN_SXE   0x08
+#define HW_FLAGS_VA32   0x20
 
 static inline void cpu_get_tb_cpu_state(CPULoongArchState *env, vaddr *pc,
 uint64_t *cs_base, uint32_t *flags)
@@ -447,6 +459,7 @@ static inline void cpu_get_tb_cpu_state(CPULoongArchState *env, vaddr *pc,
 *flags = env->CSR_CRMD & (R_CSR_CRMD_PLV_MASK | R_CSR_CRMD_PG_MASK);
 *flags |= FIELD_EX64(env->CSR_EUEN, CSR_EUEN, FPE) * HW_FLAGS_EUEN_FPE;
 *flags |= FIELD_EX64(env->CSR_EUEN, CSR_EUEN, SXE) * HW_FLAGS_EUEN_SXE;
+*flags |= is_va32(env) * HW_FLAGS_VA32;
 }
 
 void loongarch_cpu_list(void);
diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
index 3146a2d4ac..ac847745df 100644
--- a/target/loongarch/translate.c
+++ b/target/loongarch/translate.c
@@ -119,6 +119,9 @@ static void loongarch_tr_init_disas_context(DisasContextBase *dcbase,
 ctx->vl = LSX_LEN;
 }
 
+ctx->la64 = is_la64(env);
+ctx->va32 = (ctx->base.tb->flags & HW_FLAGS_VA32) != 0;
+
 ctx->zero = tcg_constant_tl(0);
 }
 
diff --git a/target/loongarch/translate.h b/target/loongarch/translate.h
index 7f60090580..b6fa5df82d 100644
--- a/target/loongarch/translate.h
+++ b/target/loongarch/translate.h
@@ -33,6 +33,8 @@ typedef struct DisasContext {
 uint16_t plv;
 int vl;   /* Vector length */
 TCGv zero;
+bool la64; /* LoongArch64 mode */
+bool va32; /* 32-bit virtual address */
 } DisasContext;
 
 void generate_exception(DisasContext *ctx, int excp);
-- 
2.41.0




[PATCH v5 11/11] target/loongarch: Add loongarch32 cpu la132

2023-08-09 Thread Jiajie Chen
Add la132 as a loongarch32 cpu type and allow the virt machine to be
used with la132 instead of la464.

Due to the lack of public documentation for la132, it is currently a
synthetic loongarch32 cpu model. Details need to be added in the future.

Signed-off-by: Jiajie Chen 
---
 hw/loongarch/virt.c|  5 -
 target/loongarch/cpu.c | 29 +
 2 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index e19b042ce8..af15bf5aaa 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -798,11 +798,6 @@ static void loongarch_init(MachineState *machine)
 cpu_model = LOONGARCH_CPU_TYPE_NAME("la464");
 }
 
-if (!strstr(cpu_model, "la464")) {
-error_report("LoongArch/TCG needs cpu type la464");
-exit(1);
-}
-
 if (ram_size < 1 * GiB) {
 error_report("ram_size must be greater than 1G.");
 exit(1);
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index bd980790f2..dd1cd7d7d2 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -439,6 +439,34 @@ static void loongarch_la464_initfn(Object *obj)
 env->CSR_ASID = FIELD_DP64(0, CSR_ASID, ASIDBITS, 0xa);
 }
 
+static void loongarch_la132_initfn(Object *obj)
+{
+LoongArchCPU *cpu = LOONGARCH_CPU(obj);
+CPULoongArchState *env = &cpu->env;
+
+int i;
+
+for (i = 0; i < 21; i++) {
+env->cpucfg[i] = 0x0;
+}
+
+cpu->dtb_compatible = "loongarch,Loongson-1C103";
+
+uint32_t data = 0;
+data = FIELD_DP32(data, CPUCFG1, ARCH, 1); /* LA32 */
+data = FIELD_DP32(data, CPUCFG1, PGMMU, 1);
+data = FIELD_DP32(data, CPUCFG1, IOCSR, 1);
+data = FIELD_DP32(data, CPUCFG1, PALEN, 0x1f); /* 32 bits */
+data = FIELD_DP32(data, CPUCFG1, VALEN, 0x1f); /* 32 bits */
+data = FIELD_DP32(data, CPUCFG1, UAL, 1);
+data = FIELD_DP32(data, CPUCFG1, RI, 0);
+data = FIELD_DP32(data, CPUCFG1, EP, 0);
+data = FIELD_DP32(data, CPUCFG1, RPLV, 0);
+data = FIELD_DP32(data, CPUCFG1, HP, 1);
+data = FIELD_DP32(data, CPUCFG1, IOCSR_BRD, 1);
+env->cpucfg[1] = data;
+}
+
 static void loongarch_cpu_list_entry(gpointer data, gpointer user_data)
 {
 const char *typename = object_class_get_name(OBJECT_CLASS(data));
@@ -778,6 +806,7 @@ static const TypeInfo loongarch_cpu_type_infos[] = {
 .class_init = loongarch32_cpu_class_init,
 },
 DEFINE_LOONGARCH_CPU_TYPE("la464", loongarch_la464_initfn),
+DEFINE_LOONGARCH32_CPU_TYPE("la132", loongarch_la132_initfn),
 };
 
 DEFINE_TYPES(loongarch_cpu_type_infos)
-- 
2.41.0




[PATCH v5 06/11] target/loongarch: Support LoongArch32 VPPN

2023-08-09 Thread Jiajie Chen
VPPN of TLBEHI/TLBREHI is limited to 19 bits in LA32.

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 target/loongarch/cpu-csr.h|  6 --
 target/loongarch/tlb_helper.c | 23 ++-
 2 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/target/loongarch/cpu-csr.h b/target/loongarch/cpu-csr.h
index b93f99a9ef..c59d7a9fcb 100644
--- a/target/loongarch/cpu-csr.h
+++ b/target/loongarch/cpu-csr.h
@@ -57,7 +57,8 @@ FIELD(CSR_TLBIDX, PS, 24, 6)
 FIELD(CSR_TLBIDX, NE, 31, 1)
 
 #define LOONGARCH_CSR_TLBEHI 0x11 /* TLB EntryHi */
-FIELD(CSR_TLBEHI, VPPN, 13, 35)
+FIELD(CSR_TLBEHI_32, VPPN, 13, 19)
+FIELD(CSR_TLBEHI_64, VPPN, 13, 35)
 
 #define LOONGARCH_CSR_TLBELO00x12 /* TLB EntryLo0 */
 #define LOONGARCH_CSR_TLBELO10x13 /* TLB EntryLo1 */
@@ -164,7 +165,8 @@ FIELD(CSR_TLBRERA, PC, 2, 62)
 #define LOONGARCH_CSR_TLBRELO1   0x8d /* TLB refill entrylo1 */
 #define LOONGARCH_CSR_TLBREHI0x8e /* TLB refill entryhi */
 FIELD(CSR_TLBREHI, PS, 0, 6)
-FIELD(CSR_TLBREHI, VPPN, 13, 35)
+FIELD(CSR_TLBREHI_32, VPPN, 13, 19)
+FIELD(CSR_TLBREHI_64, VPPN, 13, 35)
 #define LOONGARCH_CSR_TLBRPRMD   0x8f /* TLB refill mode info */
 FIELD(CSR_TLBRPRMD, PPLV, 0, 2)
 FIELD(CSR_TLBRPRMD, PIE, 2, 1)
diff --git a/target/loongarch/tlb_helper.c b/target/loongarch/tlb_helper.c
index 1f8e7911c7..c8b8b0497f 100644
--- a/target/loongarch/tlb_helper.c
+++ b/target/loongarch/tlb_helper.c
@@ -300,8 +300,13 @@ static void raise_mmu_exception(CPULoongArchState *env, target_ulong address,
 
 if (tlb_error == TLBRET_NOMATCH) {
 env->CSR_TLBRBADV = address;
-env->CSR_TLBREHI = FIELD_DP64(env->CSR_TLBREHI, CSR_TLBREHI, VPPN,
-  extract64(address, 13, 35));
+if (is_la64(env)) {
+env->CSR_TLBREHI = FIELD_DP64(env->CSR_TLBREHI, CSR_TLBREHI_64,
+VPPN, extract64(address, 13, 35));
+} else {
+env->CSR_TLBREHI = FIELD_DP64(env->CSR_TLBREHI, CSR_TLBREHI_32,
+VPPN, extract64(address, 13, 19));
+}
 } else {
 if (!FIELD_EX64(env->CSR_DBG, CSR_DBG, DST)) {
 env->CSR_BADV = address;
@@ -366,12 +371,20 @@ static void fill_tlb_entry(CPULoongArchState *env, int index)
 
 if (FIELD_EX64(env->CSR_TLBRERA, CSR_TLBRERA, ISTLBR)) {
 csr_ps = FIELD_EX64(env->CSR_TLBREHI, CSR_TLBREHI, PS);
-csr_vppn = FIELD_EX64(env->CSR_TLBREHI, CSR_TLBREHI, VPPN);
+if (is_la64(env)) {
+csr_vppn = FIELD_EX64(env->CSR_TLBREHI, CSR_TLBREHI_64, VPPN);
+} else {
+csr_vppn = FIELD_EX64(env->CSR_TLBREHI, CSR_TLBREHI_32, VPPN);
+}
 lo0 = env->CSR_TLBRELO0;
 lo1 = env->CSR_TLBRELO1;
 } else {
 csr_ps = FIELD_EX64(env->CSR_TLBIDX, CSR_TLBIDX, PS);
-csr_vppn = FIELD_EX64(env->CSR_TLBEHI, CSR_TLBEHI, VPPN);
+if (is_la64(env)) {
+csr_vppn = FIELD_EX64(env->CSR_TLBEHI, CSR_TLBEHI_64, VPPN);
+} else {
+csr_vppn = FIELD_EX64(env->CSR_TLBEHI, CSR_TLBEHI_32, VPPN);
+}
 lo0 = env->CSR_TLBELO0;
 lo1 = env->CSR_TLBELO1;
 }
@@ -491,7 +504,7 @@ void helper_tlbfill(CPULoongArchState *env)
 
 if (pagesize == stlb_ps) {
 /* Only write into STLB bits [47:13] */
-address = entryhi & ~MAKE_64BIT_MASK(0, R_CSR_TLBEHI_VPPN_SHIFT);
+address = entryhi & ~MAKE_64BIT_MASK(0, R_CSR_TLBEHI_64_VPPN_SHIFT);
 
 /* Choose one set ramdomly */
 set = get_random_tlb(0, 7);
-- 
2.41.0




[PATCH v5 05/11] target/loongarch: Support LoongArch32 DMW

2023-08-09 Thread Jiajie Chen
LA32 uses a different encoding for CSR.DMW and a new direct mapping
mechanism.

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 target/loongarch/cpu-csr.h|  7 +++
 target/loongarch/tlb_helper.c | 26 +++---
 2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/target/loongarch/cpu-csr.h b/target/loongarch/cpu-csr.h
index 48ed2e0632..b93f99a9ef 100644
--- a/target/loongarch/cpu-csr.h
+++ b/target/loongarch/cpu-csr.h
@@ -188,10 +188,9 @@ FIELD(CSR_DMW, PLV1, 1, 1)
 FIELD(CSR_DMW, PLV2, 2, 1)
 FIELD(CSR_DMW, PLV3, 3, 1)
 FIELD(CSR_DMW, MAT, 4, 2)
-FIELD(CSR_DMW, VSEG, 60, 4)
-
-#define dmw_va2pa(va) \
-(va & MAKE_64BIT_MASK(0, TARGET_VIRT_ADDR_SPACE_BITS))
+FIELD(CSR_DMW_32, PSEG, 25, 3)
+FIELD(CSR_DMW_32, VSEG, 29, 3)
+FIELD(CSR_DMW_64, VSEG, 60, 4)
 
 /* Debug CSRs */
#define LOONGARCH_CSR_DBG0x500 /* debug config */
diff --git a/target/loongarch/tlb_helper.c b/target/loongarch/tlb_helper.c
index cef10e2257..1f8e7911c7 100644
--- a/target/loongarch/tlb_helper.c
+++ b/target/loongarch/tlb_helper.c
@@ -173,6 +173,18 @@ static int loongarch_map_address(CPULoongArchState *env, hwaddr *physical,
 return TLBRET_NOMATCH;
 }
 
+static hwaddr dmw_va2pa(CPULoongArchState *env, target_ulong va,
+target_ulong dmw)
+{
+if (is_la64(env)) {
+return va & TARGET_VIRT_MASK;
+} else {
+uint32_t pseg = FIELD_EX32(dmw, CSR_DMW_32, PSEG);
+return (va & MAKE_64BIT_MASK(0, R_CSR_DMW_32_VSEG_SHIFT)) | \
+(pseg << R_CSR_DMW_32_VSEG_SHIFT);
+}
+}
+
 static int get_physical_address(CPULoongArchState *env, hwaddr *physical,
 int *prot, target_ulong address,
 MMUAccessType access_type, int mmu_idx)
@@ -192,12 +204,20 @@ static int get_physical_address(CPULoongArchState *env, hwaddr *physical,
 }
 
 plv = kernel_mode | (user_mode << R_CSR_DMW_PLV3_SHIFT);
-base_v = address >> R_CSR_DMW_VSEG_SHIFT;
+if (is_la64(env)) {
+base_v = address >> R_CSR_DMW_64_VSEG_SHIFT;
+} else {
+base_v = address >> R_CSR_DMW_32_VSEG_SHIFT;
+}
 /* Check direct map window */
 for (int i = 0; i < 4; i++) {
-base_c = FIELD_EX64(env->CSR_DMW[i], CSR_DMW, VSEG);
+if (is_la64(env)) {
+base_c = FIELD_EX64(env->CSR_DMW[i], CSR_DMW_64, VSEG);
+} else {
+base_c = FIELD_EX64(env->CSR_DMW[i], CSR_DMW_32, VSEG);
+}
 if ((plv & env->CSR_DMW[i]) && (base_c == base_v)) {
-*physical = dmw_va2pa(address);
+*physical = dmw_va2pa(env, address, env->CSR_DMW[i]);
 *prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
 return TLBRET_MATCH;
 }
-- 
2.41.0




[PATCH v5 03/11] target/loongarch: Add GDB support for loongarch32 mode

2023-08-09 Thread Jiajie Chen
GPRs and PC are 32-bit wide in loongarch32 mode.

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
---
 configs/targets/loongarch64-softmmu.mak |  2 +-
 gdb-xml/loongarch-base32.xml| 45 +
 target/loongarch/cpu.c  | 10 +-
 target/loongarch/gdbstub.c  | 32 ++
 4 files changed, 80 insertions(+), 9 deletions(-)
 create mode 100644 gdb-xml/loongarch-base32.xml

diff --git a/configs/targets/loongarch64-softmmu.mak b/configs/targets/loongarch64-softmmu.mak
index 9abc99056f..f23780fdd8 100644
--- a/configs/targets/loongarch64-softmmu.mak
+++ b/configs/targets/loongarch64-softmmu.mak
@@ -1,5 +1,5 @@
 TARGET_ARCH=loongarch64
 TARGET_BASE_ARCH=loongarch
 TARGET_SUPPORTS_MTTCG=y
-TARGET_XML_FILES= gdb-xml/loongarch-base64.xml gdb-xml/loongarch-fpu.xml
+TARGET_XML_FILES= gdb-xml/loongarch-base32.xml gdb-xml/loongarch-base64.xml gdb-xml/loongarch-fpu.xml
 TARGET_NEED_FDT=y
diff --git a/gdb-xml/loongarch-base32.xml b/gdb-xml/loongarch-base32.xml
new file mode 100644
index 00..af47bbd3da
--- /dev/null
+++ b/gdb-xml/loongarch-base32.xml
@@ -0,0 +1,45 @@
+
+
+
+
+
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index c6b73444b4..30dd70571a 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -694,7 +694,13 @@ static const struct SysemuCPUOps loongarch_sysemu_ops = {
 
 static gchar *loongarch_gdb_arch_name(CPUState *cs)
 {
-return g_strdup("loongarch64");
+LoongArchCPU *cpu = LOONGARCH_CPU(cs);
+CPULoongArchState *env = &cpu->env;
+if (is_la64(env)) {
+return g_strdup("loongarch64");
+} else {
+return g_strdup("loongarch32");
+}
 }
 
 static void loongarch_cpu_class_init(ObjectClass *c, void *data)
@@ -734,6 +740,8 @@ static void loongarch_cpu_class_init(ObjectClass *c, void *data)
 
 static void loongarch32_cpu_class_init(ObjectClass *c, void *data)
 {
+CPUClass *cc = CPU_CLASS(c);
+cc->gdb_core_xml_file = "loongarch-base32.xml";
 }
 
 #define DEFINE_LOONGARCH_CPU_TYPE(model, initfn) \
diff --git a/target/loongarch/gdbstub.c b/target/loongarch/gdbstub.c
index 0752fff924..a462e25737 100644
--- a/target/loongarch/gdbstub.c
+++ b/target/loongarch/gdbstub.c
@@ -34,16 +34,25 @@ int loongarch_cpu_gdb_read_register(CPUState *cs, GByteArray *mem_buf, int n)
 {
 LoongArchCPU *cpu = LOONGARCH_CPU(cs);
CPULoongArchState *env = &cpu->env;
+uint64_t val;
 
 if (0 <= n && n < 32) {
-return gdb_get_regl(mem_buf, env->gpr[n]);
+val = env->gpr[n];
 } else if (n == 32) {
 /* orig_a0 */
-return gdb_get_regl(mem_buf, 0);
+val = 0;
 } else if (n == 33) {
-return gdb_get_regl(mem_buf, env->pc);
+val = env->pc;
 } else if (n == 34) {
-return gdb_get_regl(mem_buf, env->CSR_BADV);
+val = env->CSR_BADV;
+}
+
+if (0 <= n && n <= 34) {
+if (is_la64(env)) {
+return gdb_get_reg64(mem_buf, val);
+} else {
+return gdb_get_reg32(mem_buf, val);
+}
 }
 return 0;
 }
@@ -52,15 +61,24 @@ int loongarch_cpu_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
 {
 LoongArchCPU *cpu = LOONGARCH_CPU(cs);
CPULoongArchState *env = &cpu->env;
-target_ulong tmp = ldtul_p(mem_buf);
+target_ulong tmp;
+int read_length;
 int length = 0;
 
+if (is_la64(env)) {
+tmp = ldq_p(mem_buf);
+read_length = 8;
+} else {
+tmp = ldl_p(mem_buf);
+read_length = 4;
+}
+
 if (0 <= n && n < 32) {
 env->gpr[n] = tmp;
-length = sizeof(target_ulong);
+length = read_length;
 } else if (n == 33) {
 env->pc = tmp;
-length = sizeof(target_ulong);
+length = read_length;
 }
 return length;
 }
-- 
2.41.0




[PATCH v5 02/11] target/loongarch: Add new object class for loongarch32 cpus

2023-08-09 Thread Jiajie Chen
Add object class for future loongarch32 cpus. It is derived from the
loongarch64 object class.

Signed-off-by: Jiajie Chen 
---
 target/loongarch/cpu.c | 19 +++
 target/loongarch/cpu.h |  1 +
 2 files changed, 20 insertions(+)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index ad93ecac92..c6b73444b4 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -732,12 +732,22 @@ static void loongarch_cpu_class_init(ObjectClass *c, void *data)
 #endif
 }
 
+static void loongarch32_cpu_class_init(ObjectClass *c, void *data)
+{
+}
+
 #define DEFINE_LOONGARCH_CPU_TYPE(model, initfn) \
 { \
 .parent = TYPE_LOONGARCH_CPU, \
 .instance_init = initfn, \
 .name = LOONGARCH_CPU_TYPE_NAME(model), \
 }
+#define DEFINE_LOONGARCH32_CPU_TYPE(model, initfn) \
+{ \
+.parent = TYPE_LOONGARCH32_CPU, \
+.instance_init = initfn, \
+.name = LOONGARCH_CPU_TYPE_NAME(model), \
+}
 
 static const TypeInfo loongarch_cpu_type_infos[] = {
 {
@@ -750,6 +760,15 @@ static const TypeInfo loongarch_cpu_type_infos[] = {
 .class_size = sizeof(LoongArchCPUClass),
 .class_init = loongarch_cpu_class_init,
 },
+{
+.name = TYPE_LOONGARCH32_CPU,
+.parent = TYPE_LOONGARCH_CPU,
+.instance_size = sizeof(LoongArchCPU),
+
+.abstract = true,
+.class_size = sizeof(LoongArchCPUClass),
+.class_init = loongarch32_cpu_class_init,
+},
 DEFINE_LOONGARCH_CPU_TYPE("la464", loongarch_la464_initfn),
 };
 
diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 5a71d64a04..2af4c414b0 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -382,6 +382,7 @@ struct ArchCPU {
 };
 
 #define TYPE_LOONGARCH_CPU "loongarch-cpu"
+#define TYPE_LOONGARCH32_CPU "loongarch32-cpu"
 
 OBJECT_DECLARE_CPU_TYPE(LoongArchCPU, LoongArchCPUClass,
 LOONGARCH_CPU)
-- 
2.41.0




[PATCH v5 08/11] target/loongarch: Reject la64-only instructions in la32 mode

2023-08-09 Thread Jiajie Chen
LoongArch64-only instructions are marked with regard to the instruction
manual Table 2. LSX instructions are not marked for now for lack of
public manual.

Signed-off-by: Jiajie Chen 
---
 target/loongarch/insn_trans/trans_arith.c.inc | 30 
 .../loongarch/insn_trans/trans_atomic.c.inc   | 76 +--
 target/loongarch/insn_trans/trans_bit.c.inc   | 28 +++
 .../loongarch/insn_trans/trans_branch.c.inc   |  4 +-
 target/loongarch/insn_trans/trans_extra.c.inc | 16 ++--
 target/loongarch/insn_trans/trans_fmov.c.inc  |  4 +-
 .../loongarch/insn_trans/trans_memory.c.inc   | 68 -
 target/loongarch/insn_trans/trans_shift.c.inc | 14 ++--
 target/loongarch/translate.h  |  7 ++
 9 files changed, 127 insertions(+), 120 deletions(-)

diff --git a/target/loongarch/insn_trans/trans_arith.c.inc b/target/loongarch/insn_trans/trans_arith.c.inc
index 43d6cf261d..4c21d8b037 100644
--- a/target/loongarch/insn_trans/trans_arith.c.inc
+++ b/target/loongarch/insn_trans/trans_arith.c.inc
@@ -249,9 +249,9 @@ static bool trans_addu16i_d(DisasContext *ctx, arg_addu16i_d *a)
 }
 
 TRANS(add_w, gen_rrr, EXT_NONE, EXT_NONE, EXT_SIGN, tcg_gen_add_tl)
-TRANS(add_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_add_tl)
+TRANS_64(add_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_add_tl)
 TRANS(sub_w, gen_rrr, EXT_NONE, EXT_NONE, EXT_SIGN, tcg_gen_sub_tl)
-TRANS(sub_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_sub_tl)
+TRANS_64(sub_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_sub_tl)
 TRANS(and, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_and_tl)
 TRANS(or, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_or_tl)
 TRANS(xor, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_xor_tl)
@@ -261,32 +261,32 @@ TRANS(orn, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_orc_tl)
 TRANS(slt, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_slt)
 TRANS(sltu, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_sltu)
 TRANS(mul_w, gen_rrr, EXT_SIGN, EXT_SIGN, EXT_SIGN, tcg_gen_mul_tl)
-TRANS(mul_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_mul_tl)
+TRANS_64(mul_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_mul_tl)
 TRANS(mulh_w, gen_rrr, EXT_SIGN, EXT_SIGN, EXT_NONE, gen_mulh_w)
 TRANS(mulh_wu, gen_rrr, EXT_ZERO, EXT_ZERO, EXT_NONE, gen_mulh_w)
-TRANS(mulh_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_mulh_d)
-TRANS(mulh_du, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_mulh_du)
-TRANS(mulw_d_w, gen_rrr, EXT_SIGN, EXT_SIGN, EXT_NONE, tcg_gen_mul_tl)
-TRANS(mulw_d_wu, gen_rrr, EXT_ZERO, EXT_ZERO, EXT_NONE, tcg_gen_mul_tl)
+TRANS_64(mulh_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_mulh_d)
+TRANS_64(mulh_du, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_mulh_du)
+TRANS_64(mulw_d_w, gen_rrr, EXT_SIGN, EXT_SIGN, EXT_NONE, tcg_gen_mul_tl)
+TRANS_64(mulw_d_wu, gen_rrr, EXT_ZERO, EXT_ZERO, EXT_NONE, tcg_gen_mul_tl)
 TRANS(div_w, gen_rrr, EXT_SIGN, EXT_SIGN, EXT_SIGN, gen_div_w)
 TRANS(mod_w, gen_rrr, EXT_SIGN, EXT_SIGN, EXT_SIGN, gen_rem_w)
 TRANS(div_wu, gen_rrr, EXT_ZERO, EXT_ZERO, EXT_SIGN, gen_div_du)
 TRANS(mod_wu, gen_rrr, EXT_ZERO, EXT_ZERO, EXT_SIGN, gen_rem_du)
-TRANS(div_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_div_d)
-TRANS(mod_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_rem_d)
-TRANS(div_du, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_div_du)
-TRANS(mod_du, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_rem_du)
+TRANS_64(div_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_div_d)
+TRANS_64(mod_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_rem_d)
+TRANS_64(div_du, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_div_du)
+TRANS_64(mod_du, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_rem_du)
 TRANS(slti, gen_rri_v, EXT_NONE, EXT_NONE, gen_slt)
 TRANS(sltui, gen_rri_v, EXT_NONE, EXT_NONE, gen_sltu)
 TRANS(addi_w, gen_rri_c, EXT_NONE, EXT_SIGN, tcg_gen_addi_tl)
-TRANS(addi_d, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_addi_tl)
+TRANS_64(addi_d, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_addi_tl)
 TRANS(alsl_w, gen_rrr_sa, EXT_NONE, EXT_SIGN, gen_alsl)
-TRANS(alsl_wu, gen_rrr_sa, EXT_NONE, EXT_ZERO, gen_alsl)
-TRANS(alsl_d, gen_rrr_sa, EXT_NONE, EXT_NONE, gen_alsl)
+TRANS_64(alsl_wu, gen_rrr_sa, EXT_NONE, EXT_ZERO, gen_alsl)
+TRANS_64(alsl_d, gen_rrr_sa, EXT_NONE, EXT_NONE, gen_alsl)
 TRANS(pcaddi, gen_pc, gen_pcaddi)
 TRANS(pcalau12i, gen_pc, gen_pcalau12i)
 TRANS(pcaddu12i, gen_pc, gen_pcaddu12i)
-TRANS(pcaddu18i, gen_pc, gen_pcaddu18i)
+TRANS_64(pcaddu18i, gen_pc, gen_pcaddu18i)
 TRANS(andi, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_andi_tl)
 TRANS(ori, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_ori_tl)
 TRANS(xori, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_xori_tl)
diff --git a/target/loongarch/insn_trans/trans_atomic.c.inc b/target/loongarch/insn_trans/trans_atomic.c.inc
index 612709f2a7..c69f31bc78 100644
--- a/target/loongarch/insn_trans/trans_atomic.c.inc
+++ b/target/loongarch/insn_trans/trans_atomic.c.inc
@@ -70,41 +70,41 @@ static bool gen_am(DisasContext *ctx, arg_rrr *a,
 
 TRANS(ll_w, gen_ll, MO_TESL)
 TRANS

[PATCH v5 00/11] Add la32 & va32 support for loongarch64-softmmu

2023-08-09 Thread Jiajie Chen
This patch series allow qemu-system-loongarch64 to emulate a LoongArch32
machine. A new CPU model (la132) is added for loongarch32, however due
to lack of public documentation, details will need to be added in the
future. Initial GDB support is added.

At the same time, VA32(32-bit virtual address) support is introduced for
LoongArch64.

LA32 support is tested using a small supervisor program at
https://github.com/jiegec/supervisor-la32. VA32 mode under LA64 is not
tested yet.

Changes since v4:

- Code refactor, thanks Richard Henderson for great advice
- Truncate higher 32 bits of PC in VA32 mode
- Revert la132 initfn refactor

Changes since v3:

- Support VA32 mode for LoongArch64
- Check the current arch from CPUCFG.ARCH
- Reject la64-only instructions in la32 mode

Changes since v2:

- Fix typo in previous commit
- Fix VPPN width in TLBEHI/TLBREHI

Changes since v1:

- No longer create a separate qemu-system-loongarch32 executable, but
  allow user to run loongarch32 emulation using qemu-system-loongarch64
- Add loongarch32 cpu support for virt machine

Full changes:

Jiajie Chen (11):
  target/loongarch: Add function to check current arch
  target/loongarch: Add new object class for loongarch32 cpus
  target/loongarch: Add GDB support for loongarch32 mode
  target/loongarch: Support LoongArch32 TLB entry
  target/loongarch: Support LoongArch32 DMW
  target/loongarch: Support LoongArch32 VPPN
  target/loongarch: Add LA64 & VA32 to DisasContext
  target/loongarch: Reject la64-only instructions in la32 mode
  target/loongarch: Truncate high 32 bits of address in VA32 mode
  target/loongarch: Sign extend results in VA32 mode
  target/loongarch: Add loongarch32 cpu la132

 configs/targets/loongarch64-softmmu.mak   |   2 +-
 gdb-xml/loongarch-base32.xml  |  45 
 hw/loongarch/virt.c   |   5 -
 target/loongarch/cpu-csr.h|  22 ++--
 target/loongarch/cpu.c|  74 +++--
 target/loongarch/cpu.h|  33 ++
 target/loongarch/gdbstub.c|  34 --
 target/loongarch/insn_trans/trans_arith.c.inc |  32 +++---
 .../loongarch/insn_trans/trans_atomic.c.inc   |  81 +++---
 target/loongarch/insn_trans/trans_bit.c.inc   |  28 ++---
 .../loongarch/insn_trans/trans_branch.c.inc   |  11 +-
 target/loongarch/insn_trans/trans_extra.c.inc |  16 +--
 .../loongarch/insn_trans/trans_fmemory.c.inc  |  30 ++
 target/loongarch/insn_trans/trans_fmov.c.inc  |   4 +-
 target/loongarch/insn_trans/trans_lsx.c.inc   |  38 ++-
 .../loongarch/insn_trans/trans_memory.c.inc   | 102 --
 target/loongarch/insn_trans/trans_shift.c.inc |  14 +--
 target/loongarch/op_helper.c  |   4 +-
 target/loongarch/tlb_helper.c |  66 +---
 target/loongarch/translate.c  |  43 
 target/loongarch/translate.h  |   9 ++
 21 files changed, 445 insertions(+), 248 deletions(-)
 create mode 100644 gdb-xml/loongarch-base32.xml

-- 
2.41.0




[PATCH v5 01/11] target/loongarch: Add function to check current arch

2023-08-09 Thread Jiajie Chen
Add is_la64 function to check if the current cpucfg[1].arch equals to
2(LA64).

Signed-off-by: Jiajie Chen 
Co-authored-by: Richard Henderson 
Reviewed-by: Richard Henderson 
---
 target/loongarch/cpu.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index fa371ca8ba..5a71d64a04 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -132,6 +132,11 @@ FIELD(CPUCFG1, HP, 24, 1)
 FIELD(CPUCFG1, IOCSR_BRD, 25, 1)
 FIELD(CPUCFG1, MSG_INT, 26, 1)
 
+/* cpucfg[1].arch */
+#define CPUCFG1_ARCH_LA32R   0
+#define CPUCFG1_ARCH_LA32    1
+#define CPUCFG1_ARCH_LA64    2
+
 /* cpucfg[2] bits */
 FIELD(CPUCFG2, FP, 0, 1)
 FIELD(CPUCFG2, FP_SP, 1, 1)
@@ -420,6 +425,11 @@ static inline int cpu_mmu_index(CPULoongArchState *env, bool ifetch)
 #endif
 }
 
+static inline bool is_la64(CPULoongArchState *env)
+{
+return FIELD_EX32(env->cpucfg[1], CPUCFG1, ARCH) == CPUCFG1_ARCH_LA64;
+}
+
 /*
  * LoongArch CPUs hardware flags.
  */
-- 
2.41.0




Re: [PATCH v4 11/11] target/loongarch: Add loongarch32 cpu la132

2023-08-09 Thread Jiajie Chen



On 2023/8/9 03:26, Richard Henderson wrote:

On 8/7/23 18:54, Jiajie Chen wrote:

+static void loongarch_la464_initfn(Object *obj)
+{
+    LoongArchCPU *cpu = LOONGARCH_CPU(obj);
+    CPULoongArchState *env = &cpu->env;
+
+    loongarch_cpu_initfn_common(env);
+
+    cpu->dtb_compatible = "loongarch,Loongson-3A5000";
+    env->cpucfg[0] = 0x14c010;  /* PRID */
+
+    uint32_t data = env->cpucfg[1];
+    data = FIELD_DP32(data, CPUCFG1, ARCH, 2); /* LA64 */
+    data = FIELD_DP32(data, CPUCFG1, PALEN, 0x2f); /* 48 bits */
+    data = FIELD_DP32(data, CPUCFG1, VALEN, 0x2f); /* 48 bits */
+    data = FIELD_DP32(data, CPUCFG1, RI, 1);
+    data = FIELD_DP32(data, CPUCFG1, EP, 1);
+    data = FIELD_DP32(data, CPUCFG1, RPLV, 1);
+    env->cpucfg[1] = data;
+}
+
+static void loongarch_la132_initfn(Object *obj)
+{
+    LoongArchCPU *cpu = LOONGARCH_CPU(obj);
+    CPULoongArchState *env = &cpu->env;
+
+    loongarch_cpu_initfn_common(env);
+
+    cpu->dtb_compatible = "loongarch,Loongson-1C103";
+
+    uint32_t data = env->cpucfg[1];
+    data = FIELD_DP32(data, CPUCFG1, ARCH, 1); /* LA32 */
+    data = FIELD_DP32(data, CPUCFG1, PALEN, 0x1f); /* 32 bits */
+    data = FIELD_DP32(data, CPUCFG1, VALEN, 0x1f); /* 32 bits */
+    data = FIELD_DP32(data, CPUCFG1, RI, 0);
+    data = FIELD_DP32(data, CPUCFG1, EP, 0);
+    data = FIELD_DP32(data, CPUCFG1, RPLV, 0);
+    env->cpucfg[1] = data;
+}


The use of the loongarch_cpu_initfn_common function is not going to 
scale.

Compare the set of *_initfn in target/arm/tcg/cpu32.c

In general, you want to copy data in bulk from the processor manual, 
so that the reviewer can simply read through the table and see that 
the code is correct, without having to check between multiple 
functions to see that the combination is correct.


For our existing la464, that table is Table 54 in the 3A5000 manual.

Is there a public specification for the la132?  I could not find one 
in https://www.loongson.cn/EN/product/, but perhaps that's just the 
english view.



There seems no, even from the chinese view.





r~




Re: [PATCH v4 01/11] target/loongarch: Add macro to check current arch

2023-08-08 Thread Jiajie Chen



On 2023/8/9 01:01, Richard Henderson wrote:

On 8/7/23 18:54, Jiajie Chen wrote:

Add macro to check if the current cpucfg[1].arch equals to 1(LA32) or
2(LA64).

Signed-off-by: Jiajie Chen 
---
  target/loongarch/cpu.h | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index fa371ca8ba..bf0da8d5b4 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -132,6 +132,13 @@ FIELD(CPUCFG1, HP, 24, 1)
  FIELD(CPUCFG1, IOCSR_BRD, 25, 1)
  FIELD(CPUCFG1, MSG_INT, 26, 1)
  +/* cpucfg[1].arch */
+#define CPUCFG1_ARCH_LA32    1
+#define CPUCFG1_ARCH_LA64    2
+
+#define LOONGARCH_CPUCFG_ARCH(env, mode) \
+  (FIELD_EX32(env->cpucfg[1], CPUCFG1, ARCH) == CPUCFG1_ARCH_##mode)


Reviewed-by: Richard Henderson 

But in using this recall that 0 is a defined value for "simplified 
la32", so


   !LOONGARCH_CPUCFG_ARCH(env, LA64)

may not in future equal

   LOONGARCH_CPUCFG_ARCH(env, LA32)

if someone ever decides to implement this simplified version. (We 
emulate very small embedded Arm cpus, so it's not out of the question 
that you may want to emulate the very smallest LoongArch cpus.)



Yes, actually the LoongArch 32 Reduced (or "simplified la32") version is 
my final aim, because we are building embedded LoongArch32 Reduced CPUs on 
FPGA for a competition, and supporting LoongArch 32 is the first step.





It might be easier to just define

static inline bool is_la64(CPULoongArchState *env)
{
    return FIELD_EX32(env->cpucfg[1], CPUCFG1, ARCH) == CPUCFG1_ARCH_LA64;
}



Sure, I will use this way.





r~




Re: [PATCH] target/loongarch: Split fcc register to fcc0-7 in gdbstub

2023-08-08 Thread Jiajie Chen



On 2023/8/8 17:55, Jiajie Chen wrote:


On 2023/8/8 14:10, bibo mao wrote:

I am not familiar with gdb, is there abi breakage?
I do not know how the gdb client works with gdb servers of different
versions.
There seems to be no versioning in the process, but rather in-code xml 
validation. In gdb, the code only allows the new xml (fcc0-7) and rejects 
the old one (fcc), so gdb broke compatibility with qemu first and did not 
consider backward compatibility with qemu.


Not abi breakage, but gdb will complain:

warning: while parsing target description (at line 1): Target 
description specified unknown architecture "loongarch64"

warning: Could not load XML target description; ignoring
warning: No executable has been specified and target does not support
determining executable automatically.  Try using the "file" command.
Truncated register 38 in remote 'g' packet


Sorry, to be clear, the actual error message is:

(gdb) target extended-remote localhost:1234
Remote debugging using localhost:1234
warning: Architecture rejected target-supplied description
warning: No executable has been specified and target does not support

It rejects the target description xml given by qemu, thus using the 
builtin one. However, there is a mismatch in fcc registers, so it will 
not work if we list floating point registers.


At the same time, if we are using loongarch32 target(I recently posted 
patches to support this), it will reject the target description and 
fall back to loongarch64, making gdb not usable.




And gdb can no longer debug kernel running in qemu. You can reproduce 
this error using latest qemu(without this patch) and gdb(13.1 or later).




Regards
Bibo Mao


在 2023/8/8 13:42, Jiajie Chen 写道:

Since GDB 13.1(GDB commit ea3352172), GDB LoongArch changed to use
fcc0-7 instead of fcc register. This commit partially reverts commit
2f149c759 (`target/loongarch: Update gdb_set_fpu() and gdb_get_fpu()`)
to match the behavior of GDB.

Note that it is a breaking change for GDB 13.0 or earlier, but it is
also required for GDB 13.1 or later to work.

Signed-off-by: Jiajie Chen 
---
  gdb-xml/loongarch-fpu.xml  |  9 -
  target/loongarch/gdbstub.c | 16 +++-
  2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/gdb-xml/loongarch-fpu.xml b/gdb-xml/loongarch-fpu.xml
index 78e42cf5dd..e81e3382e7 100644
--- a/gdb-xml/loongarch-fpu.xml
+++ b/gdb-xml/loongarch-fpu.xml
@@ -45,6 +45,13 @@
    
    
    
-  
+  
+  
+  
+  
+  
+  
+  
+  
    
  
diff --git a/target/loongarch/gdbstub.c b/target/loongarch/gdbstub.c
index 0752fff924..15ad6778f1 100644
--- a/target/loongarch/gdbstub.c
+++ b/target/loongarch/gdbstub.c
@@ -70,10 +70,9 @@ static int loongarch_gdb_get_fpu(CPULoongArchState *env,

  {
  if (0 <= n && n < 32) {
  return gdb_get_reg64(mem_buf, env->fpr[n].vreg.D(0));
-    } else if (n == 32) {
-    uint64_t val = read_fcc(env);
-    return gdb_get_reg64(mem_buf, val);
-    } else if (n == 33) {
+    } else if (32 <= n && n < 40) {
+    return gdb_get_reg8(mem_buf, env->cf[n - 32]);
+    } else if (n == 40) {
  return gdb_get_reg32(mem_buf, env->fcsr0);
  }
  return 0;
@@ -87,11 +86,10 @@ static int loongarch_gdb_set_fpu(CPULoongArchState *env,

  if (0 <= n && n < 32) {
  env->fpr[n].vreg.D(0) = ldq_p(mem_buf);
  length = 8;
-    } else if (n == 32) {
-    uint64_t val = ldq_p(mem_buf);
-    write_fcc(env, val);
-    length = 8;
-    } else if (n == 33) {
+    } else if (32 <= n && n < 40) {
+    env->cf[n - 32] = ldub_p(mem_buf);
+    length = 1;
+    } else if (n == 40) {
  env->fcsr0 = ldl_p(mem_buf);
  length = 4;
  }



