Thank you. I will try switching to starter_se.py.
I still have some questions regarding SVE.
1. When I compile with -msve-vector-bits=512, I see a PTRUE instruction, 
which is replaced by WHILELO when I compile without setting the vector length. 
On gem5, WHILELO and the corresponding INCW instructions seem to work fine: 
with sve_vl=1, incw increments by 4 (128 bits), and with sve_vl=4 it 
increments by 16 (0x10, 512 bits). What I am curious about is whether there 
is anything wrong with the implementation of the PTRUE instruction in gem5.
2. As shown in my first email, my data arrays are 64 bytes in size. With 
sve_vl=4, a single SVE ld1w instruction can load all 64 bytes (at least in 
theory, on a real CPU). The output of the LSQUnit and CacheAll debug flags 
shows that all 64 bytes are indeed accessed by one instruction. For example:
system.cpu.dcache: access for WriteReq [81010:8104f]
The address range here covers 64 bytes (16 4-byte integers in my test code).
But without support in the bus/interconnect between the CPU and the cache for 
64-byte transfers (or whatever the vector length is), and without additional 
code in gem5 to support multi-word reads/writes, shouldn't only one word (I am 
guessing 4 bytes in gem5 for Arm) be read from the cache to the CPU at a time? 
In that case, how are all 64 bytes requested and read from the cache with one 
instruction? Is there some underlying mechanism, such as micro-ops or some 
architectural feature, at work transparently? Or maybe a simple loop that does 
not appear in the debug flag output? I looked in src/mem/cache/base.cc and 
cache.cc but could not find an answer.
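For reference, the arithmetic behind the numbers in the two questions above can 
be sketched as follows. This is only an illustration, not gem5 code: the helper 
names are made up, and it assumes that sve_vl counts 128-bit quadwords, as the 
observed increments suggest.

```python
QUADWORD_BITS = 128  # assumption: sve_vl is expressed in 128-bit quadwords


def vector_bits(sve_vl):
    """Vector length in bits for a given sve_vl setting."""
    return sve_vl * QUADWORD_BITS


def incw_step(sve_vl):
    """Number of 32-bit words in one vector, i.e. the amount incw adds."""
    return vector_bits(sve_vl) // 32


def access_bytes(first, last):
    """Size in bytes of an inclusive address range such as [81010:8104f]."""
    return last - first + 1


print(incw_step(1))                      # 4
print(incw_step(4))                      # 16
print(access_bytes(0x81010, 0x8104f))    # 64
```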
________________________________
From: Giacomo Travaglini <giacomo.travagl...@arm.com>
Sent: 12 January 2024 03:56
To: Nazmus Sakib <nsak...@nmsu.edu>; The gem5 Users mailing list 
<gem5-users@gem5.org>
Cc: Jason Lowe-Power <jlowepo...@ucdavis.edu>
Subject: Re: ARM SVE ISA


You are right, I created a PR to fix this:



https://github.com/gem5/gem5/pull/764



Kind Regards



Giacomo



From: Nazmus Sakib <nsak...@nmsu.edu>
Date: Thursday, 11 January 2024 at 19:34
To: Giacomo Travaglini <giacomo.travagl...@arm.com>, The gem5 Users mailing 
list <gem5-users@gem5.org>
Cc: Jason Lowe-Power <jlowepo...@ucdavis.edu>
Subject: Re: ARM SVE ISA

Not compiling with -msve-vector-bits did the trick. It runs perfectly, whether 
I set cpu[0].isa[0].sve_vl_se to 4 or keep it at 1.
Thank you for the suggestions!
One last thing: starter_se.py does not seem to support 
--cpu-type=ArmO3CPU (or am I missing something)?

________________________________

From: Giacomo Travaglini <giacomo.travagl...@arm.com>
Sent: 11 January 2024 12:16
To: The gem5 Users mailing list <gem5-users@gem5.org>
Cc: Jason Lowe-Power <jlowepo...@ucdavis.edu>; Nazmus Sakib <nsak...@nmsu.edu>
Subject: Re: ARM SVE ISA




Hi Nazmus,



I can see from what you posted that you are compiling the testcase with a 512b 
vector width. I believe you should amend the gem5 VL accordingly, by setting 
the following in the gem5 config:



cpu.isa[0].sve_vl_se = 4



According to [1].

This should fix your problem. Another solution, I believe, would be to compile 
without specifying the VL; the result should then be VL-agnostic code.
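
As a rough illustration of the first option, the value to assign can be derived 
from the compiler flag. The helper below is hypothetical (it is not part of 
gem5); it only assumes that sve_vl_se is expressed in 128-bit quadwords, which 
is consistent with 512 bits mapping to 4.

```python
def sve_vl_for_bits(vector_bits):
    """Quadword count for a given -msve-vector-bits value (hypothetical helper)."""
    # SVE vector lengths are multiples of 128 bits, from 128 up to 2048.
    if vector_bits % 128 != 0 or not 128 <= vector_bits <= 2048:
        raise ValueError("SVE vector length must be a multiple of 128 in [128, 2048]")
    return vector_bits // 128


print(sve_vl_for_bits(512))  # 4

# In a gem5 config script one would then write, for example:
#   cpu.isa[0].sve_vl_se = sve_vl_for_bits(512)
```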



Anyway, I also recommend you use configs/example/arm/starter_se.py as se.py is 
per se deprecated



Kind Regards



Giacomo



[1]: https://github.com/gem5/gem5/blob/stable/src/arch/arm/ArmISA.py#L179



From: Nazmus Sakib via gem5-users <gem5-users@gem5.org>
Date: Thursday, 11 January 2024 at 17:54
To: gem5-users@gem5.org <gem5-users@gem5.org>
Cc: Jason Lowe-Power <jlowepo...@ucdavis.edu>, Nazmus Sakib <nsak...@nmsu.edu>
Subject: [gem5-users] ARM SVE ISA

Hello.
I am trying to run a simple program with SVE instructions on gem5. However, the 
output with the ExecAll debug flag suggests there is an issue with the decoder.
Here is the test code:

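#include <stdio.h>

#define STREAM_ARRAY_SIZE 16

/* The #include and the declarations of A and B were not shown in the
   original post; global int arrays are assumed here. */
int A[STREAM_ARRAY_SIZE], B[STREAM_ARRAY_SIZE];

int add(int * restrict p, int * restrict q);

int main(void)
{
    /* Fill both arrays with known values. */
    for (int j = 0; j < STREAM_ARRAY_SIZE; j++) {
        A[j] = 3;
        B[j] = 2;
    }
    int x = add(A, B);
    printf("return %d \n", A[3]);  // should print 6, does not in gem5
    return 0;
}

/* Writes q[i] + 4 into p[i] for every element; this is the loop that gets vectorized. */
int add(int * restrict p, int * restrict q)
{
    for (int i = 0; i < STREAM_ARRAY_SIZE; i += 1) {
        *(p + i) = *(q + i) + 4;
    }
    printf("dummy %d %d \n", *(p + 3), *(q + 3));  // should print 6 and 2, does not in gem5
    return *(p + 3);
}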
I compiled it with the gcc cross compiler for Arm using the following command:

aarch64-linux-gnu-gcc-11 -O3 -static -mcpu=a64fx+sve2 -msve-vector-bits=512 -o test test.c

Without -mcpu=a64fx+sve2, SVE instructions are not generated.
Here is the gem5 command I used:
./build/ARM/gem5.opt ./configs/deprecated/example/se.py --cpu-type=ArmO3CPU 
--caches --cacheline_size=64 --mem-size=8GB --arm-iset=aarch64 -c ./test
I have also used "./configs/example/arm/starter_se.py", but the results are the 
same.
When I use --debug-flags=ExecAll, I see the following issues:
1) 12589500: system.cpu: A0 T0 : 0x400524 @main+4    :   ptrue   p0, VL64       
  : SimdPredAlu
:  D=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]  FetchSeq=14292  CPSeq=4962  flags=()

The D=[] should not be all zeros.

2)

12591000: system.cpu: A0 T0 : 0x400550 @main+48    :   st1   {z1}, p0/z, , 
[x19] : MemWrite :
 A=0x491040  FetchSeq=14305  CPSeq=4975  flags=(IsInteger|IsVector|IsStore)

12591000: system.cpu: A0 T0 : 0x400554 @main+52    :   st1   {z0}, p0/z, , 
[x19, #1, mul vl] : MemWrite : A=0x491050  FetchSeq=14306  CPSeq=4976  
flags=(IsInteger|IsVector|IsStore)

The second A should be 0x491080, not 0x491050.
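
A possible explanation for observation 1, consistent with the Arm 
architecture's rule for fixed-count predicate patterns: a pattern such as VL64 
activates exactly 64 elements when the vector holds at least that many, and 
none otherwise. A minimal sketch, assuming byte-sized elements (the trace does 
not show the element size) and a hypothetical helper name:

```python
def ptrue_fixed(count, vl_bits, esize_bits=8):
    """Active flags, one per element, for a fixed-count pattern such as VL64."""
    elements = vl_bits // esize_bits
    # A fixed-count pattern is all-or-nothing: it is never truncated.
    active = count if elements >= count else 0
    return [1 if i < active else 0 for i in range(elements)]


print(sum(ptrue_fixed(64, 512)))  # 64
print(sum(ptrue_fixed(64, 128)))  # 0
```

With a 128-bit vector length the pattern cannot be satisfied, so the predicate 
comes out all false, which would match the all-zero D=[] above.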

I have run the same program on the RIKEN simulator, which was built on top of 
gem5 for the Fujitsu A64FX.
Here are the same instructions as seen in the RIKEN simulator.
1) 15322000: system.cpu A0 T0 : @main+4    :   ptrue   p0, VL64         : 
SimdPredAlu :  
D=0b[0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111]
  FetchSeq=18146  CPSeq=5254  flags=()
As you can see, my data arrays are 64 bytes and the appropriate bits in the 
predicate register are set to 1.
2)
15323000: system.cpu A0 T0 : @main+48    :   st1   {z1}, p0/z, , [x19] : 
SveMemWrite :
 A=0x491040  FetchSeq=18159  CPSeq=5267  
flags=(IsInteger|IsVector|IsMemRef|IsStore)

15323000: system.cpu A0 T0 : @main+52    :   st1   {z0}, p0/z, , [x19, #1, mul 
vl] : SveMemWrite :

  A=0x491080  FetchSeq=18160  CPSeq=5268

The second address is calculated as 0x491080, which is the correct result for 
[x19, #1, mul vl], as vl=64.
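
The address arithmetic for this addressing mode can be checked with a short 
sketch. The helper name is made up for illustration; the only assumption is 
that "mul vl" scales the immediate by the vector length in bytes.

```python
def mul_vl_address(base, imm, vl_bytes):
    """Effective address for the [xN, #imm, mul vl] addressing mode."""
    return base + imm * vl_bytes


print(hex(mul_vl_address(0x491040, 1, 64)))  # 0x491080 (512-bit vectors)
print(hex(mul_vl_address(0x491040, 1, 16)))  # 0x491050 (128-bit vectors)
```

The second value is the address seen in the gem5 trace, which is what a 
16-byte (128-bit) vector length would produce.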

I tried to compare the files in src/arch/arm/ISA from RIKEN with those in 
current gem5. Since the RIKEN simulator is based on an old gem5, there are 
obvious syntax differences. Other than that, I found two things:
1) In ArmISA.py in RIKEN, there is this:

     id_aa64pfr0_el1 = Param.UInt64(0x0000000100000022, "AArch64 Processor Feature Register 0")

I did not find anything similar in gem5. I did find id_aa64pfr0_el1 in 
ar/arm/reg/misch.hh, but its value wasn't set anywhere.

2) In ArmISA.py in current gem5, there is a "FEAT_SVE" extension in class 
ArmDefaultSERelease. However, this is for Armv8.2, and I don't know how to 
specify this architecture on the command line.

What I am trying to find out is whether I am missing any runtime flags that 
would enable proper SVE execution in gem5, whether the problem is due to 
compile-time flags, since I am setting -mcpu to a64fx (setting -march to 
armv8.2-a+sve or similar does not produce SVE instructions; it has to be 
-mcpu=a64fx+sve), or whether it is a possible issue/bug in the new gem5 itself. 
Any suggestions would be appreciated.
Thank you.

IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.

_______________________________________________
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org
