Hello,

I am currently studying how the O3 CPU works in gem5. I have enabled the
O3CPUAll, BTB, and Branch debug flags.

```
Disassembly of section .text:

0000000000010000 <.text>:
   10000:       f3 0f 1e fa             endbr64 
   10004:       55                      push   rbp
   10005:       48 89 e5                mov    rbp,rsp
   10008:       c7 45 fc 00 00 00 00    mov    DWORD PTR [rbp-0x4],0x0
   1000f:       8b 05 8b 00 00 00       mov    eax,DWORD PTR [rip+0x8b]        # 0x100a0
   10015:       89 45 fc                mov    DWORD PTR [rbp-0x4],eax
   10018:       8b 45 fc                mov    eax,DWORD PTR [rbp-0x4]
   1001b:       0f b6 c0                movzx  eax,al
   1001e:       48 98                   cdqe   
   10020:       48 8d 14 85 00 00 00    lea    rdx,[rax*4+0x0]
   10027:       00 
   10028:       48 8d 05 71 00 02 00    lea    rax,[rip+0x20071]        # 0x300a0
   1002f:       c7 04 02 01 00 00 00    mov    DWORD PTR [rdx+rax*1],0x1
   10036:       90                      nop
   10037:       5d                      pop    rbp
   10038:       c3                      ret    
   10039:       0f 1f 80 00 00 00 00    nop    DWORD PTR [rax+0x0]
   10040:       bc 00 00 01 00          mov    esp,0x10000
   10045:       e8 b6 ff ff ff          call   0x10000
   1004a:       eb fe                   jmp    0x1004a

Disassembly of section .eh_frame:

0000000000010050 <.eh_frame>:
   10050:       14 00                   adc    al,0x0
   10052:       00 00                   add    BYTE PTR [rax],al
   10054:       00 00                   add    BYTE PTR [rax],al
   10056:       00 00                   add    BYTE PTR [rax],al
   10058:       01 7a 52                add    DWORD PTR [rdx+0x52],edi
   1005b:       00 01                   add    BYTE PTR [rcx],al
   1005d:       78 10                   js     0x1006f
```

In the code above, the program entry point is 0x10040. Once the call to
0x10000 returns, the jmp at 0x1004a jumps to itself, so it should be executed
repeatedly and the instructions after it should never be executed. Transient
execution is not ruled out, but I would expect at most a few instructions
after 0x1004a to be executed, and only once.
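
As a sanity check on the expected control flow, the two branch targets can be
recomputed from the encodings in the listing. This is only a minimal sketch in
Python; `rel_target` is a hypothetical helper name, not part of gem5 or
objdump. It assumes the usual x86 rule that a relative branch target is the
address of the next instruction plus the sign-extended displacement.

```python
def rel_target(insn_addr: int, insn_len: int, disp: int, disp_bits: int) -> int:
    """Return the target of an x86 relative branch.

    insn_addr: address of the branch instruction
    insn_len:  total length of the instruction in bytes
    disp:      raw (unsigned) displacement field
    disp_bits: width of the displacement field (8 or 32)
    """
    sign_bit = 1 << (disp_bits - 1)
    signed_disp = (disp ^ sign_bit) - sign_bit  # sign-extend the displacement
    return insn_addr + insn_len + signed_disp


# call at 0x10045: e8 b6 ff ff ff -> rel32 stored little-endian
call_disp = int.from_bytes(bytes.fromhex("b6ffffff"), "little")
call_target = rel_target(0x10045, 5, call_disp, 32)

# jmp at 0x1004a: eb fe -> rel8
jmp_target = rel_target(0x1004A, 2, 0xFE, 8)

print(hex(call_target), hex(jmp_target))
```

If the arithmetic holds, the call lands on 0x10000 and the jmp lands on its own
address, which is why a self-loop at 0x1004a is expected.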

However, the puzzling part is that, although I have been running the
simulation for a long time, the jmp at the infinite-loop location (0x1004a)
has been predicted only 7 times. By contrast, the conditional instructions
after 0x1004a have been predicted many times, and if the simulation ran
indefinitely, I would expect this count to keep growing without bound.

I'm certain that my configuration uses the default cache size, which is more
than enough to hold these instructions. Nevertheless, I noticed that there is
still an ICache miss after the "ret" instruction at 0x10038.

I'm using the configuration "x86-ubuntu-run.py" from gem5_library, with only 
the workload modified. I've been debugging the gem5 program for several days, 
but I haven't been able to locate the error. I'm wondering if there's anything 
I need to configure to make it run as expected. Thank you very much for your 
valuable suggestions.
_______________________________________________
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
