Hi Matt,

I tried running PageRank on the develop branch 
(5d0a7b6a6cca0dc20e8b8c366db2ccc150c7480a, Thu Nov 3 16:42:53 2022), but I hit 
a new error (details below).


The error message:
```
/HIP/rocclr/hip_global.cpp:69: guarantee(false && "Cannot find Symbol")
build/GCN3_X86/sim/faults.cc:60: panic: panic condition !FullSystem occurred: 
fault (General-Protection) detected @ PC 
(0x7ffff6afa941=>0x7ffff6afa942).(0=>1)
Memory Usage: 19719528 KBytes
Program aborted at tick 1904281529500

```



It looks like the HIP version is wrong. Could this be because my Docker image 
is old? (I used gcn-gpu v22-0.)


The good news is that PageRank now executes more instructions and prints more 
output than before, although it still does not run to completion. I am not sure 
whether that is just chance, but for now I take it as progress.


Finally, I'm relatively new to gem5 debugging. Could you give me some tips on 
working with the trace? For example, which debug flag should I use: 
--debug-flags=ProtocolTrace, or a more GPU-specific flag?



Thanks.


------------------ Original ------------------
From: "The gem5 Users mailing list" <gem5-users@gem5.org>;
Date: Nov 6, 2022 (Sunday), 2:15
To: "The gem5 Users mailing list" <gem5-users@gem5.org>;
Cc: "1575883782" <1575883...@qq.com>; "Matt Sinclair" <sincl...@cs.wisc.edu>;
Subject: [gem5-users] Re: Gem5 GCN3 (GPUCoalescer detected deadlock when 
running pagerank.)



 
 
 Matt
 
 Sent from my iPhone
 
 On Nov 5, 2022, at 10:51 PM, 1575883782 via gem5-users 
<gem5-users@gem5.org> wrote:
 
  
 Thanks. I will try `--reg-alloc-policy=dynamic` (I did not specify a policy, 
so I was using the default), and I will read through the trace further.
 I am using the stable branch. The commit is:
 
 ```
 commit 39f85b7a3be1ee0ff6e375c9791dd62d23eb8a3e (HEAD -> stable, tag: v22.0.0.1, origin/stable, origin/master, origin/HEAD)
 Author: Bobby R. Bruce <bbr...@ucdavis.edu>
 Date:   Sat Jun 18 04:59:02 2022 -0700

     misc: Update version info to v22.0.0.1
 ```
 
 
 ------------------ Original ------------------
 From: "The gem5 Users mailing list" <gem5-users@gem5.org>;
 Date: Sun, Nov 6, 2022 02:55 AM
 To: "The gem5 Users mailing list" <gem5-users@gem5.org>;
 Cc: "1575883782" <1575883...@qq.com>; "Matt Sinclair" <sincl...@cs.wisc.edu>;
 Subject: [gem5-users] Re: Gem5 GCN3 (GPUCoalescer detected deadlock when 
running pagerank.)
 
 
 
  
Hi,

Ultimately, this message is telling you that the cache coherence protocol 
deadlocked when running PageRank with the configuration you specified. 
To fix it, you would need to get a trace 
(https://www.gem5.org/documentation/learning_gem5/part3/MSIdebugging/) and 
look through it to see what the problem is. If you do this and find a fix, 
we definitely welcome any patches you may have to help with this!
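
As a rough sketch of capturing such a trace (not a command taken from this thread): gem5's `--debug-flags`, `--debug-start`, and `--debug-file` options are placed before the config script. The start tick and trace file name below are illustrative assumptions, and `build_trace_command` is a hypothetical helper; the simulation arguments mirror the 16-CU run discussed in this thread, with `--benchmark-root` and `--options` left for you to append.

```python
import shlex


def build_trace_command(
    gem5_binary="build/GCN3_X86/gem5.opt",
    config="configs/example/apu_se.py",
    debug_start_tick=1732000000000,   # assumption: shortly before the reported hang
    trace_file="protocol_trace.out",  # assumption: any file name works
    sim_args=(),
):
    """Return the argv list for a gem5 run that records a Ruby ProtocolTrace."""
    return [
        gem5_binary,
        # gem5-level options must come before the config script.
        "--debug-flags=ProtocolTrace",
        f"--debug-start={debug_start_tick}",
        f"--debug-file={trace_file}",
        config,
        # Everything after the config script is passed to apu_se.py.
        *sim_args,
    ]


cmd = build_trace_command(
    sim_args=[
        "-n", "3",
        "--num-compute-units", "16",
        "--reg-alloc-policy=dynamic",
        "--mem-size=8GB",
        "-c", "pagerank/bin/pagerank_spmv",
    ]
)

# Print a copy-pasteable command line instead of launching the simulation.
print(shlex.join(cmd))
```

Starting the trace late keeps the file small; a full-run ProtocolTrace for this input would likely be very large.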
 
Having said that, I've been trying to replicate your problem. However, 
the input size you are running means that gem5 will be running for a while, so 
it will take a while before I can say something more definitive. We do 
test PageRank as part of the weekly tests, but not specifically for 16 
CUs. What branch (stable vs. develop) are you using? Also, I 
recommend using --reg-alloc-policy=dynamic, as this is a more realistic 
register allocation policy than the simple one (which I can't tell if you are 
using or not). In the meantime, if you can answer the above questions, 
that may help us debug.

Thanks,

Matt
  
From: 1575883782 via gem5-users <gem5-users@gem5.org>
 Sent: Saturday, November 5, 2022 3:58 AM
 To: gem5-users <gem5-users@gem5.org>
 Cc: 1575883782 <1575883...@qq.com>
 Subject: [gem5-users] Gem5 GCN3 (GPUCoalescer detected deadlock when running 
pagerank.)
 
 
Hi,

I was trying to run the PageRank benchmark with the GCN3 GPU model. It runs 
successfully with 4 CUs, but when I run it with 16 CUs I hit a problem. The key 
error message is 
"build/GCN3_X86/mem/ruby/system/GPUCoalescer.cc:292: warn: GPUCoalescer 10 
Possible deadlock detected!" Am I missing something? I don't know how to solve 
it. Could someone help me?

4 CUs command line (the default CU count is 4):
```
command line: build/GCN3_X86/gem5.opt -n 3 --mem-size=8GB --benchmark-root=/home/ubuntu/lmy/gem5-gcn3/gem5-resources/src/gpu/pannotia -c pagerank/bin/pagerank_spmv '--options=/home/ubuntu/lmy/gem5-gcn3/gem5-resources/src/gpu/pannotia/pagerank/coAuthorsDBLP.graph 1'
```

16 CUs command line:
```
command line: build/GCN3_X86/gem5.opt configs/example/apu_se.py -n 3 --num-compute-units 16 --mem-size=8GB --benchmark-root=/home/ubuntu/lmy/gem5-gcn3/gem5-resources/src/gpu/pannotia -c pagerank/bin/pagerank_spmv '--options=/home/ubuntu/lmy/gem5-resources/src/gpu/pannotia/pagerank/coAuthorsDBLP.graph 1'
```

gem5 version:
```
gem5 version 22.0.0.1
gem5 compiled Jun 29 2022 10:34:02
gem5 started Nov  3 2022 14:32:39
gem5 executing on 1bcbbec61aaf, pid 1287240
```

Error message:
```
build/GCN3_X86/mem/ruby/system/GPUCoalescer.cc:292: warn: GPUCoalescer 10 Possible deadlock detected!
Printing out 763 outstanding requests in the coalesced table
    Addr: [0x3b8b1c0, line 0x3b8b1c0]
    Instruction sequence number: 16871
        Type: LD
        Number of associated packets: 2
        Issue time: 1732620214000
        Difference from current tick: 280298000
    Addr: [0x3b8b300, line 0x3b8b300]
    Instruction sequence number: 16871
        Type: LD
        Number of associated packets: 3
        Issue time: 1732620214000
        Difference from current tick: 280298000
    Addr: [0x3b8b380, line 0x3b8b380]
    Instruction sequence number: 16871
        Type: LD
        Number of associated packets: 1
        Issue time: 1732620214000
        Difference from current tick: 280298000
    Addr: [0x3b8b3c0, line 0x3b8b3c0]
    Instruction sequence number: 16871
        Type: LD
        Number of associated packets: 3
        Issue time: 1732620214000
        Difference from current tick: 280298000
    Addr: [0x3b8b440, line 0x3b8b440]
    Instruction sequence number: 16871
        Type: LD
        Number of associated packets: 1
        Issue time: 1732620214000
        Difference from current tick: 280298000
    Addr: [0x3b8b480, line 0x3b8b480]
    Instruction sequence number: 16871
        Type: LD
        Number of associated packets: 2
        Issue time: 1732620214000
        Difference from current tick: 280298000
    Addr: [0x3b8b4c0, line 0x3b8b4c0]
    Instruction sequence number: 16871
        Type: LD
        Number of associated packets: 1
        Issue time: 1732620214000
        Difference from current tick: 280298000
    Addr: [0x3b8b540, line 0x3b8b540]
    Instruction sequence number: 16871
        Type: LD
        Number of associated packets: 1
        Issue time: 1732620214000
        Difference from current tick: 280298000
    Addr: [0x3b8b5c0, line 0x3b8b5c0]
    Instruction sequence number: 16871
        Type: LD
        Number of associated packets: 2
        Issue time: 1732620214000
        Difference from current tick: 280298000
    Addr: [0x3b8b680, line 0x3b8b680]
    Instruction sequence number: 16871
        Type: LD
        Number of associated packets: 1
        Issue time: 1732620214000
        Difference from current tick: 280298000
    Addr: [0x3b8b740, line 0x3b8b740]
    Instruction sequence number: 16871
        Type: LD
        Number of associated packets: 3
        Issue time: 1732620214000
        Difference from current tick: 280298000
    Addr: [0x3b8b7c0, line 0x3b8b7c0]
    ...................................
        Difference from current tick: 17915000
    Addr: [0x4c60b40, line 0x4c60b40]
    Instruction sequence number: 16552
        Type: LD
        Number of associated packets: 1
        Issue time: 1732882652000
        Difference from current tick: 17860000
Listing pending packets from 0 instructions
build/GCN3_X86/mem/ruby/system/GPUCoalescer.cc:294: panic: Aborting due to deadlock!
Memory Usage: 19939216 KBytes
Program aborted at tick 1732900512000
--- BEGIN LIBC BACKTRACE ---
/home/ubuntu/lmy/gem5-gcn3/gem5/build/GCN3_X86/gem5.opt(+0x4fb330)[0x55f2ea122330]
/home/ubuntu/lmy/gem5-gcn3/gem5/build/GCN3_X86/gem5.opt(+0x5297ee)[0x55f2ea1507ee]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x143c0)[0x7fe799cb63c0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7fe798e5e03b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7fe798e3d859]
/home/ubuntu/lmy/gem5-gcn3/gem5/build/GCN3_X86/gem5.opt(+0x512b15)[0x55f2ea139b15]
/home/ubuntu/lmy/gem5-gcn3/gem5/build/GCN3_X86/gem5.opt(+0xffa194)[0x55f2eac21194]
/home/ubuntu/lmy/gem5-gcn3/gem5/build/GCN3_X86/gem5.opt(+0x515ed2)[0x55f2ea13ced2]
/home/ubuntu/lmy/gem5-gcn3/gem5/build/GCN3_X86/gem5.opt(+0x553944)[0x55f2ea17a944]
/home/ubuntu/lmy/gem5-gcn3/gem5/build/GCN3_X86/gem5.opt(+0x55469e)[0x55f2ea17b69e]
/home/ubuntu/lmy/gem5-gcn3/gem5/build/GCN3_X86/gem5.opt(+0x1c5b422)[0x55f2eb882422]
/home/ubuntu/lmy/gem5-gcn3/gem5/build/GCN3_X86/gem5.opt(+0x4a3e27)[0x55f2ea0cae27]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8738)[0x7fe799f6f738]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x8dd8)[0x7fe799d44f48]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7fe799e91e3b]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x94)[0x7fe799f6f114]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7fe799d3bd6d]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7d86)[0x7fe799d43ef6]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7fe799e91e3b]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x42)[0x7fe799e921c2]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyEval_EvalCode+0x1f)[0x7fe799e925af]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x1cfbf1)[0x7fe799e96bf1]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x25f537)[0x7fe799f26537]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)[0x7fe799d3bd6d]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x12fd)[0x7fe799d3d46d]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x8006b)[0x7fe799d4706b]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyVectorcall_Call+0x60)[0x7fe799f6f830]
/home/ubuntu/lmy/gem5-gcn3/gem5/build/GCN3_X86/gem5.opt(+0x52b704)[0x55f2ea152704]
/home/ubuntu/lmy/gem5-gcn3/gem5/build/GCN3_X86/gem5.opt(+0x423666)[0x55f2ea04a666]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7fe798e3f0b3]
/home/ubuntu/lmy/gem5-gcn3/gem5/build/GCN3_X86/gem5.opt(+0x492f0e)[0x55f2ea0b9f0e]
--- END LIBC BACKTRACE ---
```
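
With 763 outstanding requests, the dump above is easier to read once summarized. The following is a minimal sketch, assuming the entry layout shown in this message (`Addr`, `Instruction sequence number`, `Type`, `Number of associated packets`, `Issue time`, `Difference from current tick`); `parse_outstanding` and `summarize` are hypothetical helper names, and the sample text is copied from the first two entries of the log.

```python
import re
from collections import Counter

# One outstanding-request entry as printed in the GPUCoalescer deadlock dump.
ENTRY_RE = re.compile(
    r"Addr: \[(0x[0-9a-fA-F]+), line (0x[0-9a-fA-F]+)\]\s+"
    r"Instruction sequence number: (\d+)\s+"
    r"Type: (\w+)\s+"
    r"Number of associated packets: (\d+)\s+"
    r"Issue time: (\d+)\s+"
    r"Difference from current tick: (\d+)"
)


def parse_outstanding(log_text):
    """Extract every complete request entry from the dump text."""
    # Archived copies of the log may contain HTML non-breaking spaces.
    text = log_text.replace("&nbsp;", " ")
    entries = []
    for m in ENTRY_RE.finditer(text):
        addr, line, seq, req_type, packets, issue, diff = m.groups()
        entries.append({
            "addr": addr,
            "line": line,
            "seq_num": int(seq),
            "type": req_type,
            "packets": int(packets),
            "issue_tick": int(issue),
            "age_ticks": int(diff),
        })
    return entries


def summarize(entries):
    """Count outstanding requests per instruction sequence number."""
    return dict(Counter(e["seq_num"] for e in entries))


sample = """
    Addr: [0x3b8b1c0, line 0x3b8b1c0]
    Instruction sequence number: 16871
        Type: LD
        Number of associated packets: 2
        Issue time: 1732620214000
        Difference from current tick: 280298000
    Addr: [0x3b8b300, line 0x3b8b300]
    Instruction sequence number: 16871
        Type: LD
        Number of associated packets: 3
        Issue time: 1732620214000
        Difference from current tick: 280298000
"""

entries = parse_outstanding(sample)
oldest = min(entries, key=lambda e: e["issue_tick"])
print(summarize(entries))
print("oldest request:", oldest["addr"], "issued at tick", oldest["issue_tick"])
```

The oldest issue tick gives a reasonable point just before which to start a protocol trace, and the addresses it reports are the ones to search for in that trace.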
 
 _______________________________________________
 gem5-users mailing list -- gem5-users@gem5.org
 To unsubscribe send an email to gem5-users-le...@gem5.org