Thanks for the reply. No, I do not use checkpoints. I am aware of the checkpoint 
problem (found that out the hard way XD). I run the full system from start to end, 
running one of the PARSEC benchmarks each time with the small input size (I do 
have multiple machines running in parallel).
George

From: [email protected]
Date: Mon, 15 Aug 2016 16:07:39 -0700
To: [email protected]
Subject: Re: [gem5-users] FW: Gem5-Ruby_HBM_Parsec_Deadlock

Hi,
Are you taking checkpoints? If so, then getting a deadlock is normal.
On Mon, Aug 15, 2016 at 2:28 PM, George Mappouras <[email protected]> wrote:

Hi Jason,
Thanks for the suggestions. I use MESI_Two_Level, and I also compiled gem5 for 
that protocol like this:
scons RUBY=TRUE PROTOCOL=MESI_Two_Level build/X86/gem5.opt -j8
"The system you're simulating is quite a stress test for the Ruby protocol 
you're using!" Why are you saying that? Could you give me some insight into why 
MESI could make my system slower compared to other protocols? What would you 
suggest I use?
George
From: [email protected]
Date: Mon, 15 Aug 2016 17:01:47 +0000
To: [email protected]
Subject: Re: [gem5-users] FW: Gem5-Ruby_HBM_Parsec_Deadlock

Hi George,
The system you're simulating is quite a stress test for the Ruby protocol 
you're using! What protocol have you compiled?
The problem you're running into could be very simple. It's possible that due to 
the high bandwidth of the system, some of the queues in Ruby are filling up and 
causing the average memory access latency to skyrocket due to queuing delays. 
If this happens, the protocol could be "correct" but still trigger the deadlock 
detector. In this case, you may be able to increase the deadlock threshold and
see the application start to work again. We often see this with GPU workloads.
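A back-of-the-envelope sketch (not gem5 code) of the check described above, using the tick values from the panic message later in this thread. It assumes gem5's default tick resolution of 1 ps, that the Sequencer runs at the 1GHz CPU clock, and that the panic fires once a request's age reaches the threshold. The 500000-cycle figure is assumed to be the default of the RubySequencer `deadlock_threshold` parameter; check src/mem/ruby/system/Sequencer.py in your version before relying on it.

```python
# Sketch: convert the age of the stuck request (from the panic message) into
# CPU cycles and compare it with a few deadlock thresholds.
# ASSUMPTIONS: 1 tick = 1 ps (gem5 default), the Sequencer runs at the 1GHz
# CPU clock, and the panic fires once age >= threshold (in cycles).

TICKS_PER_SECOND = 10**12  # assumed default tick resolution
CPU_FREQ_HZ = 10**9  # --cpu-clock=1GHz
TICKS_PER_CYCLE = TICKS_PER_SECOND // CPU_FREQ_HZ  # 1000 ticks per cycle


def request_age_cycles(issue_tick: int, current_tick: int) -> int:
    """Age of an outstanding request, in CPU cycles."""
    return (current_tick - issue_tick) // TICKS_PER_CYCLE


def exceeds_threshold(age_cycles: int, threshold_cycles: int) -> bool:
    """True if the request is old enough to trigger the deadlock panic."""
    return age_cycles >= threshold_cycles


# Values copied from the panic message in this thread.
issue_tick = 5676227351000
current_tick = 5677053833000

age = request_age_cycles(issue_tick, current_tick)
# At 1GHz one cycle is 1 ns, so dividing by 1000 gives microseconds.
print(f"request age: {age} cycles ({age / 1000:.1f} us at 1GHz)")
for threshold in (500_000, 1_000_000, 2_000_000):  # 500_000: assumed default
    verdict = "panic" if exceeds_threshold(age, threshold) else "keep waiting"
    print(f"threshold {threshold}: {verdict}")
```

If a larger threshold (set on each sequencer in the Ruby protocol config) lets the benchmark finish, queueing delay is the likely explanation. If the same request still times out, a protocol bug is more likely.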
However, it's more likely a bug somewhere in the protocol you're using. To 
debug this, you'll need to dig into the protocol. The debug flag 
"ProtocolTrace" is useful here. With this debug flag you'll see every 
transition in Ruby. With this information you should be able to trace back and 
find the memory operation that's causing the deadlock. I would also suggest 
using "--debug-start=<tick>" and picking the highest tick value you can before the 
offending operation (e.g., a little less than 5676227351000). Otherwise the 
trace may be tens to hundreds of GB (and take days to generate).
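The following Python sketch covers that workflow. It assumes the following:

* gem5's `--debug-flags`, `--debug-start` and `--debug-file` options, placed before the config script.
* 64-byte cache blocks.
* ProtocolTrace lines that print the block address as a hex string.

The 1,000,000-tick margin, the output file name, the helper names and the sample trace lines are all hypothetical.

```python
# Sketch: pick a --debug-start tick and filter ProtocolTrace output for the
# block involved in the deadlock.
# ASSUMPTIONS: 64-byte cache blocks; trace lines print block addresses as hex
# strings such as 0x53d6a000. Check both against your own build and trace.
import re
import shlex
from typing import Iterable, List

BLOCK_SIZE = 64  # assumed cache line size in bytes


def debug_start_tick(issue_tick: int, margin_ticks: int = 1_000_000) -> int:
    """Start tracing shortly before the offending request was issued."""
    return max(0, issue_tick - margin_ticks)


def line_address(paddr: int, block_size: int = BLOCK_SIZE) -> int:
    """Align a physical address down to its cache-block address."""
    return paddr & ~(block_size - 1)


def lines_for_address(trace: Iterable[str], paddr: int) -> List[str]:
    """Return trace lines mentioning the block that contains paddr."""
    # \b keeps 0x53d6a000 from matching inside a longer hex number.
    pattern = re.compile(r"\b" + hex(line_address(paddr)) + r"\b")
    return [line.rstrip("\n") for line in trace if pattern.search(line.lower())]


def gem5_debug_argv(start_tick: int) -> List[str]:
    """Debug options must precede the config script on the command line."""
    return [
        "./build/X86/gem5.opt",
        "--debug-flags=ProtocolTrace",
        f"--debug-start={start_tick}",
        "--debug-file=protocol_trace.out",  # hypothetical file name
        "configs/example/fs.py",  # followed by the usual fs.py options
    ]


start = debug_start_tick(5676227351000)
print(shlex.join(gem5_debug_argv(start)))

# Hypothetical trace lines, only to exercise the filter.
sample = [
    "5676227351000   0   Seq   Begin   >   [0x53d6a000, line 0x53d6a000] ST\n",
    "5676227352000   3   L1Cache   Ack   I>I   [0x53d6a0000, line 0x53d6a0000]\n",
]
for line in lines_for_address(sample, 0x53D6A000):
    print(line)
```

Filtering on the block address first keeps the amount of trace to read manageable. The transitions of the other controllers around those ticks may still need to be read in full.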
Hopefully this helps you get on the right track. Good luck!
Jason
On Wed, Aug 10, 2016 at 6:50 PM George Mappouras <[email protected]> wrote:

Hi all,
I had some trouble while running PARSEC benchmarks with gem5 + Ruby (using the 
MESI two-level protocol). I found that some of the benchmarks cause gem5 to 
crash because a deadlock is detected. The configuration I use is the following:
I have 8 nodes connected in a ring. Each node is a core connected to a private 
64KB L1 cache and one channel of High Bandwidth Memory (HBM). Each core also has 
one of the 8 banks of the shared 8MB L2 cache attached to it. The command I run 
looks like this:
./build/X86/gem5.opt configs/example/fs.py \
    --disk-image=x86root-parsec.img \
    --kernel=x86_64-vmlinux-2.6.22.9.smp -n 8 --cpu-type=detailed \
    --cpu-clock=1GHz --caches --l1d_size=64kB --num-l2caches=8 --l2_size=8MB \
    --mem-type=HBM_1000_4H_x128 --mem-channels=8 --mem-size=2GB --ruby --num-dirs=8 \
    --topology=Torus --mesh-rows=1 --access-backing-store \
    --script=a_parsec_script.sh
I use the latest version of gem5 and have no problem booting or running 
commands on the simulated machine. However, as I mentioned above, some benchmarks 
cause gem5 to crash with a message like this:
panic: Possible Deadlock detected. Aborting!
version: 1 request.paddr: 0x53d6a000 m_writeRequestTable: 1 current time: 5677053833000 issue_time: 5676227351000 difference: 826482000
 @ tick 5677053833000
[wakeup:build/X86/mem/ruby/system/Sequencer.cc, line 119]
Memory Usage: 5799824 KBytes
Program aborted at tick 5677053833000
--- BEGIN LIBC BACKTRACE ---
./build/X86/gem5.opt(_Z15print_backtracev+0x15)[0x9f8275]
./build/X86/gem5.opt(_Z12abortHandleri+0x36)[0xa09536]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f3dd5cdf330]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f3dd4530c37]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f3dd4534028]
./build/X86/gem5.opt(_Z15__exit_epilogueiPKcS0_iS0_+0x1ec)[0x9bba6c]
./build/X86/gem5.opt(_Z14__exit_messageIIjmmmmmEEvPKciS1_S1_iS1_DpRKT_+0xc7)[0x93ca27]
./build/X86/gem5.opt(_ZN9Sequencer6wakeupEv+0x266)[0x93a936]
./build/X86/gem5.opt(_ZN10EventQueue10serviceOneEv+0xb1)[0xa018e1]
./build/X86/gem5.opt(_Z9doSimLoopP10EventQueue+0x38)[0xa22938]
./build/X86/gem5.opt(_Z8simulatem+0x1fb)[0xa22ebb]
./build/X86/gem5.opt[0x969d7c]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x45f7)[0x7f3dd58f7af7]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x48d8)[0x7f3dd58f7dd8]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4b59)[0x7f3dd58f8059]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4b59)[0x7f3dd58f8059]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x7f3dd58f9682]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x563e)[0x7f3dd58f8b3e]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x48d8)[0x7f3dd58f7dd8]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x7f3dd58f9682]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyRun_StringFlags+0x79)[0x7f3dd58f34b9]
./build/X86/gem5.opt(_Z6m5MainiPPc+0x5f)[0xa08caf]
./build/X86/gem5.opt(main+0x33)[0x701933]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f3dd451bf45]
./build/X86/gem5.opt[0x725e83]
--- END LIBC BACKTRACE ---
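The runs are spread over several machines, so it may help to scan every run's output for this panic in one pass. The sketch below assumes the message layout shown above, with field names copied from it. The regular expression and function name are hypothetical, and other gem5 versions may word the panic differently.

```python
# Sketch: pull the fields out of a Ruby "Possible Deadlock detected" panic so
# that logs from many runs can be scanned in one go. Field names are copied
# from the message above; other gem5 versions may word the panic differently.
import re
from typing import Dict, Optional

PANIC_RE = re.compile(
    r"Possible Deadlock detected.*?"
    r"version:\s*(?P<version>\d+).*?"
    r"request\.paddr:\s*(?P<paddr>0x[0-9a-fA-F]+).*?"
    r"current time:\s*(?P<current>\d+).*?"
    r"issue_time:\s*(?P<issue>\d+).*?"
    r"difference:\s*(?P<difference>\d+)",
    re.DOTALL,  # the fields may be split across lines
)


def parse_deadlock_panic(log_text: str) -> Optional[Dict[str, int]]:
    """Return the numeric fields of the first deadlock panic, or None."""
    match = PANIC_RE.search(log_text)
    if match is None:
        return None
    fields = {
        name: int(value, 16) if name == "paddr" else int(value)
        for name, value in match.groupdict().items()
    }
    # The reported difference should equal current time minus issue time.
    if fields["difference"] != fields["current"] - fields["issue"]:
        raise ValueError("inconsistent deadlock panic fields")
    return fields


log = (
    "panic: Possible Deadlock detected. Aborting!\n"
    "version: 1 request.paddr: 0x53d6a000 m_writeRequestTable: 1 "
    "current time: 5677053833000 issue_time: 5676227351000 "
    "difference: 826482000\n"
)
print(parse_deadlock_panic(log))
```

Running this over the saved gem5 output of each machine gives the address and issue tick needed for a traced rerun with --debug-start.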
Can anyone help me figure out what the problem is? Am I missing something? Does 
my system configuration match the command I run? I would appreciate any help!
Thanks,
George
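The sketch below addresses whether the configuration matches the command by spelling out what the fs.py options imply. Its key assumption concerns --l2_size. In the MESI_Two_Level Ruby config of this period, every L2Cache bank appears to be created with `size = options.l2_size`. If so, this command builds 8 banks of 8MB each (64MB in total), not a shared 8MB L2 split into 1MB banks.

Two further assumptions are a one-router-per-CPU layout and reading a one-row Torus as a ring. Check all three against configs/ruby/MESI_Two_Level.py and configs/topologies/ in your gem5 checkout.

```python
# Sketch: what the fs.py options in this thread imply, per component.
# ASSUMPTIONS: one router per CPU for the Torus topology, and --l2_size may be
# either the size of each L2 bank or the total; check configs/ruby/
# MESI_Two_Level.py in your checkout to see which one applies.
from dataclasses import dataclass

KB = 1024
MB = 1024 * KB
GB = 1024 * MB


@dataclass
class Options:
    num_cpus: int = 8  # -n 8
    l1d_size: int = 64 * KB  # --l1d_size=64kB
    num_l2caches: int = 8  # --num-l2caches=8
    l2_size: int = 8 * MB  # --l2_size=8MB
    num_dirs: int = 8  # --num-dirs=8
    mem_channels: int = 8  # --mem-channels=8
    mem_size: int = 2 * GB  # --mem-size=2GB
    mesh_rows: int = 1  # --mesh-rows=1


def summarize(o: Options, l2_size_is_per_bank: bool) -> dict:
    """Derive per-core, per-bank and per-channel figures from the options."""
    l2_total = o.l2_size * o.num_l2caches if l2_size_is_per_bank else o.l2_size
    return {
        "l1d_per_core_kb": o.l1d_size // KB,
        "l2_total_mb": l2_total // MB,
        "l2_per_bank_kb": l2_total // o.num_l2caches // KB,
        "mem_per_channel_mb": o.mem_size // o.mem_channels // MB,
        "one_dir_per_channel": o.num_dirs == o.mem_channels,
        # rows x columns; a single row with wraparound links forms a ring
        "router_grid": (o.mesh_rows, o.num_cpus // o.mesh_rows),
    }


opts = Options()
print("l2_size per bank:", summarize(opts, l2_size_is_per_bank=True))
print("l2_size in total:", summarize(opts, l2_size_is_per_bank=False))
```

If the per-bank reading holds, passing --l2_size=1MB with --num-l2caches=8 would give the intended 8MB shared L2.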
_______________________________________________

gem5-users mailing list

[email protected]

http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users