Hi all,
Thanks for your follow-ups and suggestions. I use Ruby because I intend to
change the way blocks are distributed to different HBM channels, potentially
causing more message traffic than normal (redundant blocks will be generated
and saved in HBM channels). So I want to be able to measure the increase in
message passing in my network and how that would affect my performance.
Of course, the problem I run into (and initially reported in the previous
emails) arises without my modifications. I wanted to make sure first that the
baseline (unmodified gem5 version) is working correctly.
I will try to see how the timing CPU performs just as a sanity check, play with
the deadlock detection threshold, and test other protocols as well.
Thanks again,
George
From: [email protected]
To: [email protected]
Date: Tue, 16 Aug 2016 07:25:46 +0000
Subject: Re: [gem5-users] FW: Gem5-Ruby_HBM_Parsec_Deadlock

Hi all,

Remember that the timing CPU should _not_ be used for any performance-related
experiments. Stick to the in-order and out-of-order CPUs for any such use cases.

In general I would also expect fewer issues with the classic memory system
(especially with full system), and it does a fine job at modelling
crossbar-based many-core systems. At the moment it does not support X86 out of
the box, but it may still be worth considering if you’re having issues with
Ruby.

Andreas

From: gem5-users <[email protected]> on behalf of Ruohuang Zheng <[email protected]>
Reply-To: gem5 users mailing list <[email protected]>
Date: Tuesday, 16 August 2016 at 02:46
To: gem5 users mailing list <[email protected]>
Subject: Re: [gem5-users] FW: Gem5-Ruby_HBM_Parsec_Deadlock

Hi,

You can try running the benchmarks with the timing CPU instead of the detailed
CPU to see if it works. As far as I know, there are various bugs in Ruby, and
using the detailed CPU makes them more likely to be exposed.

On Mon, Aug 15, 2016 at 4:37 PM, George Mappouras <[email protected]> wrote:

Thanks for the reply. No, I do not use checkpoints. I am aware of the
checkpoint problem (found that out the hard way XD). I run full system from
start to end, running one of the PARSEC benchmarks each time with the small
input size (I do have multiple machines running in parallel).

George

From: [email protected]
Date: Mon, 15 Aug 2016 16:07:39 -0700
To: [email protected]
Subject: Re: [gem5-users] FW: Gem5-Ruby_HBM_Parsec_Deadlock

Hi,

Are you taking checkpoints? If so, getting a deadlock is normal.

On Mon, Aug 15, 2016 at 2:28 PM, George Mappouras <[email protected]> wrote:

Hi Jason,

Thanks for the suggestions. I use MESI_Two_Level, and I compiled gem5 for that
protocol like this:

scons RUBY=TRUE PROTOCOL=MESI_Two_Level build/X86/gem5.opt -j8

"The system you're simulating is quite a stress test for the Ruby protocol
you're using!"

Why do you say that? Could you give me some insight into why MESI could make
my system slower compared to other protocols? What would you suggest I use?

George

From: [email protected]
Date: Mon, 15 Aug 2016 17:01:47 +0000
To: [email protected]
Subject: Re: [gem5-users] FW: Gem5-Ruby_HBM_Parsec_Deadlock

Hi George,

The system you're simulating is quite a stress test for the Ruby protocol 
you're using! What protocol have you compiled?

The problem you're running into could be very simple. It's possible that due to
the high bandwidth of the system, some of the queues in Ruby are filling up and
causing the average memory access latency to skyrocket due to queuing delays.
If this happens, the protocol could be "correct" but still cause a deadlock
detection. In this case, you may be able to increase the deadlock threshold and
see the application start to work again. We often see this with GPU workloads.

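To judge how long the flagged request had actually been outstanding, the
numbers in the panic message can be checked directly. The sketch below is a
minimal Python illustration, not part of gem5: the function names are
hypothetical, the sample line is copied from the panic output quoted further
down in this thread, and the conversion assumes gem5's default tick resolution
of 1 ps per tick.

```python
import re

# Line copied from the panic output reported in this thread.
PANIC_LINE = ("version: 1 request.paddr: 0x53d6a000 m_writeRequestTable: 1 "
              "current time: 5677053833000 issue_time: 5676227351000 "
              "difference: 826482000")


def parse_deadlock_panic(text):
    """Extract the tick fields from a 'Possible Deadlock detected' message."""
    fields = {}
    for key in ("current time", "issue_time", "difference"):
        match = re.search(re.escape(key) + r":\s*(\d+)", text)
        if match is None:
            raise ValueError("field not found: " + key)
        fields[key] = int(match.group(1))
    return fields


def stall_in_microseconds(fields, ticks_per_second=10**12):
    """Convert the reported stall to simulated microseconds.

    ticks_per_second defaults to 1e12 (1 ps per tick); this is an assumption
    and should be changed if the simulation uses a different tick rate.
    """
    return fields["difference"] * 1e6 / ticks_per_second


if __name__ == "__main__":
    fields = parse_deadlock_panic(PANIC_LINE)
    # The reported difference should equal current time minus issue time.
    assert fields["current time"] - fields["issue_time"] == fields["difference"]
    print("stalled for %.3f us of simulated time" % stall_in_microseconds(fields))
```

For the message in this thread the request had been outstanding for 826482000
ticks, which is roughly 826 us of simulated time under that assumption.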

However, it's more likely a bug somewhere in the protocol you're using. To
debug this, you'll need to dig into the protocol. The debug flag
"ProtocolTrace" is useful here. With this debug flag you'll see every
transition in Ruby. With this information you should be able to trace back and
find the memory operation that's causing the deadlock. I would also suggest
using "--debug-start=<tick>" and pick the highest tick value you can before the
offending operation (e.g., a little less than 5676227351000). Otherwise the
trace may be 10s-100s of GB (and take days to generate).

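The invocation described above can be assembled as follows. This is a sketch
only: the helper names and the 1000000-tick margin are hypothetical choices,
the issue_time value comes from the panic message in this thread, and the
fs.py arguments are left to the caller. --debug-flags and --debug-start are
gem5 command-line options and go before the config script.

```python
def debug_start_tick(issue_time, margin_ticks=1000000):
    """Pick a tick slightly before the stalled request was issued.

    margin_ticks is an arbitrary illustrative margin, not a gem5 default.
    """
    return max(0, issue_time - margin_ticks)


def gem5_trace_command(issue_time, script_args, margin_ticks=1000000):
    """Build an argv list that enables ProtocolTrace shortly before issue_time."""
    start = debug_start_tick(issue_time, margin_ticks)
    return [
        "./build/X86/gem5.opt",
        "--debug-flags=ProtocolTrace",
        "--debug-start=%d" % start,
        "configs/example/fs.py",
    ] + list(script_args)


if __name__ == "__main__":
    # issue_time is taken from the panic message; "--ruby" stands in for the
    # full option list of the original run.
    print(" ".join(gem5_trace_command(5676227351000, ["--ruby"])))
```

If the offending operation turns out to start earlier than the chosen tick,
the margin can be widened at the cost of a larger trace.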

Hopefully this helps you get on the right track. Good luck!

Jason

On Wed, Aug 10, 2016 at 6:50 PM George Mappouras <[email protected]> wrote:

Hi all,

I had some trouble while running Parsec benchmarks with gem5 + Ruby (using the
MESI two-level protocol). I found that some of the benchmarks cause gem5 to
crash because a deadlock was detected. The configuration I use is as follows:

I have 8 nodes connected in a ring. Each node is a core connected to a private
64KB L1 cache and one channel of High Bandwidth Memory (HBM). Each core also
has one of the 8 banks of the shared 8MB L2 cache attached to it. The command I
run looks like this:

./build/X86/gem5.opt configs/example/fs.py --disk-image=x86root-parsec.img \
    --kernel=x86_64-vmlinux-2.6.22.9.smp -n 8 --cpu-type=detailed \
    --cpu-clock=1GHz --caches --l1d_size=64kB --num-l2caches=8 --l2_size=8MB \
    --mem-type=HBM_1000_4H_x128 --mem-channels=8 --mem-size=2GB --ruby \
    --num-dirs=8 --topology=Torus --mesh-rows=1 --access-backing-store \
    --script=a_parsec_script.sh

I use the latest version of gem5 and I have no problem booting or running
commands on the simulated machine. However, as I mentioned above, some
benchmarks cause gem5 to crash with a message like this:

panic: Possible Deadlock detected. Aborting!
version: 1 request.paddr: 0x53d6a000 m_writeRequestTable: 1 current time: 5677053833000 issue_time: 5676227351000 difference: 826482000
 @ tick 5677053833000
[wakeup:build/X86/mem/ruby/system/Sequencer.cc, line 119]
Memory Usage: 5799824 KBytes
Program aborted at tick 5677053833000
--- BEGIN LIBC BACKTRACE ---
./build/X86/gem5.opt(_Z15print_backtracev+0x15)[0x9f8275]
./build/X86/gem5.opt(_Z12abortHandleri+0x36)[0xa09536]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f3dd5cdf330]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f3dd4530c37]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f3dd4534028]
./build/X86/gem5.opt(_Z15__exit_epilogueiPKcS0_iS0_+0x1ec)[0x9bba6c]
./build/X86/gem5.opt(_Z14__exit_messageIIjmmmmmEEvPKciS1_S1_iS1_DpRKT_+0xc7)[0x93ca27]
./build/X86/gem5.opt(_ZN9Sequencer6wakeupEv+0x266)[0x93a936]
./build/X86/gem5.opt(_ZN10EventQueue10serviceOneEv+0xb1)[0xa018e1]
./build/X86/gem5.opt(_Z9doSimLoopP10EventQueue+0x38)[0xa22938]
./build/X86/gem5.opt(_Z8simulatem+0x1fb)[0xa22ebb]
./build/X86/gem5.opt[0x969d7c]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x45f7)[0x7f3dd58f7af7]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x48d8)[0x7f3dd58f7dd8]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4b59)[0x7f3dd58f8059]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4b59)[0x7f3dd58f8059]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x7f3dd58f9682]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x563e)[0x7f3dd58f8b3e]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x48d8)[0x7f3dd58f7dd8]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x7f3dd58f9682]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyRun_StringFlags+0x79)[0x7f3dd58f34b9]
./build/X86/gem5.opt(_Z6m5MainiPPc+0x5f)[0xa08caf]
./build/X86/gem5.opt(main+0x33)[0x701933]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f3dd451bf45]
./build/X86/gem5.opt[0x725e83]
--- END LIBC BACKTRACE ---

Can anyone help me figure out what the problem is? Am I missing something? Does
my system configuration match the command I run? I would appreciate any help!

Thanks,
George

_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users