Hi Jason,
That is a fair point. I have a couple of things to try now.
Thanks a lot for the advice and help!
George
From: [email protected]
Date: Tue, 16 Aug 2016 13:27:30 +0000
To: [email protected]
Subject: Re: [gem5-users] FW: Gem5-Ruby_HBM_Parsec_Deadlock

Hi George,
I was only saying that your system with 8 HBM controllers has much more 
bandwidth than the original developers of Ruby imagined. Therefore, I wouldn't 
be surprised if you are encountering some bugs that others have never seen.
MESI_Two_Level is one of the more tested protocols, so I would say you're using 
a reasonable protocol. You may want to try a different topology (say, 
point-to-point) to see whether the topology is causing (or at least correlates 
with) the issue.
Jason
On Tue, Aug 16, 2016 at 2:26 AM Andreas Hansson <[email protected]> wrote:

Hi all,

Remember that the timing CPU should _not_ be used for any performance-related 
experiments. Stick to the in-order and out-of-order CPUs for any such use cases.



In general I would also expect fewer issues with the classic memory system 
(especially with full system), and it does a fine job at modelling 
crossbar-based many-core systems. At the moment it does not support X86 out of 
the box, but it may still be worth considering if you're having issues with 
Ruby.



Andreas





From: gem5-users <[email protected]> on behalf of Ruohuang Zheng <[email protected]>
Reply-To: gem5 users mailing list <[email protected]>
Date: Tuesday, 16 August 2016 at 02:46
To: gem5 users mailing list <[email protected]>
Subject: Re: [gem5-users] FW: Gem5-Ruby_HBM_Parsec_Deadlock

Hi,



You can try running the benchmarks with the timing CPU instead of the detailed 
CPU to see if that works. As far as I know, there are various bugs in Ruby, and 
using the detailed CPU makes them more likely to be exposed.



On Mon, Aug 15, 2016 at 4:37 PM, George Mappouras 
<[email protected]> wrote:



Thanks for the reply. No, I do not use checkpoints. I am aware of the checkpoint 
problem (found that out the hard way XD). I run full system from start to end, 
running one of the PARSEC benchmarks each time with a small input size (I do 
have multiple machines running in parallel).



George





From: [email protected]
Date: Mon, 15 Aug 2016 16:07:39 -0700
To: [email protected]
Subject: Re: [gem5-users] FW: Gem5-Ruby_HBM_Parsec_Deadlock

Hi,



Are you taking checkpoints? If so, then getting a deadlock is normal.



On Mon, Aug 15, 2016 at 2:28 PM, George Mappouras <[email protected]> 
wrote:



Hi Jason,



Thanks for the suggestions. I use MESI_Two_Level, and I also compiled gem5 for 
that protocol like this:
scons RUBY=TRUE PROTOCOL=MESI_Two_Level build/X86/gem5.opt -j8




"The system you're simulating is quite a stress test for the Ruby protocol 
you're using!"

Why do you say that? Could you give me some insight into why MESI could make 
my system slower compared to other protocols? What would you suggest I use?



George





From: [email protected]
Date: Mon, 15 Aug 2016 17:01:47 +0000
To: [email protected]
Subject: Re: [gem5-users] FW: Gem5-Ruby_HBM_Parsec_Deadlock

Hi George,



The system you're simulating is quite a stress test for the Ruby protocol 
you're using! What protocol have you compiled?



The problem you're running into could be very simple. It's possible that, due 
to the high bandwidth of the system, some of the queues in Ruby are filling up 
and causing the average memory access latency to skyrocket due to queuing 
delays. If this happens, the protocol could be "correct" but still trip the 
deadlock detector. In this case, you may be able to increase the deadlock 
threshold and see the application start to work again. We often see this with 
GPU workloads.
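
As a rough illustration of the threshold idea (an editorial sketch, not part 
of the original message): the panic quoted later in this thread reports 
"difference: 826482000" ticks between issue and detection. The snippet below 
converts that gap to clock cycles so it can be compared against whatever 
deadlock threshold the Sequencer is using. It assumes gem5's default tick 
resolution of 1 ps and a 1 GHz Sequencer clock; both are assumptions, since 
Ruby may run in its own clock domain. The helper suggest_threshold and its 
factor of 4 are hypothetical, and the parameter name (deadlock_threshold on 
the Ruby sequencer, as far as I know) should be checked against your tree.

```python
# Convert the tick gap reported in the deadlock panic into clock cycles.
# Assumptions (not from the thread): 1 tick = 1 ps, and the Sequencer
# is clocked at 1 GHz. Adjust both for your configuration.

TICKS_PER_SECOND = 10**12  # assumed gem5 default tick resolution (1 ps)


def ticks_to_cycles(ticks: int, clock_hz: int) -> int:
    """Return how many whole clock cycles fit into `ticks`."""
    ticks_per_cycle = TICKS_PER_SECOND // clock_hz
    return ticks // ticks_per_cycle


def suggest_threshold(observed_cycles: int, factor: int = 4) -> int:
    """Hypothetical helper: pick a threshold well above the observed wait."""
    return observed_cycles * factor


difference_ticks = 826482000  # "difference:" field from the panic message
waited = ticks_to_cycles(difference_ticks, clock_hz=10**9)
print(waited)                     # 826482 cycles at the assumed 1 GHz
print(suggest_threshold(waited))  # 3305928
```

If a much larger threshold only moves the panic to a later tick, that would 
point toward a protocol bug rather than queuing delay.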



However, it's more likely a bug somewhere in the protocol you're using. To 
debug this, you'll need to dig into the protocol. The debug flag 
"ProtocolTrace" is useful here. With this debug flag you'll see every 
transition in Ruby. With this information you should be able to trace back and 
find the memory operation that's causing the deadlock. I would also suggest 
using "--debug-start=<tick>" and picking the highest tick value you can before 
the offending operation (e.g., a little less than 5676227351000). Otherwise the 
trace may be 10s-100s of GB (and take days to generate).
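
A minimal sketch of how the start tick could be derived (an editorial 
addition, not part of the original message): it pulls the issue_time field 
out of the panic line quoted later in this thread and subtracts a safety 
margin. The function names and the 1,000,000-tick margin are illustrative 
assumptions; choose a margin that suits your workload. The remaining fs.py 
arguments are elided and must be filled in from your own command.

```python
import re

# Line copied from the panic output quoted later in this thread.
PANIC_LINE = (
    "version: 1 request.paddr: 0x53d6a000 m_writeRequestTable: 1 "
    "current time: 5677053833000 issue_time: 5676227351000 "
    "difference: 826482000"
)


def issue_tick(panic_line: str) -> int:
    """Extract the issue_time field (in ticks) from the panic line."""
    match = re.search(r"issue_time:\s*(\d+)", panic_line)
    if match is None:
        raise ValueError("no issue_time field found")
    return int(match.group(1))


def debug_start_tick(panic_line: str, margin_ticks: int = 1_000_000) -> int:
    """Start tracing `margin_ticks` before the stuck request was issued.

    The margin is an arbitrary illustrative value, not a recommendation.
    """
    return max(0, issue_tick(panic_line) - margin_ticks)


start = debug_start_tick(PANIC_LINE)
# --debug-flags and --debug-start are gem5 command-line options; the
# remaining fs.py arguments are elided here and must be filled in.
print(f"./build/X86/gem5.opt --debug-flags=ProtocolTrace "
      f"--debug-start={start} configs/example/fs.py ...")
```

A larger margin gives more context before the stuck request at the cost of a 
bigger trace.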



Hopefully this helps you get on the right track. Good luck!



Jason




On Wed, Aug 10, 2016 at 6:50 PM George Mappouras <[email protected]> wrote:

Hi all,



I had some trouble while running PARSEC benchmarks with gem5 + Ruby (using the 
MESI_Two_Level protocol). I found that some of the benchmarks cause gem5 to 
crash because a deadlock was detected. The configuration I use is as follows:



I have 8 nodes connected in a ring. Each node is a core connected to a 
private 64KB L1 cache and one channel of High Bandwidth Memory (HBM). Each 
core also has one of the 8 banks of the shared 8MB L2 cache attached to it. 
The command I run looks like this:



 ./build/X86/gem5.opt configs/example/fs.py \
   --disk-image=x86root-parsec.img \
   --kernel=x86_64-vmlinux-2.6.22.9.smp -n 8 --cpu-type=detailed \
   --cpu-clock=1GHz --caches --l1d_size=64kB --num-l2caches=8 --l2_size=8MB \
   --mem-type=HBM_1000_4H_x128 --mem-channels=8 --mem-size=2GB --ruby \
   --num-dirs=8 --topology=Torus --mesh-rows=1 --access-backing-store \
   --script=a_parsec_script.sh
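
For reference, here is a small sketch (an editorial addition) of why a torus 
with a single row behaves like the ring described above. It models an 
idealized 2D torus only; it is not gem5's Torus topology code, which may 
treat the degenerate dimension differently. The function names are 
hypothetical.

```python
# Idealized 2D torus neighbors; this is NOT gem5's Torus topology code.

def torus_neighbors(node: int, rows: int, cols: int) -> set:
    """Return the distinct neighbors of `node` in a rows x cols torus."""
    r, c = divmod(node, cols)
    candidates = {
        ((r - 1) % rows) * cols + c,   # north (wraps around)
        ((r + 1) % rows) * cols + c,   # south (wraps around)
        r * cols + (c - 1) % cols,     # west (wraps around)
        r * cols + (c + 1) % cols,     # east (wraps around)
    }
    candidates.discard(node)  # with rows == 1, north/south point at itself
    return candidates


def ring_distance(a: int, b: int, n: int) -> int:
    """Minimum hop count between two nodes on an n-node ring."""
    d = abs(a - b) % n
    return min(d, n - d)


for node in range(8):
    print(node, sorted(torus_neighbors(node, rows=1, cols=8)))
print(max(ring_distance(0, b, 8) for b in range(8)))  # farthest node: 4 hops
```

In this model every node has exactly two neighbors, and the farthest node is 
4 hops away.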



I use the latest version of gem5 and I have no problem booting or running 
commands on the simulated machine. However, as I mentioned above, some 
benchmarks cause gem5 to crash with a message like this:




panic: Possible Deadlock detected. Aborting!
version: 1 request.paddr: 0x53d6a000 m_writeRequestTable: 1 current time: 
5677053833000 issue_time: 5676227351000 difference: 826482000
 @ tick 5677053833000
[wakeup:build/X86/mem/ruby/system/Sequencer.cc, line 119]
Memory Usage: 5799824 KBytes
Program aborted at tick 5677053833000
--- BEGIN LIBC BACKTRACE ---
./build/X86/gem5.opt(_Z15print_backtracev+0x15)[0x9f8275]
./build/X86/gem5.opt(_Z12abortHandleri+0x36)[0xa09536]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f3dd5cdf330]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f3dd4530c37]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f3dd4534028]
./build/X86/gem5.opt(_Z15__exit_epilogueiPKcS0_iS0_+0x1ec)[0x9bba6c]
./build/X86/gem5.opt(_Z14__exit_messageIIjmmmmmEEvPKciS1_S1_iS1_DpRKT_+0xc7)[0x93ca27]
./build/X86/gem5.opt(_ZN9Sequencer6wakeupEv+0x266)[0x93a936]
./build/X86/gem5.opt(_ZN10EventQueue10serviceOneEv+0xb1)[0xa018e1]
./build/X86/gem5.opt(_Z9doSimLoopP10EventQueue+0x38)[0xa22938]
./build/X86/gem5.opt(_Z8simulatem+0x1fb)[0xa22ebb]
./build/X86/gem5.opt[0x969d7c]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x45f7)[0x7f3dd58f7af7]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x48d8)[0x7f3dd58f7dd8]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4b59)[0x7f3dd58f8059]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4b59)[0x7f3dd58f8059]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x7f3dd58f9682]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x563e)[0x7f3dd58f8b3e]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x48d8)[0x7f3dd58f7dd8]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x7f3dd58f9682]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyRun_StringFlags+0x79)[0x7f3dd58f34b9]
./build/X86/gem5.opt(_Z6m5MainiPPc+0x5f)[0xa08caf]
./build/X86/gem5.opt(main+0x33)[0x701933]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f3dd451bf45]
./build/X86/gem5.opt[0x725e83]
--- END LIBC BACKTRACE ---




Can anyone help me figure out what the problem is? Am I missing something? Does 
my system configuration match the command I run? I would appreciate any help!



Thanks,
George






_______________________________________________

gem5-users mailing list

[email protected]

http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users




