Hi,

Are you taking checkpoints? If so, hitting this deadlock is expected.

On Mon, Aug 15, 2016 at 2:28 PM, George Mappouras <
georgios.mappou...@duke.edu> wrote:

> Hi Jason,
>
> Thanks for the suggestions. I use MESI_Two_Level, and I compiled gem5
> for that protocol like this:
> *scons RUBY=TRUE PROTOCOL=MESI_Two_Level build/X86/gem5.opt -j8*
>
> *"The system you're simulating is quite a stress test for the Ruby
> protocol you're using! "*
> Why do you say that? Could you give me some insight into why MESI could
> make my system slower compared to other protocols? What would you suggest
> I use instead?
>
> George
>
> ------------------------------
> From: ja...@lowepower.com
> Date: Mon, 15 Aug 2016 17:01:47 +0000
> To: gem5-users@gem5.org
> Subject: Re: [gem5-users] FW: Gem5-Ruby_HBM_Parsec_Deadlock
>
>
> Hi George,
>
> The system you're simulating is quite a stress test for the Ruby protocol
> you're using! What protocol have you compiled?
>
> The problem you're running into could be very simple. It's possible that
> due to the high bandwidth of the system, some of the queues in Ruby are
> filling up and causing the average memory access latency to skyrocket due
> to queuing delays. If this happens, the protocol could be "correct" but
> still cause a deadlock detection. In this case, you may be able to increase
> the deadlock threshold and see the application start to work again. We
> often see this with GPU workloads.
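
To judge whether a larger deadlock threshold is plausible, it can help to convert the tick counts from the panic message quoted further down in this thread into simulated time and cycles. This is only a sketch: it assumes gem5's default resolution of 1 tick = 1 ps and a 1 GHz clock (taken from --cpu-clock in the original command; the Ruby clock may be different), and the function names are made up for illustration.

```python
TICKS_PER_SECOND = 10**12  # assumption: gem5 default of 1 tick = 1 ps


def ticks_to_microseconds(ticks):
    """Convert a tick count to simulated microseconds."""
    return ticks / (TICKS_PER_SECOND / 10**6)


def ticks_to_cycles(ticks, clock_hz):
    """Convert a tick count to whole clock cycles at the given frequency."""
    ticks_per_cycle = TICKS_PER_SECOND // clock_hz
    return ticks // ticks_per_cycle


# Values copied from the panic message in this thread.
issue_time = 5676227351000
current_time = 5677053833000
outstanding = current_time - issue_time

print("ticks outstanding:", outstanding)
print("microseconds:", ticks_to_microseconds(outstanding))
print("cycles at assumed 1 GHz:", ticks_to_cycles(outstanding, 10**9))
```

Comparing the resulting cycle count with the threshold configured for the Ruby sequencers gives a rough idea of how far the request overshot it, and therefore how much the threshold would have to grow if the delay really is only queuing.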
>
> However, it's more likely a bug somewhere in the protocol you're using. To
> debug this, you'll need to dig into the protocol. The debug flag
> "ProtocolTrace" is useful here. With this debug flag you'll see every
> transition in Ruby. With this information you should be able to trace back
> and find the memory operation that's causing the deadlock. I would also
> suggest using "--debug-start=<tick>" and picking the highest tick value you
> can before the offending operation (e.g., a little less than
> 5676227351000). Otherwise the trace may be tens to hundreds of GB (and take
> days to generate).
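
As a rough illustration of the advice above, the following sketch assembles a gem5 command line that turns on ProtocolTrace shortly before the request's issue tick. The helper name, the margin value, and the trace file name are hypothetical; --debug-flags, --debug-start and --debug-file are gem5 options that go before the config script, and the script's own options (omitted here) follow it.

```python
import shlex


def build_trace_command(gem5_binary, config_script, issue_tick, margin_ticks,
                        trace_file="protocol_trace.out"):
    """Return an argv list that enables ProtocolTrace a little before issue_tick."""
    # Never start tracing at a negative tick.
    start_tick = max(issue_tick - margin_ticks, 0)
    return [
        gem5_binary,
        "--debug-flags=ProtocolTrace",
        f"--debug-start={start_tick}",
        f"--debug-file={trace_file}",  # hypothetical output file name
        config_script,
    ]


argv = build_trace_command(
    "./build/X86/gem5.opt",
    "configs/example/fs.py",
    issue_tick=5676227351000,    # issue_time from the panic message
    margin_ticks=1_000_000_000,  # hypothetical margin: 1 ms at 1 ps per tick
)
print(shlex.join(argv))
```

The margin is a trade-off: a larger one captures more of the history leading up to the stuck request, at the cost of a bigger trace and a longer run.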
>
> Hopefully this helps you get on the right track. Good luck!
>
> Jason
>
> On Wed, Aug 10, 2016 at 6:50 PM George Mappouras <
> georgios.mappou...@duke.edu> wrote:
>
> Hi all,
>
> I had some trouble while running Parsec benchmarks with gem5 + Ruby (using
> the MESI two-level protocol). I found that some of the benchmarks
> cause gem5 to crash because a deadlock was detected. The configuration I
> use is as follows:
>
> I have 8 nodes connected in a ring. Each node is a core with a
> private 64KB L1 cache and one channel of High Bandwidth Memory (HBM). Each
> node also holds one of the 8 banks of the shared 8MB L2 cache.
> The command I run looks like this:
>
> * ./build/X86/gem5.opt configs/example/fs.py
> --disk-image=x86root-parsec.img --kernel=x86_64-vmlinux-2.6.22.9.smp -n 8
> --cpu-type=detailed --cpu-clock=1GHz --caches --l1d_size=64kB
> --num-l2caches=8 --l2_size=8MB --mem-type=HBM_1000_4H_x128 --mem-channels=8
> --mem-size=2GB --ruby --num-dirs=8 --topology=Torus --mesh-rows=1
> --access-backing-store --script=a_parsec_script.sh*
>
> I use the latest version of gem5 and I have no problem booting or running
> commands on the simulated machine. However, as I mentioned above, some
> benchmarks cause gem5 to crash with a message like this:
>
> *panic: Possible Deadlock detected. Aborting!*
> *version: 1 request.paddr: 0x53d6a000 m_writeRequestTable: 1 current time:
> 5677053833000 issue_time: 5676227351000 difference: 826482000*
> * @ tick 5677053833000*
> *[wakeup:build/X86/mem/ruby/system/Sequencer.cc, line 119]*
> *Memory Usage: 5799824 KBytes*
> *Program aborted at tick 5677053833000*
> *--- BEGIN LIBC BACKTRACE ---*
> *./build/X86/gem5.opt(_Z15print_backtracev+0x15)[0x9f8275]*
> *./build/X86/gem5.opt(_Z12abortHandleri+0x36)[0xa09536]*
> */lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f3dd5cdf330]*
> */lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f3dd4530c37]*
> */lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f3dd4534028]*
> *./build/X86/gem5.opt(_Z15__exit_epilogueiPKcS0_iS0_+0x1ec)[0x9bba6c]*
> *./build/X86/gem5.opt(_Z14__exit_messageIIjmmmmmEEvPKciS1_S1_iS1_DpRKT_+0xc7)[0x93ca27]*
> *./build/X86/gem5.opt(_ZN9Sequencer6wakeupEv+0x266)[0x93a936]*
> *./build/X86/gem5.opt(_ZN10EventQueue10serviceOneEv+0xb1)[0xa018e1]*
> *./build/X86/gem5.opt(_Z9doSimLoopP10EventQueue+0x38)[0xa22938]*
> *./build/X86/gem5.opt(_Z8simulatem+0x1fb)[0xa22ebb]*
> *./build/X86/gem5.opt[0x969d7c]*
> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x45f7)[0x7f3dd58f7af7]*
> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]*
> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x48d8)[0x7f3dd58f7dd8]*
> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4b59)[0x7f3dd58f8059]*
> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4b59)[0x7f3dd58f8059]*
> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]*
> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x7f3dd58f9682]*
> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x563e)[0x7f3dd58f8b3e]*
> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]*
> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x48d8)[0x7f3dd58f7dd8]*
> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]*
> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x7f3dd58f9682]*
> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyRun_StringFlags+0x79)[0x7f3dd58f34b9]*
> *./build/X86/gem5.opt(_Z6m5MainiPPc+0x5f)[0xa08caf]*
> *./build/X86/gem5.opt(main+0x33)[0x701933]*
> */lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f3dd451bf45]*
> *./build/X86/gem5.opt[0x725e83]*
> *--- END LIBC BACKTRACE ---*
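
The field dump at the top of this panic message carries the information needed for further debugging. A minimal sketch of pulling those fields out with regular expressions follows; the function name and the dictionary keys are illustrative, and the message text is joined from the two wrapped lines above.

```python
import re

# Field dump from the panic message, joined into a single string.
PANIC_TEXT = (
    "version: 1 request.paddr: 0x53d6a000 m_writeRequestTable: 1 "
    "current time: 5677053833000 issue_time: 5676227351000 "
    "difference: 826482000"
)


def parse_deadlock_fields(text):
    """Extract the address and tick fields from a deadlock panic message."""
    patterns = {
        "version": r"version:\s*(\d+)",
        "paddr": r"request\.paddr:\s*(0x[0-9a-fA-F]+)",
        "current_time": r"current time:\s*(\d+)",
        "issue_time": r"issue_time:\s*(\d+)",
        "difference": r"difference:\s*(\d+)",
    }
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"field {name!r} not found")
        # Base 0 lets int() accept both decimal and 0x-prefixed values.
        fields[name] = int(match.group(1), 0)
    return fields


fields = parse_deadlock_fields(PANIC_TEXT)
print(hex(fields["paddr"]), fields["issue_time"], fields["difference"])
```

The parsed issue_time is the tick at which the stuck request entered the sequencer, and the physical address is presumably the value to search for once a protocol trace is available.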
>
> Can anyone help me figure out what the problem is? Am I missing something?
> Does my system configuration match the command I run? I would appreciate any
> help!
>
> Thanks,
> George
>
> _______________________________________________
> gem5-users mailing list
> gem5-users@gem5.org
> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
