Hi,

Are you taking checkpoints? If so, getting a deadlock is normal.
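
If checkpoints are not involved, the tick arithmetic behind Jason's --debug-start suggestion (quoted below) can be sketched in Python. This is only an illustration: the helper names and the MARGIN value are my own assumptions, not part of gem5, and the numbers are copied from the panic message in the original report.

```python
import re

# Panic text copied from the report quoted below.
PANIC = (
    "version: 1 request.paddr: 0x53d6a000 m_writeRequestTable: 1 "
    "current time: 5677053833000 issue_time: 5676227351000 "
    "difference: 826482000"
)


def parse_panic(text):
    """Return the integer tick fields of the deadlock panic as a dict."""
    fields = {}
    for key in ("current time", "issue_time", "difference"):
        match = re.search(re.escape(key) + r":\s*(\d+)", text)
        if match is None:
            raise ValueError("field not found: " + key)
        fields[key] = int(match.group(1))
    return fields


def debug_start_tick(issue_time, margin):
    """Pick a tick slightly before the stuck request was issued."""
    return max(issue_time - margin, 0)


fields = parse_panic(PANIC)

# MARGIN is an arbitrary, hypothetical choice: how many ticks of history
# to keep before the request that never completed.
MARGIN = 1_000_000
start = debug_start_tick(fields["issue_time"], MARGIN)

# Both flags are the ones named in Jason's reply.
print("--debug-flags=ProtocolTrace --debug-start=%d" % start)
```

The printed options would go between gem5.opt and configs/example/fs.py in the command from the original report. Adding --debug-file=<name> should keep the trace out of the terminal.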

On Mon, Aug 15, 2016 at 2:28 PM, George Mappouras <georgios.mappou...@duke.edu> wrote:

> Hi Jason,
>
> Thanks for the suggestions. I use MESI_Two_Level, and I also compiled gem5
> for that protocol like this:
> *scons RUBY=TRUE PROTOCOL=MESI_Two_Level build/X86/gem5.opt -j8*
>
> *"The system you're simulating is quite a stress test for the Ruby
> protocol you're using! "*
> Why are you saying that? Could you give me some insight into why MESI could
> make my system slower compared to other protocols? What would you suggest
> I use?
>
> George
>
> ------------------------------
> From: ja...@lowepower.com
> Date: Mon, 15 Aug 2016 17:01:47 +0000
> To: gem5-users@gem5.org
> Subject: Re: [gem5-users] FW: Gem5-Ruby_HBM_Parsec_Deadlock
>
> Hi George,
>
> The system you're simulating is quite a stress test for the Ruby protocol
> you're using! What protocol have you compiled?
>
> The problem you're running into could be very simple. It's possible that
> due to the high bandwidth of the system, some of the queues in Ruby are
> filling up and causing the average memory access latency to skyrocket due
> to queuing delays. If this happens, the protocol could be "correct" but
> still cause a deadlock detection. In this case, you may be able to increase
> the deadlock threshold and see the application start to work again. We
> often see this with GPU workloads.
>
> However, it's more likely a bug somewhere in the protocol you're using. To
> debug this, you'll need to dig into the protocol. The debug flag
> "ProtocolTrace" is useful here. With this debug flag you'll see every
> transition in Ruby. With this information you should be able to trace back
> and find the memory operation that's causing the deadlock. I would also
> suggest using "--debug-start=<tick>" and picking the highest tick value you
> can before the offending operation (e.g., a little less than
> 5676227351000). Otherwise the trace may be 10s-100s of GB (and take days to
> generate).
>
> Hopefully this helps you get on the right track. Good luck!
>
> Jason
>
> On Wed, Aug 10, 2016 at 6:50 PM George Mappouras <georgios.mappou...@duke.edu> wrote:
>
> Hi all,
>
> I had some trouble while running Parsec benchmarks with gem5 + Ruby (using
> the MESI two level protocol). I found out that some of the benchmarks
> cause gem5 to crash because a deadlock was detected. The configuration I
> use is as follows:
>
> I have 8 nodes connected to a ring. Each node is a core connected with a
> private 64KB L1 cache and one channel of High Bandwidth Memory (HBM). Also,
> each core has one of the 8 banks of the shared 8MB L2 cache connected to
> it. The command I run looks like this:
>
> *./build/X86/gem5.opt configs/example/fs.py
> --disk-image=x86root-parsec.img --kernel=x86_64-vmlinux-2.6.22.9.smp -n 8
> --cpu-type=detailed --cpu-clock=1GHz --caches --l1d_size=64kB
> --num-l2caches=8 --l2_size=8MB --mem-type=HBM_1000_4H_x128 --mem-channels=8
> --mem-size=2GB --ruby --num-dirs=8 --topology=Torus --mesh-rows=1
> --access-backing-store --script=a_parsec_script.sh*
>
> I use the latest version of gem5 and I have no problem booting or running
> commands on the simulated machine. However, as I mentioned above, some
> benchmarks cause gem5 to crash with a message like this:
>
> *panic: Possible Deadlock detected. Aborting!*
> *version: 1 request.paddr: 0x53d6a000 m_writeRequestTable: 1 current time:
> 5677053833000 issue_time: 5676227351000 difference: 826482000*
> * @ tick 5677053833000*
> *[wakeup:build/X86/mem/ruby/system/Sequencer.cc, line 119]*
> *Memory Usage: 5799824 KBytes*
> *Program aborted at tick 5677053833000*
> *--- BEGIN LIBC BACKTRACE ---*
> *./build/X86/gem5.opt(_Z15print_backtracev+0x15)[0x9f8275]*
> *./build/X86/gem5.opt(_Z12abortHandleri+0x36)[0xa09536]*
> */lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f3dd5cdf330]*
> */lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f3dd4530c37]*
> */lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f3dd4534028]*
> *./build/X86/gem5.opt(_Z15__exit_epilogueiPKcS0_iS0_+0x1ec)[0x9bba6c]*
> *./build/X86/gem5.opt(_Z14__exit_messageIIjmmmmmEEvPKciS1_S1_iS1_DpRKT_+0xc7)[0x93ca27]*
> *./build/X86/gem5.opt(_ZN9Sequencer6wakeupEv+0x266)[0x93a936]*
> *./build/X86/gem5.opt(_ZN10EventQueue10serviceOneEv+0xb1)[0xa018e1]*
> *./build/X86/gem5.opt(_Z9doSimLoopP10EventQueue+0x38)[0xa22938]*
> *./build/X86/gem5.opt(_Z8simulatem+0x1fb)[0xa22ebb]*
> *./build/X86/gem5.opt[0x969d7c]*
> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x45f7)[0x7f3dd58f7af7]*
> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]*
> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x48d8)[0x7f3dd58f7dd8]*
> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4b59)[0x7f3dd58f8059]*
> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4b59)[0x7f3dd58f8059]*
> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]*
> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x7f3dd58f9682]*
> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x563e)[0x7f3dd58f8b3e]*
> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]*
> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x48d8)[0x7f3dd58f7dd8]*
> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]*
> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x7f3dd58f9682]*
> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyRun_StringFlags+0x79)[0x7f3dd58f34b9]*
> *./build/X86/gem5.opt(_Z6m5MainiPPc+0x5f)[0xa08caf]*
> *./build/X86/gem5.opt(main+0x33)[0x701933]*
> */lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f3dd451bf45]*
> *./build/X86/gem5.opt[0x725e83]*
> *--- END LIBC BACKTRACE ---*
>
> Can anyone help me figure out what the problem is? Am I missing something?
> Does my system configuration match the command I run? I would appreciate any
> help!
>
> Thanks,
> George
>
> _______________________________________________
> gem5-users mailing list
> gem5-users@gem5.org
> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users