Hi George,

I was only saying that your system, with 8 HBM controllers, has much more bandwidth than the original developers of Ruby imagined. Therefore, I wouldn't be surprised if you are encountering some bugs that others have never seen.

MESI_Two_Level is one of the more tested protocols, so I would say you're using a reasonable protocol. You may want to try a different topology (say, point-to-point) to see whether the topology is causing (or is at least correlated with) the issue.

Jason

On Tue, Aug 16, 2016 at 2:26 AM Andreas Hansson <[email protected]> wrote:

> Hi all,
>
> Remember that the timing CPU should _not_ be used for any
> performance-related experiments. Stick to the in-order and out-of-order
> CPUs for any such use-cases.
>
> In general I would also expect fewer issues with the classic memory system
> (especially with full system), and it does a fine job at modelling
> crossbar-based many-core systems. At the moment it does not support X86 out
> of the box, but it may still be worth considering if you’re having issues
> with Ruby.
>
> Andreas
>
> From: gem5-users <[email protected]> on behalf of Ruohuang
> Zheng <[email protected]>
> Reply-To: gem5 users mailing list <[email protected]>
> Date: Tuesday, 16 August 2016 at 02:46
> To: gem5 users mailing list <[email protected]>
> Subject: Re: [gem5-users] FW: Gem5-Ruby_HBM_Parsec_Deadlock
>
> Hi,
>
> You can try running the benchmarks with the timing CPU instead of the
> detailed CPU to see if it works. As far as I know, there are various bugs
> in Ruby, and using the detailed CPU makes them more likely to be exposed.
>
> On Mon, Aug 15, 2016 at 4:37 PM, George Mappouras <[email protected]> wrote:
>
>> Thanks for the reply. No, I do not use checkpoints. I am aware of the
>> checkpoint problem (found that out the hard way XD). I run full system from
>> start to end, running one of the PARSEC benchmarks each time with the small
>> input size (I do have multiple machines running in parallel).
>>
>> George
>>
>> ------------------------------
>> From: [email protected]
>> Date: Mon, 15 Aug 2016 16:07:39 -0700
>> To: [email protected]
>> Subject: Re: [gem5-users] FW: Gem5-Ruby_HBM_Parsec_Deadlock
>>
>> Hi,
>>
>> Are you taking checkpoints? If yes, then getting a deadlock is normal.
>>
>> On Mon, Aug 15, 2016 at 2:28 PM, George Mappouras <[email protected]> wrote:
>>
>> Hi Jason,
>>
>> Thanks for the suggestions. I use MESI_Two_Level and I also compiled
>> gem5 for that protocol like this:
>> scons RUBY=TRUE PROTOCOL=MESI_Two_Level build/X86/gem5.opt -j8
>>
>> "The system you're simulating is quite a stress test for the Ruby
>> protocol you're using!"
>> Why are you saying that? Could you give me some insight into why MESI could
>> make my system slower compared to other protocols? What would you suggest
>> I use?
>>
>> George
>>
>> ------------------------------
>> From: [email protected]
>> Date: Mon, 15 Aug 2016 17:01:47 +0000
>> To: [email protected]
>> Subject: Re: [gem5-users] FW: Gem5-Ruby_HBM_Parsec_Deadlock
>>
>> Hi George,
>>
>> The system you're simulating is quite a stress test for the Ruby protocol
>> you're using! What protocol have you compiled?
>>
>> The problem you're running into could be very simple. It's possible that
>> due to the high bandwidth of the system, some of the queues in Ruby are
>> filling up and causing the average memory access latency to skyrocket due
>> to queuing delays. If this happens, the protocol could be "correct" but
>> still cause a deadlock detection. In this case, you may be able to increase
>> the deadlock threshold and see the application start to work again. We
>> often see this with GPU workloads.
>>
>> However, it's more likely a bug somewhere in the protocol you're using.
>> To debug this, you'll need to dig into the protocol. The debug flag
>> "ProtocolTrace" is useful here. With this debug flag you'll see every
>> transition in Ruby. With this information you should be able to trace back
>> and find the memory operation that's causing the deadlock. I would also
>> suggest using "--debug-start=<tick>" and picking the highest tick value you
>> can before the offending operation (e.g., a little less than
>> 5676227351000). Otherwise the trace may be 10s-100s of GB (and take days to
>> generate).
>>
>> Hopefully this helps you get on the right track. Good luck!
>>
>> Jason
>>
>> On Wed, Aug 10, 2016 at 6:50 PM George Mappouras <[email protected]> wrote:
>>
>> Hi all,
>>
>> I had some trouble while running PARSEC benchmarks with gem5 + Ruby
>> (using the MESI two-level protocol). I found out that some of the benchmarks
>> cause gem5 to crash because a deadlock was detected. The configuration
>> I use is as follows:
>>
>> I have 8 nodes connected in a ring. Each node is a core connected to a
>> private 64KB L1 cache and one channel of High Bandwidth Memory (HBM). Also,
>> each core has one of the 8 banks of the shared 8MB L2 cache connected to
>> it. The command I run looks like this:
>>
>> ./build/X86/gem5.opt configs/example/fs.py
>> --disk-image=x86root-parsec.img --kernel=x86_64-vmlinux-2.6.22.9.smp -n 8
>> --cpu-type=detailed --cpu-clock=1GHz --caches --l1d_size=64kB
>> --num-l2caches=8 --l2_size=8MB --mem-type=HBM_1000_4H_x128 --mem-channels=8
>> --mem-size=2GB --ruby --num-dirs=8 --topology=Torus --mesh-rows=1
>> --access-backing-store --script=a_parsec_script.sh
>>
>> I use the latest version of gem5 and I have no problem booting or running
>> commands on the simulated machine. However, as I mentioned above, some
>> benchmarks cause gem5 to crash with a message like this:
>>
>> panic: Possible Deadlock detected. Aborting!
>> version: 1 request.paddr: 0x53d6a000 m_writeRequestTable: 1 current
>> time: 5677053833000 issue_time: 5676227351000 difference: 826482000
>>  @ tick 5677053833000
>> [wakeup:build/X86/mem/ruby/system/Sequencer.cc, line 119]
>> Memory Usage: 5799824 KBytes
>> Program aborted at tick 5677053833000
>> --- BEGIN LIBC BACKTRACE ---
>> ./build/X86/gem5.opt(_Z15print_backtracev+0x15)[0x9f8275]
>> ./build/X86/gem5.opt(_Z12abortHandleri+0x36)[0xa09536]
>> /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f3dd5cdf330]
>> /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f3dd4530c37]
>> /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f3dd4534028]
>> ./build/X86/gem5.opt(_Z15__exit_epilogueiPKcS0_iS0_+0x1ec)[0x9bba6c]
>> ./build/X86/gem5.opt(_Z14__exit_messageIIjmmmmmEEvPKciS1_S1_iS1_DpRKT_+0xc7)[0x93ca27]
>> ./build/X86/gem5.opt(_ZN9Sequencer6wakeupEv+0x266)[0x93a936]
>> ./build/X86/gem5.opt(_ZN10EventQueue10serviceOneEv+0xb1)[0xa018e1]
>> ./build/X86/gem5.opt(_Z9doSimLoopP10EventQueue+0x38)[0xa22938]
>> ./build/X86/gem5.opt(_Z8simulatem+0x1fb)[0xa22ebb]
>> ./build/X86/gem5.opt[0x969d7c]
>> /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x45f7)[0x7f3dd58f7af7]
>> /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]
>> /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x48d8)[0x7f3dd58f7dd8]
>> /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4b59)[0x7f3dd58f8059]
>> /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4b59)[0x7f3dd58f8059]
>> /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]
>> /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x7f3dd58f9682]
>> /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x563e)[0x7f3dd58f8b3e]
>> /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]
>> /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x48d8)[0x7f3dd58f7dd8]
>> /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]
>> /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x7f3dd58f9682]
>> /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyRun_StringFlags+0x79)[0x7f3dd58f34b9]
>> ./build/X86/gem5.opt(_Z6m5MainiPPc+0x5f)[0xa08caf]
>> ./build/X86/gem5.opt(main+0x33)[0x701933]
>> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f3dd451bf45]
>> ./build/X86/gem5.opt[0x725e83]
>> --- END LIBC BACKTRACE ---
>>
>> Can anyone help me figure out what the problem is? Am I missing
>> something? Does my system configuration match the command I run? I would
>> appreciate any help!
>>
>> Thanks,
>> George
>>
>> _______________________________________________
>> gem5-users mailing list
>> [email protected]
>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
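A note on the debugging suggestion quoted in this thread (start ProtocolTrace a little before the issue_time of the stuck request): the panic line already carries the numbers needed to choose a --debug-start value. The Python sketch below is not part of gem5. The helper names and the 1,000,000-tick margin are assumptions made for illustration only. It parses the panic text from George's report, recomputes the stall as current time minus issue_time, and prints a start tick slightly before the request was issued. ProtocolTrace and --debug-start are the names given in the thread; --debug-flags is gem5's usual option for enabling a debug flag.

```python
import re

# Panic text copied from the original report in this thread.
PANIC = (
    "version: 1 request.paddr: 0x53d6a000 m_writeRequestTable: 1 current "
    "time: 5677053833000 issue_time: 5676227351000 difference: 826482000"
)


def parse_deadlock_panic(text):
    """Extract the tick fields from a Sequencer deadlock panic message."""
    fields = {}
    for key in ("current time", "issue_time", "difference"):
        match = re.search(re.escape(key) + r":\s*(\d+)", text)
        if match is None:
            raise ValueError("field not found: " + key)
        fields[key] = int(match.group(1))
    return fields


def suggest_debug_start(issue_tick, margin_ticks=1_000_000):
    """Pick a tick slightly before the stuck request was issued.

    margin_ticks is an arbitrary illustrative value, not a gem5 default.
    """
    return max(0, issue_tick - margin_ticks)


fields = parse_deadlock_panic(PANIC)
# How long the request had been outstanding when the check fired.
stalled_for = fields["current time"] - fields["issue_time"]
debug_start = suggest_debug_start(fields["issue_time"])

print("request outstanding for", stalled_for, "ticks")
print("suggested: --debug-flags=ProtocolTrace --debug-start=%d" % debug_start)
```

For the numbers in the report, the recomputed stall equals the "difference" field printed in the panic (826482000 ticks). The margin is a trade-off: a larger one captures more of the lead-up to the stuck request, at the cost of a bigger trace.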
