Hi Jason,

That is a fair point. I have a couple of things to try now. Thanks a lot for the advice and help!

George

From: [email protected]
Date: Tue, 16 Aug 2016 13:27:30 +0000
To: [email protected]
Subject: Re: [gem5-users] FW: Gem5-Ruby_HBM_Parsec_Deadlock
Hi George,

I was only saying that your system, with 8 HBM controllers, has much more bandwidth than the original developers of Ruby imagined. Therefore, I wouldn't be surprised if you are encountering some bugs that others have never seen. MESI_Two_Level is one of the more tested protocols, so I would say you're using a reasonable protocol. You may want to try a different topology (say, point-to-point) to see whether the topology is causing (or at least correlates with) the issue.

Jason

On Tue, Aug 16, 2016 at 2:26 AM Andreas Hansson <[email protected]> wrote:

Hi all,

Remember that the timing CPU should _not_ be used for any performance-related experiments. Stick to the in-order and out-of-order CPUs for any such use cases.

In general I would also expect fewer issues with the classic memory system (especially with full system), and it does a fine job at modelling crossbar-based many-core systems. At the moment it does not support X86 out of the box, but it may still be worth considering if you’re having issues with Ruby.

Andreas

From: gem5-users <[email protected]> on behalf of Ruohuang Zheng <[email protected]>
Reply-To: gem5 users mailing list <[email protected]>
Date: Tuesday, 16 August 2016 at 02:46
To: gem5 users mailing list <[email protected]>
Subject: Re: [gem5-users] FW: Gem5-Ruby_HBM_Parsec_Deadlock

Hi,

You can try running the benchmarks with the timing CPU instead of the detailed CPU to see if it works. As far as I know, there are various bugs in Ruby, and using the detailed CPU makes them more likely to be exposed.

On Mon, Aug 15, 2016 at 4:37 PM, George Mappouras <[email protected]> wrote:

Thanks for the reply. No, I do not use checkpoints. I am aware of the checkpoint problem (found that out the hard way XD). I run full system from start to end, running one of the Parsec benchmarks each time with the small input size (I do have multiple machines running in parallel).
George

From: [email protected]
Date: Mon, 15 Aug 2016 16:07:39 -0700
To: [email protected]
Subject: Re: [gem5-users] FW: Gem5-Ruby_HBM_Parsec_Deadlock

Hi,

Are you taking checkpoints? If yes, then getting a deadlock is normal.

On Mon, Aug 15, 2016 at 2:28 PM, George Mappouras <[email protected]> wrote:

Hi Jason,

Thanks for the suggestions. I use MESI_Two_Level, and I also compiled gem5 for that protocol like this:

scons RUBY=TRUE PROTOCOL=MESI_Two_Level build/X86/gem5.opt -j8

"The system you're simulating is quite a stress test for the Ruby protocol you're using!"

Why are you saying that? Could you give me some insight into why MESI could make my system slower compared to other protocols? What would you suggest I use?

George

From: [email protected]
Date: Mon, 15 Aug 2016 17:01:47 +0000
To: [email protected]
Subject: Re: [gem5-users] FW: Gem5-Ruby_HBM_Parsec_Deadlock

Hi George,

The system you're simulating is quite a stress test for the Ruby protocol you're using! What protocol have you compiled?

The problem you're running into could be very simple. It's possible that, due to the high bandwidth of the system, some of the queues in Ruby are filling up and causing the average memory access latency to skyrocket due to queuing delays. If this happens, the protocol could be "correct" but still trigger the deadlock detection. In this case, you may be able to increase the deadlock threshold and see the application start to work again. We often see this with GPU workloads.

However, it's more likely a bug somewhere in the protocol you're using. To debug this, you'll need to dig into the protocol. The debug flag "ProtocolTrace" is useful here. With this debug flag you'll see every transition in Ruby. With this information you should be able to trace back and find the memory operation that's causing the deadlock. I would also suggest using "--debug-start=<tick>" and picking the highest tick value you can before the offending operation (e.g., a little less than 5676227351000).
Otherwise the trace may be 10s-100s of GB (and take days to generate).

Hopefully this helps you get on the right track. Good luck!

Jason

On Wed, Aug 10, 2016 at 6:50 PM George Mappouras <[email protected]> wrote:

Hi all,

I had some trouble while running Parsec benchmarks with gem5 + Ruby (using the MESI two-level protocol). I found that some of the benchmarks cause gem5 to crash because a deadlock was detected.

The configuration I use is as follows: I have 8 nodes connected in a ring. Each node is a core connected to a private 64KB L1 cache and one channel of High Bandwidth Memory (HBM). Each core also has one of the 8 banks of the shared 8MB L2 cache connected to it.

The command I run looks like this:

./build/X86/gem5.opt configs/example/fs.py --disk-image=x86root-parsec.img --kernel=x86_64-vmlinux-2.6.22.9.smp -n 8 --cpu-type=detailed --cpu-clock=1GHz --caches --l1d_size=64kB --num-l2caches=8 --l2_size=8MB --mem-type=HBM_1000_4H_x128 --mem-channels=8 --mem-size=2GB --ruby --num-dirs=8 --topology=Torus --mesh-rows=1 --access-backing-store --script=a_parsec_script.sh

I use the latest version of gem5 and I have no problem booting or running commands on the simulated machine. However, as I mentioned above, some benchmarks cause gem5 to crash with a message like this:

panic: Possible Deadlock detected. Aborting!
version: 1 request.paddr: 0x53d6a000 m_writeRequestTable: 1 current time: 5677053833000 issue_time: 5676227351000 difference: 826482000
@ tick 5677053833000
[wakeup:build/X86/mem/ruby/system/Sequencer.cc, line 119]
Memory Usage: 5799824 KBytes
Program aborted at tick 5677053833000
--- BEGIN LIBC BACKTRACE ---
./build/X86/gem5.opt(_Z15print_backtracev+0x15)[0x9f8275]
./build/X86/gem5.opt(_Z12abortHandleri+0x36)[0xa09536]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f3dd5cdf330]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f3dd4530c37]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f3dd4534028]
./build/X86/gem5.opt(_Z15__exit_epilogueiPKcS0_iS0_+0x1ec)[0x9bba6c]
./build/X86/gem5.opt(_Z14__exit_messageIIjmmmmmEEvPKciS1_S1_iS1_DpRKT_+0xc7)[0x93ca27]
./build/X86/gem5.opt(_ZN9Sequencer6wakeupEv+0x266)[0x93a936]
./build/X86/gem5.opt(_ZN10EventQueue10serviceOneEv+0xb1)[0xa018e1]
./build/X86/gem5.opt(_Z9doSimLoopP10EventQueue+0x38)[0xa22938]
./build/X86/gem5.opt(_Z8simulatem+0x1fb)[0xa22ebb]
./build/X86/gem5.opt[0x969d7c]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x45f7)[0x7f3dd58f7af7]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x48d8)[0x7f3dd58f7dd8]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4b59)[0x7f3dd58f8059]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4b59)[0x7f3dd58f8059]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x7f3dd58f9682]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x563e)[0x7f3dd58f8b3e]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x48d8)[0x7f3dd58f7dd8]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x7f3dd58f9682]
/usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyRun_StringFlags+0x79)[0x7f3dd58f34b9]
./build/X86/gem5.opt(_Z6m5MainiPPc+0x5f)[0xa08caf]
./build/X86/gem5.opt(main+0x33)[0x701933]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f3dd451bf45]
./build/X86/gem5.opt[0x725e83]
--- END LIBC BACKTRACE ---

Can anyone help me figure out what the problem is? Am I missing something? Does my system configuration match the command I run? I would appreciate any help!

Thanks,
George

_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
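A note on Jason's "--debug-start" suggestion: the tick to start tracing from can be read off the panic message itself. The following is only a minimal sketch, not part of gem5. It assumes the panic text has the same "current time" / "issue_time" fields as in George's report, and the helper names and the one-million-tick margin are hypothetical choices made for illustration.

```python
import re

# Panic text copied from George's report in this thread.
PANIC = (
    "version: 1 request.paddr: 0x53d6a000 m_writeRequestTable: 1 "
    "current time: 5677053833000 issue_time: 5676227351000 "
    "difference: 826482000"
)


def parse_deadlock(text):
    """Return (current_tick, issue_tick) from a Sequencer deadlock panic."""
    current = int(re.search(r"current time: (\d+)", text).group(1))
    issued = int(re.search(r"issue_time: (\d+)", text).group(1))
    return current, issued


def debug_start_tick(issue_tick, margin_ticks=1_000_000):
    """Pick a tick a little before the stalled request was issued.

    margin_ticks is a hypothetical safety margin, not a gem5 default.
    """
    return max(issue_tick - margin_ticks, 0)


current, issued = parse_deadlock(PANIC)
stalled_for = current - issued          # matches the "difference" field
start = debug_start_tick(issued)

print(f"request stalled for {stalled_for} ticks")
print(f"--debug-flags=ProtocolTrace --debug-start={start}")
```

The printed flags are gem5 options, so they would go before the config script (configs/example/fs.py) on the command line. A smaller margin keeps the trace shorter; a larger one shows more of what led up to the stalled request.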
