Hi George,

I was only saying that your system, with 8 HBM controllers, has much more
bandwidth than the original developers of Ruby imagined. Therefore, I
wouldn't be surprised if you are encountering some bugs that others have
never seen.

MESI_Two_Level is one of the more thoroughly tested protocols, so I would say
you're using a reasonable one. You may want to try a different topology (say,
point-to-point) to see whether the topology is causing, or at least
correlates with, the issue.
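
To make the topology suggestion concrete, here is a minimal sketch of swapping
the topology option in the argument list from the original report. It assumes
the point-to-point topology is selected with the name Pt2Pt (the class name
under configs/topologies in gem5 trees of this era, so check your checkout).
The helper with_topology is hypothetical: it only rewrites strings and does
not launch gem5.

```python
def with_topology(args, topology):
    """Return a copy of args with the --topology option set to `topology`."""
    prefix = "--topology="
    # Replace an existing --topology=... entry in place, keeping option order.
    replaced = [prefix + topology if a.startswith(prefix) else a for a in args]
    # If the option was not present at all, append it.
    if not any(a.startswith(prefix) for a in args):
        replaced.append(prefix + topology)
    return replaced


# Subset of the Ruby options from the original report in this thread.
original = ["--ruby", "--num-dirs=8", "--topology=Torus", "--mesh-rows=1"]

# "Pt2Pt" is an assumed topology name; verify it exists in configs/topologies.
pt2pt = with_topology(original, "Pt2Pt")
print(" ".join(pt2pt))
```

If the deadlock disappears with only this option changed, that points at the
ring/torus configuration rather than at the protocol itself.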

Jason

On Tue, Aug 16, 2016 at 2:26 AM Andreas Hansson <[email protected]>
wrote:

> Hi all,
>
> Remember that the timing CPU should _not_ be used for any
> performance-related experiments. Stick to the in-order and out-of-order
> CPUs for any such use-cases.
>
> In general I would also expect fewer issues with the classic memory system
> (especially with full system), and it does a fine job at modelling
> crossbar-based many-core systems. At the moment it does not support X86 out
> of the box, but it may still be worth considering if you’re having issues
> with Ruby.
>
> Andreas
>
> From: gem5-users <[email protected]> on behalf of Ruohuang
> Zheng <[email protected]>
> Reply-To: gem5 users mailing list <[email protected]>
> Date: Tuesday, 16 August 2016 at 02:46
> To: gem5 users mailing list <[email protected]>
>
> Subject: Re: [gem5-users] FW: Gem5-Ruby_HBM_Parsec_Deadlock
>
> Hi,
>
> You can try running the benchmarks with the timing CPU instead of the
> detailed one to see if that works. As far as I know, there are various bugs
> in Ruby, and using the detailed CPU makes those bugs more likely to be exposed.
>
> On Mon, Aug 15, 2016 at 4:37 PM, George Mappouras <
> [email protected]> wrote:
>
>> Thanks for the reply. No, I do not use checkpoints. I am aware of the
>> checkpoint problem (found that out the hard way XD). I run full system from
>> start to end, running one of the Parsec benchmarks each time with a small
>> input size (I do have multiple machines running in parallel).
>>
>> George
>>
>> ------------------------------
>> From: [email protected]
>> Date: Mon, 15 Aug 2016 16:07:39 -0700
>>
>> To: [email protected]
>> Subject: Re: [gem5-users] FW: Gem5-Ruby_HBM_Parsec_Deadlock
>>
>> Hi,
>>
>> Are you taking checkpoints? If yes, then getting a deadlock is normal.
>>
>> On Mon, Aug 15, 2016 at 2:28 PM, George Mappouras <
>> [email protected]> wrote:
>>
>> Hi Jason,
>>
>> Thanks for the suggestions. I use MESI_Two_Level, and I also compiled
>> gem5 for that protocol like this:
>> *scons RUBY=TRUE PROTOCOL=MESI_Two_Level build/X86/gem5.opt -j8*
>>
>> *"The system you're simulating is quite a stress test for the Ruby
>> protocol you're using! "*
>> Why do you say that? Could you give me some insight into why MESI could
>> make my system slower compared to other protocols? What would you suggest
>> I use?
>>
>> George
>>
>> ------------------------------
>> From: [email protected]
>> Date: Mon, 15 Aug 2016 17:01:47 +0000
>> To: [email protected]
>> Subject: Re: [gem5-users] FW: Gem5-Ruby_HBM_Parsec_Deadlock
>>
>>
>> Hi George,
>>
>> The system you're simulating is quite a stress test for the Ruby protocol
>> you're using! What protocol have you compiled?
>>
>> The problem you're running into could be very simple. It's possible that,
>> due to the high bandwidth of the system, some of the queues in Ruby are
>> filling up and causing the average memory access latency to skyrocket due
>> to queuing delays. If this happens, the protocol could be "correct" but
>> still trip the deadlock detector. In this case, you may be able to increase
>> the deadlock threshold and see the application start to work again. We
>> often see this with GPU workloads.
>>
>> However, it's more likely a bug somewhere in the protocol you're using.
>> To debug this, you'll need to dig into the protocol. The debug flag
>> "ProtocolTrace" is useful here. With this debug flag you'll see every
>> transition in Ruby. With this information you should be able to trace back
>> and find the memory operation that's causing the deadlock. I would also
>> suggest using "--debug-start=<tick>" and picking the highest tick value you
>> can before the offending operation (e.g., a little less than
>> 5676227351000). Otherwise the trace may be tens to hundreds of GB (and take
>> days to generate).
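
As a worked example of choosing the --debug-start value, the sketch below
pulls the ticks out of the panic message from the original report and backs
off a little from issue_time. The helper names and the 1,000,000-tick margin
are illustrative assumptions, not gem5 conventions. --debug-flags and
--debug-start are the gem5 command-line options mentioned above.

```python
import re

# Panic text copied from the original report in this thread.
PANIC = (
    "version: 1 request.paddr: 0x53d6a000 m_writeRequestTable: 1 current "
    "time: 5677053833000 issue_time: 5676227351000 difference: 826482000"
)


def parse_deadlock(panic_text):
    """Return (current_tick, issue_tick) from a Ruby deadlock panic line."""
    # Collapse whitespace first, since mail clients may wrap the line.
    flat = " ".join(panic_text.split())
    current = int(re.search(r"current time: (\d+)", flat).group(1))
    issue = int(re.search(r"issue_time: (\d+)", flat).group(1))
    return current, issue


def debug_start_tick(issue_tick, margin_ticks=1_000_000):
    """Pick a trace start slightly before the stuck request was issued.

    margin_ticks is an arbitrary safety margin (an assumption); widen it if
    the interesting transitions begin earlier than expected.
    """
    return max(issue_tick - margin_ticks, 0)


current, issue = parse_deadlock(PANIC)
start = debug_start_tick(issue)

# Options to add to the gem5 command line, before the config script.
debug_args = ["--debug-flags=ProtocolTrace", "--debug-start=%d" % start]

print(current - issue)  # how long the request had been outstanding, in ticks
print(" ".join(debug_args))
```

The difference computed here matches the "difference" field in the panic
message, which is a quick sanity check that the right fields were parsed.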
>>
>> Hopefully this helps you get on the right track. Good luck!
>>
>> Jason
>>
>> On Wed, Aug 10, 2016 at 6:50 PM George Mappouras <
>> [email protected]> wrote:
>>
>> Hi all,
>>
>> I had some trouble while running Parsec benchmarks with gem5 + Ruby
>> (using the MESI two-level protocol). I found that some of the benchmarks
>> cause gem5 to crash because a deadlock was detected. The configuration
>> I use is as follows:
>>
>> I have 8 nodes connected in a ring. Each node is a core connected to a
>> private 64KB L1 cache and one channel of High Bandwidth Memory (HBM). Each
>> core also has one of the 8 banks of the shared 8MB L2 cache attached to
>> it. The command I run looks like this:
>>
>> * ./build/X86/gem5.opt configs/example/fs.py
>> --disk-image=x86root-parsec.img --kernel=x86_64-vmlinux-2.6.22.9.smp -n 8
>> --cpu-type=detailed --cpu-clock=1GHz --caches --l1d_size=64kB
>> --num-l2caches=8 --l2_size=8MB --mem-type=HBM_1000_4H_x128 --mem-channels=8
>> --mem-size=2GB --ruby --num-dirs=8 --topology=Torus --mesh-rows=1
>> --access-backing-store --script=a_parsec_script.sh*
>>
>> I use the latest version of gem5 and I have no problem booting or running
>> commands on the simulated machine. However, as I mentioned above, some
>> benchmarks cause gem5 to crash with a message like this:
>>
>> *panic: Possible Deadlock detected. Aborting!*
>> *version: 1 request.paddr: 0x53d6a000 m_writeRequestTable: 1 current
>> time: 5677053833000 issue_time: 5676227351000 difference: 826482000*
>> * @ tick 5677053833000*
>> *[wakeup:build/X86/mem/ruby/system/Sequencer.cc, line 119]*
>> *Memory Usage: 5799824 KBytes*
>> *Program aborted at tick 5677053833000*
>> *--- BEGIN LIBC BACKTRACE ---*
>> *./build/X86/gem5.opt(_Z15print_backtracev+0x15)[0x9f8275]*
>> *./build/X86/gem5.opt(_Z12abortHandleri+0x36)[0xa09536]*
>> */lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f3dd5cdf330]*
>> */lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f3dd4530c37]*
>> */lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f3dd4534028]*
>> *./build/X86/gem5.opt(_Z15__exit_epilogueiPKcS0_iS0_+0x1ec)[0x9bba6c]*
>>
>> *./build/X86/gem5.opt(_Z14__exit_messageIIjmmmmmEEvPKciS1_S1_iS1_DpRKT_+0xc7)[0x93ca27]*
>> *./build/X86/gem5.opt(_ZN9Sequencer6wakeupEv+0x266)[0x93a936]*
>> *./build/X86/gem5.opt(_ZN10EventQueue10serviceOneEv+0xb1)[0xa018e1]*
>> *./build/X86/gem5.opt(_Z9doSimLoopP10EventQueue+0x38)[0xa22938]*
>> *./build/X86/gem5.opt(_Z8simulatem+0x1fb)[0xa22ebb]*
>> *./build/X86/gem5.opt[0x969d7c]*
>>
>> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x45f7)[0x7f3dd58f7af7]*
>>
>> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]*
>>
>> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x48d8)[0x7f3dd58f7dd8]*
>>
>> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4b59)[0x7f3dd58f8059]*
>>
>> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4b59)[0x7f3dd58f8059]*
>>
>> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]*
>>
>> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x7f3dd58f9682]*
>>
>> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x563e)[0x7f3dd58f8b3e]*
>>
>> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]*
>>
>> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x48d8)[0x7f3dd58f7dd8]*
>>
>> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7f3dd58f954d]*
>>
>> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x7f3dd58f9682]*
>>
>> */usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0(PyRun_StringFlags+0x79)[0x7f3dd58f34b9]*
>> *./build/X86/gem5.opt(_Z6m5MainiPPc+0x5f)[0xa08caf]*
>> *./build/X86/gem5.opt(main+0x33)[0x701933]*
>> */lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f3dd451bf45]*
>> *./build/X86/gem5.opt[0x725e83]*
>> *--- END LIBC BACKTRACE ---*
>>
>> Can anyone help me figure out what the problem is? Am I missing
>> something? Does my system configuration match the command I run? I would
>> appreciate any help!
>>
>> Thanks,
>> George
>>
>> _______________________________________________
>> gem5-users mailing list
>> [email protected]
>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>