Martin, Andreas,

I would strongly suggest not penalising the single-threaded common case, if at all possible, in both scenarios (ref counting and locking). To reduce the clutter, I would suggest a proxy class that uses either the atomic / thread-safe mutexes and ref counters, or the simplistic implementations that are amenable to more aggressive optimisation. The observed overhead is structural, and no amount of optimisation will work around the fact that atomics are expensive.
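To make the proxy idea a bit more concrete, here is a rough sketch of what I have in mind. It is only an illustration under my own assumptions: the names SingleThreaded, MultiThreaded, RefCount, Guarded, exercise and the SKETCH_MULTI_THREADED macro are all made up for this example and are not taken from gem5 or from the patch.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical policies; none of these names exist in gem5.
struct SingleThreaded {
    using Counter = int;               // plain integer, freely optimisable
    struct Mutex {                     // no-op lock with lock()/unlock()
        void lock() {}
        void unlock() {}
    };
};

struct MultiThreaded {
    using Counter = std::atomic<int>;  // atomic read-modify-write
    using Mutex = std::mutex;          // real lock
};

// Hypothetical build switch; a real build would define it in one place.
#ifdef SKETCH_MULTI_THREADED
using ActivePolicy = MultiThreaded;
#else
using ActivePolicy = SingleThreaded;
#endif

// Reference-count proxy: the policy picks the counter type at compile time.
template <typename Policy>
class RefCount
{
  public:
    void incref() { ++count; }
    // Returns true when the last reference has been dropped.
    bool decref() { return --count == 0; }
    int value() const { return count; }

  private:
    typename Policy::Counter count{0};
};

// Lock proxy: std::lock_guard accepts any type offering lock()/unlock(),
// so the no-op mutex compiles away in the single-threaded case.
template <typename Policy>
class Guarded
{
  public:
    int bump()
    {
        std::lock_guard<typename Policy::Mutex> guard(mutex);
        return ++shared;
    }

  private:
    typename Policy::Mutex mutex;
    int shared = 0;
};

// Exercise one policy: two increfs, one decref, return the count.
template <typename Policy>
int exercise()
{
    RefCount<Policy> rc;
    rc.incref();
    rc.incref();
    rc.decref();
    return rc.value();
}
```

Client code would only ever name ActivePolicy (for example RefCount<ActivePolicy>), so the #ifdef lives in a single place instead of being scattered around every lock and counter.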
Thanks,
Stephan

On 06.08.2014 17:05, "Brown, Martin via gem5-dev" <[email protected]> wrote:

>Thanks Andreas,
>
>I will definitely look into those as well.
>
>From: Andreas Hansson [mailto:[email protected]]
>Sent: Wednesday, August 06, 2014 10:38 AM
>To: Brown, Martin; Default
>Subject: Re: Review Request 2320: sim: stopgap for race-conditions when
>using multiple EventQueues
>
>Hi Martin,
>
>Thanks for running the experiments. It sounds rather painful indeed.
>
>Concerning the atomic_int, I've actually managed to shift most use of the
>in-house RefCountingPtr to use (thread-safe) STL shared_ptr, and I've
>seen a performance drop of roughly 5% as a result of this change. I
>should be able to get the patches on the board in a not-too-distant
>future. Another observation when using the shared_ptr is that compiling
>with -march=native seem to be getting some of the loss back. It might be
>worth checking in your case as well.
>
>Andreas
>
>From: <Brown>, Martin <[email protected]<mailto:[email protected]>>
>Date: Wednesday, 6 August 2014 16:20
>To: Andreas Hansson
><[email protected]<mailto:[email protected]>>, Default
><[email protected]<mailto:[email protected]>>
>Subject: RE: Review Request 2320: sim: stopgap for race-conditions when
>using multiple EventQueues
>
>Hi Andreas,
>
>Great question. Since I saw your question, I ran some small tests using
>Queens algorithm, 12x12 board. There is a performance impact for the
>single-threaded case. Here is the slowdown that I observed for the
>single-threaded case:
>
>50% slowdown with this entire patch applied, ouch.
>12% slowdown with all the locks applied, except for the std::atomic_int
>
>So the std::atomic_int is the biggest factor, but this should be fixed
>for all of the locks anyway. So I should have gem5 use the locks in this
>patch only when using multiple threads. I will look into this. Right now
>I'm considering using #ifdef for that. I am open to suggestions.
>
>Thanks!
>
>From: Andreas Hansson [mailto:[email protected]] On Behalf Of
>Andreas Hansson
>Sent: Monday, August 04, 2014 4:07 AM
>To: Andreas Hansson; Default; Brown, Martin
>Subject: Re: Review Request 2320: sim: stopgap for race-conditions when
>using multiple EventQueues
>
>This is an automatically generated e-mail. To reply, visit:
>http://reviews.gem5.org/r/2320/
>
>Out of curiosity, is there any performance impact for the single-threaded
>case?
>
>- Andreas Hansson
>
>On August 1st, 2014, 7:04 p.m. UTC, Martin Brown wrote:
>Review request for Default.
>By Martin Brown.
>
>Updated Aug. 1, 2014, 7:04 p.m.
>Repository: gem5
>Description
>
>Changeset 10264:c3977836244e
>---------------------------
>sim: stopgap for race-conditions when using multiple EventQueues
>
>This patch fixes several race conditions that appear in multi-threaded
>mode. Currently the decode cache race condition is fixed only for x86,
>and in a temporary non-optimal fashion. We still need to decide on a
>more optimal solution for the decode cache and apply it to all the ISAs.
>
>Testing
>- Quick regression tests on x86, arm, alpha
>- Made sure that sparc, power, mips can be built with this patch
>- Tested using up to 28 EventQueues (28 threads)
>
>Diffs
>- src/arch/x86/decoder.cc (c00b5ba43967e7e48a28b7ddc48c9f4afaf2ab76)
>- src/base/refcnt.hh (c00b5ba43967e7e48a28b7ddc48c9f4afaf2ab76)
>- src/base/trace.cc (c00b5ba43967e7e48a28b7ddc48c9f4afaf2ab76)
>- src/sim/syscall_emul.cc (c00b5ba43967e7e48a28b7ddc48c9f4afaf2ab76)
>
>View Diff<http://reviews.gem5.org/r/2320/diff/>
>
>-- IMPORTANT NOTICE: The contents of this email and any attachments are
>confidential and may also be privileged. If you are not the intended
>recipient, please notify the sender immediately and do not disclose the
>contents to any other person, use it for any purpose, or store or copy
>the information in any medium. Thank you.
>
>ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ,
>Registered in England & Wales, Company No: 2557590
>ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ,
>Registered in England & Wales, Company No: 2548782
>_______________________________________________
>gem5-dev mailing list
>[email protected]
>http://m5sim.org/mailman/listinfo/gem5-dev
