Hi Dan,

Glad to hear things are working, and thanks for the tips! I must admit I'm not quite following what the solution was, though -- are you saying the fix is to replace exit(0)/return with m5_exit()? Your original post said the problem was things taking a really long time before main(), so it seems something else must have been the actual problem/solution?
Coming to your other questions: I don't recall exactly what the root cause of the hipDeviceSynchronize failure is, but I would definitely recommend updating to the current staging branch head first and testing. I am also hoping to push a fix today for the barrier bit synchronization -- most of the hipDeviceSynchronize-type failures I've seen were due to a bug in my barrier bit implementation. I'm not sure whether this will solve your problem, but I can definitely add you as a reviewer and/or point you to it if needed.

Not sure about the m5op; hopefully someone else can chime in on that.

Thanks,
Matt

On Fri, Jun 12, 2020 at 12:12 PM Daniel Gerzhoy <daniel.gerz...@gmail.com> wrote:

> I've figured it out.
>
> To measure the time it took to get to main() I put a *return 0;* at the
> beginning of the function so I wouldn't have to babysit it.
>
> I didn't consider that it would also take some time for the simulator to
> exit, which is where the extra few minutes comes from.
> Side-note: *m5_exit(0);* instead of a return exits immediately.
>
> 5 min is a bit more reasonable of a slowdown for the difference between
> the two clocks.
>
> Two incidental things:
>
> 1. Is there a way to have gem5 spit out (real wall-clock) timestamps while
> it's printing stuff?
> 2. A while ago I asked about hipDeviceSynchronize() causing crashes
> (panic: Tried to read unmapped address 0xff0000c29f48.). Has this been
> fixed since?
>
> I'm going to update to the head of this branch soon, and eventually to the
> main branch. If it hasn't been fixed, I've created a workaround by stealing
> the completion signal of the kernel based on its launch id and manually
> waiting on it using the HSA interface.
> Happy to help out and implement this as an m5op (or something) if that
> would be helpful for you guys.
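[Editor's note on question 1 above: as far as I know, gem5's DPRINTF output is stamped with simulated ticks rather than host time. One workaround -- my own sketch, not a gem5 feature -- is to pipe the simulator's output through a small filter that prefixes each line with elapsed wall-clock time:]

```python
#!/usr/bin/env python3
"""Prefix every line read from stdin with elapsed wall-clock time.

Intended usage (paths are illustrative):
    ./build/GCN3_X86/gem5.opt configs/example/apu_se.py ... 2>&1 \
        | python3 wallclock.py
"""
import sys
import time


def stamp_lines(lines, now=time.time):
    """Yield each input line prefixed with seconds elapsed since start.

    `now` is injectable only so the stamping logic can be tested
    deterministically; callers normally leave the default.
    """
    t0 = now()
    for line in lines:
        yield "[%8.2fs] %s" % (now() - t0, line)


if __name__ == "__main__":
    for out in stamp_lines(sys.stdin):
        sys.stdout.write(out)
```

[This only shows host time per output line; correlating it with simulated ticks still relies on the tick stamps gem5 itself prints.]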
> Best,
>
> Dan
>
> On Thu, Jun 11, 2020 at 12:40 PM Matt Sinclair <mattdsincl...@gmail.com> wrote:
>
>> I don't see anything amazingly amiss in your output, but the number of
>> times the open/etc. calls fail is interesting -- Kyle, do we see the same
>> thing? If not, it could be that you should update your apu_se.py to point
>> to the "correct" place to search for the libraries first.
>>
>> Also, based on Kyle's reply: Dan, how long does it take you to boot up
>> square? Certainly a slower machine might take longer, but it does seem
>> even slower than expected. If we're running the same application, maybe
>> it will be easier to spot differences.
>>
>> I would also recommend updating to the latest commit on the staging
>> branch -- I don't believe it should break anything with those patches.
>>
>> Yes, it looks like you are using the release version of ROCm -- no issues
>> there.
>>
>> Matt
>>
>> On Thu, Jun 11, 2020 at 9:38 AM Daniel Gerzhoy <daniel.gerz...@gmail.com> wrote:
>>
>>> I am using the docker, yeah.
>>> It's running on our server cluster, which is a Xeon Gold 5117 (2.0-2.8
>>> GHz); that might make up some of the difference, since the R5 3600 has a
>>> faster clock (3.6-4.2 GHz).
>>>
>>> I've hesitated to update my branch because the Dockerfile specifically
>>> checks this branch out and applies a patch, though the patch isn't very
>>> extensive.
>>> This was from a while back (November, maybe?) and I know you guys have
>>> been integrating things into the main branch (thanks!).
>>> I was thinking I would wait until it's fully merged into the mainline
>>> gem5 branch, then rebase onto that and try to merge my changes in.
>>>
>>> Last I checked, the GCN3 stuff is in the dev branch, not master, right?
>>>
>>> But if it will help, maybe I should update to the head of this branch.
>>> Will I need to update the docker as well?
>>>
>>> As for debug vs. release ROCm, I think I'm using the release version.
>>> This is what the dockerfile built:
>>>
>>> ARG rocm_ver=1.6.2
>>> RUN wget -qO- repo.radeon.com/rocm/archive/apt_${rocm_ver}.tar.bz2 \
>>>     | tar -xjv \
>>>     && cd apt_${rocm_ver}/pool/main/ \
>>>     && dpkg -i h/hsakmt-roct-dev/* \
>>>     && dpkg -i h/hsa-ext-rocr-dev/* \
>>>     && dpkg -i h/hsa-rocr-dev/* \
>>>     && dpkg -i r/rocm-utils/* \
>>>     && dpkg -i h/hcc/* \
>>>     && dpkg -i h/hip_base/* \
>>>     && dpkg -i h/hip_hcc/* \
>>>     && dpkg -i h/hip_samples/*
>>>
>>> I ran a benchmark that prints that it entered main and returns
>>> immediately; this took 9 minutes.
>>> I've attached a debug trace with debug flags = "GPUDriver,SyscallVerbose".
>>> There are a lot of weird things going on: "syscall open: failed", "syscall
>>> brk: break point changed to [...]", and lots of ignored system calls.
>>>
>>> Head of stats for reference:
>>>
>>> ---------- Begin Simulation Statistics ----------
>>> sim_seconds                          0.096192  # Number of seconds simulated
>>> sim_ticks                         96192368500  # Number of ticks simulated
>>> final_tick                        96192368500  # Number of ticks from beginning of simulation (restored from checkpoints and never reset)
>>> sim_freq                        1000000000000  # Frequency of simulated ticks
>>> host_inst_rate                         175209  # Simulator instruction rate (inst/s)
>>> host_op_rate                           338409  # Simulator op (including micro ops) rate (op/s)
>>> host_tick_rate                      175362515  # Simulator tick rate (ticks/s)
>>> host_mem_usage                        1628608  # Number of bytes of host memory used
>>> host_seconds                           548.53  # Real time elapsed on the host
>>> sim_insts                            96108256  # Number of instructions simulated
>>> sim_ops                             185628785  # Number of ops (including micro ops) simulated
>>> system.voltage_domain.voltage               1  # Voltage in Volts
>>> system.clk_domain.clock                  1000  # Clock period in ticks
>>>
>>> Maybe something in the attached file explains it better than I can
>>> express.
>>>
>>> Many thanks for your help and hard work!
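[Editor's note: the stats above are self-consistent -- host_seconds is just sim_ticks divided by host_tick_rate, and comparing host_seconds against sim_seconds gives the effective simulation slowdown. A quick check, with the numbers copied from the dump above:]

```python
# Values copied from the stats dump above.
sim_ticks = 96_192_368_500       # simulated ticks
sim_freq = 1_000_000_000_000     # ticks per simulated second (1 tick = 1 ps)
host_tick_rate = 175_362_515     # ticks simulated per host second
host_seconds = 548.53            # reported wall-clock time

# host_seconds should equal sim_ticks / host_tick_rate.
assert abs(sim_ticks / host_tick_rate - host_seconds) < 0.1

# ~0.096 simulated seconds took ~9 host minutes: a slowdown of
# roughly 5700x for this run.
sim_seconds = sim_ticks / sim_freq
slowdown = host_seconds / sim_seconds
print(f"slowdown: {slowdown:.0f}x")
```

[So the 9 minutes breaks down as ~0.096 s of simulated work at a ~5700x slowdown, which is why even "return immediately" runs take minutes.]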
>>> Dan
>>>
>>> On Thu, Jun 11, 2020 at 3:32 AM Kyle Roarty <kroa...@wisc.edu> wrote:
>>>
>>>> Running through a few applications, it took me about 2.5 minutes or
>>>> less each time using docker to start executing the program on an R5 3600.
>>>>
>>>> I ran square, dynamic_shared, and MatrixTranspose (all from HIP), which
>>>> took about 1-1.5 mins.
>>>>
>>>> I ran conv_bench and rnn_bench from DeepBench, which took just about 2
>>>> minutes.
>>>>
>>>> Because of that, it's possible the size of the app has an effect on
>>>> setup time, as the HIP apps are extremely small.
>>>>
>>>> Also, the commit Dan is checked out on is d0945dc ("mem-ruby: add cache
>>>> hit/miss statistics for TCP and TCC",
>>>> https://gem5.googlesource.com/amd/gem5/+/d0945dc285cf146de160808d7e6d4c1fd3f73639),
>>>> which isn't the most recent commit. I don't believe that would account
>>>> for such a large slowdown, but it doesn't hurt to try the newest commit
>>>> unless it breaks something.
>>>>
>>>> Kyle
>>>> ------------------------------
>>>> *From:* mattdsincl...@gmail.com <mattdsincl...@gmail.com>
>>>> *Sent:* Thursday, June 11, 2020 1:15 AM
>>>> *To:* gem5 users mailing list <gem5-users@gem5.org>
>>>> *Cc:* Daniel Gerzhoy <daniel.gerz...@gmail.com>; GAURAV JAIN <gja...@wisc.edu>; Kyle Roarty <kroa...@wisc.edu>
>>>> *Subject:* Re: [gem5-users] GCN3 GPU Simulation Start-Up Time
>>>>
>>>> Gaurav & Kyle, do you know if this is the case?
>>>>
>>>> Dan, I believe the short answer is yes, although 7-8 minutes seems a
>>>> little long. Are you running this in Kyle's Docker, or separately? If in
>>>> the Docker, that does increase the overhead somewhat, so running it
>>>> directly on a system would likely reduce the overhead. Also, are you
>>>> running with the release or debug version of the ROCm drivers? Again,
>>>> the debug version will likely add some time to this.
>>>> Matt
>>>>
>>>> On Wed, Jun 10, 2020 at 2:00 PM Daniel Gerzhoy via gem5-users <gem5-users@gem5.org> wrote:
>>>>
>>>> I've been running simulations using the GCN3 branch:
>>>>
>>>> rocm_ver=1.6.2
>>>> $ git branch
>>>> * (HEAD detached at d0945dc)
>>>>   agutierr/master-gcn3-staging
>>>>
>>>> And I've noticed that it takes roughly 7-8 minutes to get to main().
>>>>
>>>> I'm guessing that this is the simulator setting up drivers?
>>>> Is that correct? Is there other stuff going on?
>>>>
>>>> *Has anyone found a way to speed this up?*
>>>>
>>>> I am trying to get some of the rodinia benchmarks from the HIP-Examples
>>>> running, and debugging takes a long time as a result.
>>>>
>>>> I suspect that this is unavoidable, but I won't know if I don't ask!
>>>>
>>>> Cheers,
>>>>
>>>> Dan Gerzhoy
>>>> _______________________________________________
>>>> gem5-users mailing list -- gem5-users@gem5.org
>>>> To unsubscribe send an email to gem5-users-le...@gem5.org