Matt,

It wasn't so much a solution as an explanation. Kyle was running on an r5 3600 (3.6-4.2 GHz), whereas I am on a Xeon Gold 5117 @ 2.0-2.8 GHz.
The relative difference in clock speed seems to me to be a more reasonable explanation for a slowdown from 1-1.5 minutes to ~5 min (actual time before main) than the 8 min (time before main + exit time) I was seeing before.

I'll update to the latest branch and see if that speeds me up further. I'm also going to try running on a faster machine, though that will take some setup time.

Gaurav,

Thanks for the tip, that will be helpful in the meantime.

Dan

On Fri, Jun 12, 2020 at 3:41 PM GAURAV JAIN <gja...@wisc.edu> wrote:

> Hi,
>
> I am not sure if chiming in now would cause any more confusion, but still
> giving it a try.
>
> @Daniel Gerzhoy <daniel.gerz...@gmail.com> - for hipDeviceSynchronize, as
> Matt mentioned, they are working on a fix and should have it out there. If
> you want to, can you try this:
>
>     hipSetDeviceFlags(hipDeviceScheduleSpin);
>     for (int k = 1; k < dim; k++) {
>         hipLaunchKernelGGL(HIP_KERNEL_NAME(somekernel), grid, threads, 0, 0);
>         hipDeviceSynchronize();
>     }
>
> For me, in many cases (not all, and in the ones where it didn't work I got
> the same unmapped address error as you), this seemed to do the trick.
> You should check out the HEAD and then try this. I am not hoping for it to
> make any difference but still worth a shot.
>
> ------------------------------
> *From:* mattdsincl...@gmail.com <mattdsincl...@gmail.com>
> *Sent:* Friday, June 12, 2020 2:14 PM
> *To:* Daniel Gerzhoy <daniel.gerz...@gmail.com>
> *Cc:* Kyle Roarty <kroa...@wisc.edu>; GAURAV JAIN <gja...@wisc.edu>; gem5
> users mailing list <gem5-users@gem5.org>
> *Subject:* Re: [gem5-users] GCN3 GPU Simulation Start-Up Time
>
> Hi Dan,
>
> Glad to hear things are working, and thanks for the tips! I must admit to
> not quite following what the solution was though -- are you saying the
> solution is to replace exit(0)/return with m5_exit()? I thought your
> original post said the problem was things taking a really long time before
> main?
> If so, it would seem like something else must have been the
> problem/solution?
>
> Coming to your other questions: I don't recall what exactly the root cause
> of the hipDeviceSynchronize failure is, but I would definitely recommend
> updating to the current staging branch head first and testing. I am also
> hoping to push a fix today to the barrier bit synchronization -- most of
> the hipDeviceSynchronize-type failures I've seen were due to a bug in my
> barrier bit implementation. I'm not sure if this will be the solution to
> your problem or not, but I can definitely add you as a reviewer and/or
> point you to it if needed.
>
> Not sure about the m5op, hopefully someone else can chime in on that.
>
> Thanks,
> Matt
>
> On Fri, Jun 12, 2020 at 12:12 PM Daniel Gerzhoy <daniel.gerz...@gmail.com>
> wrote:
>
> I've figured it out.
>
> To measure the time it took to get to main() I put a *return 0;* at the
> beginning of the function so I wouldn't have to babysit it.
>
> I didn't consider that it would also take some time for the simulator to
> exit, which is where the extra few minutes comes from.
> Side-note: *m5_exit(0);* instead of a return exits immediately.
>
> 5 min is a bit more reasonable of a slowdown for the difference between
> the two clocks.
>
> Two incidental things:
>
> 1. Is there a way to have gem5 spit out (real wall-clock) timestamps while
>    it's printing stuff?
> 2. A while ago I asked about hipDeviceSynchronize(); causing crashes
>    (panic: Tried to read unmapped address 0xff0000c29f48.). Has this been
>    fixed since?
>
> I'm going to update to the head of this branch soon, and eventually to the
> main branch. If it hasn't been fixed I've created a workaround by stealing
> the completion signal of the kernel based on its launch id, and manually
> waiting for it using the HSA interface.
> Happy to help out and implement this as a m5op (or something) if that
> would be helpful for you guys.
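
On question 1 above (wall-clock timestamps): the thread does not name a built-in gem5 option for this, so one generic workaround is to wrap the run in a small script that stamps each output line as it arrives. A minimal Python sketch follows; the file name stamp.py and the helpers stamp() and run_stamped() are hypothetical names made up for illustration and are not part of gem5.

```python
import subprocess
import sys
import time
from datetime import datetime


def stamp(line, wall, elapsed):
    """Prefix one output line with a wall-clock time and seconds since start."""
    return f"[{wall.strftime('%Y-%m-%d %H:%M:%S')} +{elapsed:.1f}s] {line.rstrip()}"


def run_stamped(cmd):
    """Run cmd and echo its stdout/stderr with a timestamp on every line.

    Hypothetical helper: it knows nothing about gem5 and only wraps
    whatever command line it is given.
    """
    start = time.monotonic()
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    for line in proc.stdout:
        print(stamp(line, datetime.now(), time.monotonic() - start), flush=True)
    return proc.wait()


# Only act when a command is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_stamped(sys.argv[1:]))
```

Usage would be along the lines of "python3 stamp.py <gem5 binary> <config script> <options>". The timestamps record when the wrapper receives each line, so if the wrapped program buffers its output when writing to a pipe, lines may show up in bursts and the times will be later than the moment they were actually printed.
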
>
> Best,
>
> Dan
>
> On Thu, Jun 11, 2020 at 12:40 PM Matt Sinclair <mattdsincl...@gmail.com>
> wrote:
>
> I don't see anything amazingly amiss in your output, but the number of
> times the open/etc. fail is interesting -- Kyle do we see the same thing?
> If not, it could be that you should update your apu_se.py to point to the
> "correct" place to search for the libraries first?
>
> Also, based on Kyle's reply, Dan how long does it take you to boot up
> square? Certainly a slower machine might take longer, but it does seem
> even slower than expected. But if we're trying the same application, maybe
> it will be easier to spot differences.
>
> I would also recommend updating to the latest commit on the staging branch
> -- I don't believe it should break anything with those patches.
>
> Yes, looks like you are using the release version of ROCm -- no issues
> there.
>
> Matt
>
> On Thu, Jun 11, 2020 at 9:38 AM Daniel Gerzhoy <daniel.gerz...@gmail.com>
> wrote:
>
> I am using the docker, yeah.
> It's running on our server cluster which is a Xeon Gold 5117 @ (2.0 - 2.8
> GHz) which might make up some of the difference, the r5 3600 has a faster
> clock (3.6-4.2 GHz).
>
> I've hesitated to update my branch because in the Dockerfile it
> specifically checks this branch out and applies a patch, though the patch
> isn't very extensive.
> This was from a while back (November maybe?) and I know you guys have been
> integrating things into the main branch (thanks!)
> I was thinking I would wait until it's fully merged into the mainline gem5
> branch and rebase onto that and try to merge my changes in.
>
> Last I checked the GCN3 stuff is in the dev branch not the master right?
>
> But if it will help maybe I should update to the head of this branch. Will
> I need to update the docker as well?
>
> As for the debug vs release rocm I think I'm using the release version.
> This is what the dockerfile built:
>
>     ARG rocm_ver=1.6.2
>     RUN wget -qO- repo.radeon.com/rocm/archive/apt_${rocm_ver}.tar.bz2 \
>         | tar -xjv \
>         && cd apt_${rocm_ver}/pool/main/ \
>         && dpkg -i h/hsakmt-roct-dev/* \
>         && dpkg -i h/hsa-ext-rocr-dev/* \
>         && dpkg -i h/hsa-rocr-dev/* \
>         && dpkg -i r/rocm-utils/* \
>         && dpkg -i h/hcc/* \
>         && dpkg -i h/hip_base/* \
>         && dpkg -i h/hip_hcc/* \
>         && dpkg -i h/hip_samples/*
>
> I ran a benchmark that prints that it entered main and returns
> immediately, this took 9 minutes.
> I've attached a debug trace with debug flags = "GPUDriver,SyscallVerbose"
> There's a lot of weird things going on, "syscall open: failed", "syscall
> brk: break point changed to [...]", and lots of ignored system calls.
>
> head of Stats for reference:
>
> ---------- Begin Simulation Statistics ----------
> sim_seconds                    0.096192       # Number of seconds simulated
> sim_ticks                      96192368500    # Number of ticks simulated
> final_tick                     96192368500    # Number of ticks from beginning of simulation (restored from checkpoints and never reset)
> sim_freq                       1000000000000  # Frequency of simulated ticks
> host_inst_rate                 175209         # Simulator instruction rate (inst/s)
> host_op_rate                   338409         # Simulator op (including micro ops) rate (op/s)
> host_tick_rate                 175362515      # Simulator tick rate (ticks/s)
> host_mem_usage                 1628608        # Number of bytes of host memory used
> host_seconds                   548.53         # Real time elapsed on the host
> sim_insts                      96108256       # Number of instructions simulated
> sim_ops                        185628785      # Number of ops (including micro ops) simulated
> system.voltage_domain.voltage  1              # Voltage in Volts
> system.clk_domain.clock        1000           # Clock period in ticks
>
> Maybe something in the attached file explains it better than I can express.
>
> Many thanks for your help and hard work!
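
The stats excerpt quoted above can be cross-checked with a little arithmetic. The Python sketch below copies four raw counters from that excerpt and recomputes the derived figures; the derive() helper and its key names are made up for illustration, and the formulas are simply the obvious ratios (ticks over frequency, counts over host seconds), not something taken from the gem5 source.

```python
# Raw counters copied from the stats excerpt quoted above.
STATS = {
    "sim_ticks": 96192368500,
    "sim_freq": 1000000000000,
    "host_seconds": 548.53,
    "sim_insts": 96108256,
}


def derive(stats):
    """Recompute derived figures from the raw counters (illustrative helper)."""
    sim_seconds = stats["sim_ticks"] / stats["sim_freq"]
    host_seconds = stats["host_seconds"]
    return {
        "sim_seconds": sim_seconds,                            # simulated time
        "host_minutes": host_seconds / 60,                     # wall-clock time
        "host_inst_rate": stats["sim_insts"] / host_seconds,   # inst/s
        "host_tick_rate": stats["sim_ticks"] / host_seconds,   # ticks/s
        "slowdown": host_seconds / sim_seconds,                # host s per sim s
    }


if __name__ == "__main__":
    for name, value in derive(STATS).items():
        print(f"{name:16s} {value:,.4f}")
```

The 548.53 host seconds come to a little over 9 minutes, which matches the 9 minutes reported for the run, and the recomputed instruction and tick rates agree with the reported host_inst_rate and host_tick_rate to within rounding. The ratio of host time to simulated time works out to roughly 5,700 to 1. Note that host_seconds covers the whole run, so for this benchmark it includes the simulator's exit time as well as the time to reach main().
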
>
> Dan
>
> On Thu, Jun 11, 2020 at 3:32 AM Kyle Roarty <kroa...@wisc.edu> wrote:
>
> Running through a few applications, it took me about 2.5 minutes or less
> each time using docker to start executing the program on an r5 3600.
>
> I ran square, dynamic_shared, and MatrixTranspose (All from HIP) which
> took about 1-1.5 mins.
>
> I ran conv_bench and rnn_bench from DeepBench which took just about 2
> minutes.
>
> Because of that, it's possible the size of the app has an effect on setup
> time, as the HIP apps are extremely small.
>
> Also, the commit Dan is checked out on is d0945dc
> <https://gem5.googlesource.com/amd/gem5/+/d0945dc285cf146de160808d7e6d4c1fd3f73639>
> (mem-ruby: add cache hit/miss statistics for TCP and TCC),
> which isn't the most recent commit. I don't believe that that would account
> for such a large slowdown, but it doesn't hurt to try the newest commit
> unless it breaks something.
>
> Kyle
> ------------------------------
> *From:* mattdsincl...@gmail.com <mattdsincl...@gmail.com>
> *Sent:* Thursday, June 11, 2020 1:15 AM
> *To:* gem5 users mailing list <gem5-users@gem5.org>
> *Cc:* Daniel Gerzhoy <daniel.gerz...@gmail.com>; GAURAV JAIN <
> gja...@wisc.edu>; Kyle Roarty <kroa...@wisc.edu>
> *Subject:* Re: [gem5-users] GCN3 GPU Simulation Start-Up Time
>
> Gaurav & Kyle, do you know if this is the case?
>
> Dan, I believe the short answer is yes although 7-8 minutes seems a little
> long. Are you running this in Kyle's Docker, or separately? If in the
> Docker, that does increase the overhead somewhat, so running it directly on
> a system would likely reduce the overhead somewhat. Also, are you running
> with the release or debug version of the ROCm drivers? Again, debug
> version will likely add some time to this.
>
> Matt
>
> On Wed, Jun 10, 2020 at 2:00 PM Daniel Gerzhoy via gem5-users <
> gem5-users@gem5.org> wrote:
>
> I've been running simulations using the GCN3 branch:
>
>     rocm_ver=1.6.2
>     $git branch
>     * (HEAD detached at d0945dc)
>       agutierr/master-gcn3-staging
>
> And I've noticed that it takes roughly 7-8 minutes to get to main()
>
> I'm guessing that this is the simulator setting up drivers?
> Is that correct? Is there other stuff going on?
>
> *Has anyone found a way to speed this up?*
>
> I am trying to get some of the rodinia benchmarks from the HIP-Examples
> running and debugging takes a long time as a result.
>
> I suspect that this is unavoidable but I won't know if I don't ask!
>
> Cheers,
>
> Dan Gerzhoy
> _______________________________________________
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
> %(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s