Matt,

It wasn't so much a solution as an explanation. Kyle was running on an r5
3600 (3.6-4.2 GHz) whereas I am on a Xeon Gold 5117 @ (2.0 - 2.8 GHz)

The relative difference in clock speed seems to me to be a more reasonable
explanation for a slowdown from 1-1.5 minutes to ~5min (actual time before
min) than the 8 min (time before main + exit time) I was seeing before.

I'll update to the latest branch and see if that speeds me up further. I'm
also going to try running on a faster machine as well though that will take
some setup-time.

Gaurav,

Thanks for the tip, that will be helpful in the meantime.

Dan

On Fri, Jun 12, 2020 at 3:41 PM GAURAV JAIN <gja...@wisc.edu> wrote:

> Hi,
>
> I am not sure if chiming in now would cause any more confusion, but still
> giving it a try.
>
> @Daniel Gerzhoy <daniel.gerz...@gmail.com> - for hipDeviceSynchronize, as
> Matt mentioned, they are working on a fix and should have it out there. If
> you want to, can you try this:
>
>     hipSetDeviceFlags(hipDeviceScheduleSpin);
>     for (int k = 1; k < dim; k++) {
>         hipLaunchKernelGGL(HIP_KERNEL_NAME(somekernel), grid, threads, 0,
> 0);
>         hipDeviceSynchronize();
>     }
>
> For me, in many cases (not all and in the ones which it didn't work, I got
> the same error unmapped error as you), this seemed like doing the trick.
> You should checkout the HEAD and then try this. I am not hoping for it to
> make any difference but still worth a shot.
>
>
> ------------------------------
> *From:* mattdsincl...@gmail.com <mattdsincl...@gmail.com>
> *Sent:* Friday, June 12, 2020 2:14 PM
> *To:* Daniel Gerzhoy <daniel.gerz...@gmail.com>
> *Cc:* Kyle Roarty <kroa...@wisc.edu>; GAURAV JAIN <gja...@wisc.edu>; gem5
> users mailing list <gem5-users@gem5.org>
> *Subject:* Re: [gem5-users] GCN3 GPU Simulation Start-Up Time
>
> Hi Dan,
>
> Glad to hear things are working, and thanks for the tips!  I must admit to
> not quite following what the solution was though -- are you saying the
> solution is to replace exit(0)/return with m5_exit()?  I thought your
> original post said the problem was things taking a really long time before
> main?  If so, it would seem like something else must have been the
> problem/solution?
>
> Coming to your other questions: I don't recall what exactly the root cause
> of the hipDeviceSynchronize failure is, but I would definitely recommend
> updating to the current staging branch head first and testing.  I am also
> hoping to push a fix today to the barrier bit synchronization -- most of
> the hipDeviceSynchronize-type failures I've seen were due to a bug in my
> barrier bit implementation.  I'm not sure if this will be the solution to
> your problem or not, but I can definitely add you as a reviewer and/or
> point you to it if needed.
>
> Not sure about the m5op, hopefully someone else can chime in on that.
>
> Thanks,
> Matt
>
> On Fri, Jun 12, 2020 at 12:12 PM Daniel Gerzhoy <daniel.gerz...@gmail.com>
> wrote:
>
> I've figured it out.
>
> To measure the time it took to get to main() I put a *return 0; *at the
> beginning of the function so I wouldn't have to babysit it.
>
> I didn't consider that it would also take some time for the simulator to
> exit, which is where the extra few minutes comes from.
> Side-note: *m5_exit(0);* instead of a return exits immediately.
>
> 5 min is a bit more reasonable of a slowdown for the difference between
> the two clocks.
>
> Two incidental things:
>
> 1. Is there a way to have gem5 spit out (real wall-clock) timestamps while
> it's printing stuff?
> 2. A while ago I asked about hipDeviceSynchronize(); causing crashes
> (panic: Tried to read unmapped address 0xff0000c29f48.). Has this been
> fixed since?
>
> I'm going to update to the head of this branch soon, and eventually to the
> main branch. If it hasn't been fixed I've created a workaround by stealing
> the completion signal of the kernel based on its launch id, and manually
> waiting for it using the HSA interface.
> Happy to help out and implement this as a m5op (or something) if that
> would be helpful for you guys.
>
> Best,
>
> Dan
>
> On Thu, Jun 11, 2020 at 12:40 PM Matt Sinclair <mattdsincl...@gmail.com>
> wrote:
>
> I don't see anything amazingly amiss in your output, but the number of
> times the open/etc. fail is interesting -- Kyle do we see the same thing?
> If not, it could be that you should update your apu_se.py to point to the
> "correct" place to search for the libraries first?
>
> Also, based on Kyle's reply, Dan how long does it take you to boot up
> square?  Certainly a slower machine might take longer, but it does seem
> even slower than expected.  But if we're trying the same application, maybe
> it will be easier to spot differences.
>
> I would also recommend updating to the latest commit on the staging branch
> -- I don't believe it should break anything with those patches.
>
> Yes, looks like you are using the release version of ROCm -- no issues
> there.
>
> Matt
>
>
>
> On Thu, Jun 11, 2020 at 9:38 AM Daniel Gerzhoy <daniel.gerz...@gmail.com>
> wrote:
>
> I am using the docker, yeah.
> It's running on our server cluster which is a Xeon Gold 5117 @ (2.0 - 2.8
> GHz) which might make up some of the difference, the r5 3600 has a faster
> clock (3.6-4.2 GHz).
>
> I've hesitated to update my branch because in the Dockerfile it
> specifically checks this branch out and applies a patch, though the patch
> isn't very extensive.
> This was from a while back (November maybe?) and I know you guys have been
> integrating things into the main branch (thanks!)
> I was thinking I would wait until it's fully merged into the mainline gem5
> branch and rebase onto that and try to merge my changes in.
>
> Last I checked the GCN3 stuff is in the dev branch not the master right?
>
> But if it will help maybe I should update to the head of this branch. Will
> I need to update the docker as well?
>
> As for the debug vs release rocm I think I'm using the release version.
> This is what the dockerfile built:
>
> ARG rocm_ver=1.6.2
> RUN wget -qO- repo.radeon.com/rocm/archive/apt_${rocm_ver}.tar.bz2
> <http://repo.radeon.com/rocm/archive/apt_$%7Brocm_ver%7D.tar.bz2> \
>     | tar -xjv \
>     && cd apt_${rocm_ver}/pool/main/ \
>     && dpkg -i h/hsakmt-roct-dev/* \
>     && dpkg -i h/hsa-ext-rocr-dev/* \
>     && dpkg -i h/hsa-rocr-dev/* \
>     && dpkg -i r/rocm-utils/* \
>     && dpkg -i h/hcc/* \
>     && dpkg -i h/hip_base/* \
>     && dpkg -i h/hip_hcc/* \
>     && dpkg -i h/hip_samples/*
>
>
> I ran a benchmark that prints that it entered main and returns
> immediately, this took 9 minutes.
> I've attached a debug trace with debug flags = "GPUDriver,SyscallVerbose"
> There's a lot of weird things going on, "syscall open: failed", "syscall
> brk: break point changed to [...]", and lots of ignored system calls.
>
> head of Stats for reference:
> ---------- Begin Simulation Statistics ----------
> sim_seconds                                  0.096192
>   # Number of seconds simulated
> sim_ticks                                 96192368500
>   # Number of ticks simulated
> final_tick                                96192368500
>   # Number of ticks from beginning of simulation (restored from checkpoints
> and never reset)
> sim_freq                                 1000000000000
>   # Frequency of simulated ticks
> host_inst_rate                                 175209
>   # Simulator instruction rate (inst/s)
> host_op_rate                                   338409
>   # Simulator op (including micro ops) rate (op/s)
> host_tick_rate                              175362515
>   # Simulator tick rate (ticks/s)
> host_mem_usage                                1628608
>   # Number of bytes of host memory used
> host_seconds                                   548.53
>   # Real time elapsed on the host
> sim_insts                                    96108256
>   # Number of instructions simulated
> sim_ops                                     185628785
>   # Number of ops (including micro ops) simulated
> system.voltage_domain.voltage                       1
>   # Voltage in Volts
> system.clk_domain.clock                          1000
>   # Clock period in ticks
>
> Maybe something in the attached file explains it better than I can express.
>
> Many thanks for your help and hard work!
>
> Dan
>
>
>
>
>
> On Thu, Jun 11, 2020 at 3:32 AM Kyle Roarty <kroa...@wisc.edu> wrote:
>
> Running through a few applications, it took me about 2.5 minutes or less
> each time using docker to start executing the program on an r5 3600.
>
> I ran square, dynamic_shared, and MatrixTranspose (All from HIP) which
> took about 1-1.5 mins.
>
> I ran conv_bench and rnn_bench from DeepBench which took just about 2
> minutes.
>
> Because of that, it's possible the size of the app has an effect on setup
> time, as the HIP apps are extremely small.
>
> Also, the commit Dan is checked out on is d0945dc
> <https://gem5.googlesource.com/amd/gem5/+/d0945dc285cf146de160808d7e6d4c1fd3f73639>
>  mem-ruby:
> add cache hit/miss statistics for TCP and TCC
> <https://gem5.googlesource.com/amd/gem5/+/d0945dc285cf146de160808d7e6d4c1fd3f73639>,
> which isn't the most recent commit. I don't believe that that would account
> for such a large slowdown, but it doesn't hurt to try the newest commit
> unless it breaks something.
>
> Kyle
> ------------------------------
> *From:* mattdsincl...@gmail.com <mattdsincl...@gmail.com>
> *Sent:* Thursday, June 11, 2020 1:15 AM
> *To:* gem5 users mailing list <gem5-users@gem5.org>
> *Cc:* Daniel Gerzhoy <daniel.gerz...@gmail.com>; GAURAV JAIN <
> gja...@wisc.edu>; Kyle Roarty <kroa...@wisc.edu>
> *Subject:* Re: [gem5-users] GCN3 GPU Simulation Start-Up Time
>
> Gaurav & Kyle, do you know if this is the case?
>
> Dan, I believe the short answer is yes although 7-8 minutes seems a little
> long.  Are you running this in Kyle's Docker, or separately?  If in the
> Docker, that does increase the overhead somewhat, so running it directly on
> a system would likely reduce the overhead somewhat.  Also, are you running
> with the release or debug version of the ROCm drivers?  Again, debug
> version will likely add some time to this.
>
> Matt
>
> On Wed, Jun 10, 2020 at 2:00 PM Daniel Gerzhoy via gem5-users <
> gem5-users@gem5.org> wrote:
>
> I've been running simulations using the GCN3 branch:
>
> rocm_ver=1.6.2
> $git branch
>    * (HEAD detached at d0945dc)
>       agutierr/master-gcn3-staging
>
> And I've noticed that it takes roughly 7-8 minutes to get to main()
>
> I'm guessing that this is the simulator setting up drivers?
> Is that correct? Is there other stuff going on?
>
> *Has anyone found a way to speed this up? *
>
> I am trying to get some of the rodinia benchmarks from the HIP-Examples
> running and debugging takes a long time as a result.
>
> I suspect that this is unavoidable but I won't know if I don't ask!
>
> Cheers,
>
> Dan Gerzhoy
> _______________________________________________
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
> %(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s
>
>
_______________________________________________
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org
%(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s

Reply via email to